Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Yahoo crawler(s) requesting mangled URLs
Jonesy




msg:4317695
 11:36 pm on May 25, 2011 (gmt 0)

I have not seen this exact 'weirdness' discussed here before...

For some time now I've been seeing wildly bogus URLs requested by the
crawlers b3090809.crawl.yahoo.net and b3090944.crawl.yahoo.net
(and maybe others...) (67.195.115.___).
I'm seeing this across several different domains (all lumped together below as /BASE_URL/).
Most, if not all, of the requested file names are badly mangled versions
of the names of existing HTML files. Each entry below shows the path as
requested, followed by my guess at the file that was meant.
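Since the hostnames above are taken from the logs, it may be worth confirming that the requests really come from Yahoo before reading too much into them. Below is a minimal sketch of a forward-confirmed reverse DNS check, assuming Python's standard `socket` module. The function name `is_verified_crawler`, the default suffix, and the injectable `reverse`/`forward` parameters are illustrative choices, not anything from the posts above.

```python
import socket

def is_verified_crawler(ip, suffix=".crawl.yahoo.net",
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end with the expected crawler suffix.
    3. Forward-resolve that hostname and require the original IP
       to be among the returned addresses.
    """
    try:
        host = reverse(ip)[0]            # (hostname, aliases, addresses)
        if not host.lower().endswith(suffix):
            return False
        return ip in forward(host)[2]    # list of IPv4 addresses
    except OSError:                      # covers socket.herror / socket.gaierror
        return False
```

The resolver functions are passed in as parameters only so the check can be exercised without network access. In normal use, calling `is_verified_crawler(ip)` with an address from the access log should be enough.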

/home/ftp/BASE_URL/public_html/iaciovties html
"activities.html"
but, exists in lower dir

/home/ftp/BASE_URL/public_html/PuebloMasters/Du
"/Dues.html"

/home/ftp/BASE_URL/public_html/W3DHJ/><img 4eght:=
? total crap

/home/ftp/BASE_URL/public_html/UBSC/inde_x.html"
"index.html" of course

/home/ftp/BASE_URL/public_html/irted html
?

/home/ftp/BASE_URL/public_html/iooms,html
"rooms.html"
but, exists in lower dir

/home/ftp/BASE_URL/public_html/imaphtml
?

/home/ftp/BASE_URL/public_html/PuebloMasters/Dues.htmk
"Dues.html"

/home/ftp/BASE_URL/public_html/UBSC/fubsc_mmorial
"ubsc_memorial.html"

/home/ftp/BASE_URL/public_html/UBSC/inde_x.html
"index.html" of course

/home/ftp/BASE_URL/public_html/itanvelhtml
"travel.html"
but, exists in lower dir

/home/ftp/BASE_URL/public_html/UBSC/fid-exhtml
? index.html ?

/home/ftp/BASE_URL/public_html/index.html The FerryHom Pg s
? appended, bogus,
blank-delimited garbage

Additionally, I see the same crawler(s) requesting valid html filenames
in the doc root that only exist and have ONLY EVER existed in lower
directories -- and getting 404's as a result.
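For anyone wanting to sort through a long list of these 404s, fuzzy matching on the file name is one way to guess which real file a mangled request came from. A minimal sketch, assuming Python's standard `difflib`; the helper name `guess_original`, the `known_files` list, and the 0.6 cutoff are all hypothetical and would need tuning against real logs.

```python
import difflib
import posixpath

def guess_original(requested_path, known_files, cutoff=0.6):
    """Return the known file name most similar to the requested one, or None.

    Only the last path component is compared, case-insensitively,
    so a file requested in the wrong directory can still be matched.
    """
    name = posixpath.basename(requested_path).lower()
    by_lower = {k.lower(): k for k in known_files}
    matches = difflib.get_close_matches(name, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```

With a list such as `["Dues.html", "index.html", "activities.html"]`, a request for `/PuebloMasters/Dues.htmk` maps back to `Dues.html`. Requests that match nothing above the cutoff return `None` and can be set aside as pure garbage.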

Anybody else seeing this?
Jonesy

 

incrediBILL




msg:4319102
 8:54 pm on May 28, 2011 (gmt 0)

Just a guess, but I've seen similar stuff before and it's typically a bad scraper page that links back to your site. The SEs will crawl all the malformed links and they just 404 to your site.

However, I often wonder if feeding bad 404s to a site isn't some black hat attempt to make your site look bad in the eyes of the SE, just a thought.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved