

Search Engine Spider and User Agent Identification Forum

Yahoo crawler(s) requesting mangled URLs

5+ Year Member

Msg#: 4319026 posted 11:36 pm on May 25, 2011 (gmt 0)

I have not seen this exact 'weirdness' discussed here before...

For some time now I've been seeing wildly bogus URLs in the crawling
of b3090809.crawl.yahoo.net and b3090944.crawl.yahoo.net
(and maybe others...) (67.195.115.___).
I'm seeing this across several different domains (all mingled below as /BASE_URL/).
Most, if not all, of the file name requests are badly mangled text
strings of existing html files.

/home/ftp/BASE_URL/public_html/iaciovties html
but, exists in lower dir


/home/ftp/BASE_URL/public_html/W3DHJ/><img 4eght:=
? total crap

"index.html" of course

/home/ftp/BASE_URL/public_html/irted html

but, exists in lower dir




"index.html" of course

but, exists in lower dir

? index.html ?

/home/ftp/BASE_URL/public_html/index.html The FerryHom Pg s
? appended, bogus,
blank-delimited garbage

Additionally, I see the same crawler(s) requesting valid html filenames
in the doc root that only exist and have ONLY EVER existed in lower
directories -- and getting 404's as a result.
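For anyone who wants to sort their own 404s into the same buckets used above ("exists in lower dir" versus plain garbage), here is a rough Python sketch. It assumes you already have a set of the site's real paths relative to the doc root and a list of requested paths pulled from your logs. The function names and the sample paths are made up for illustration and are not taken from any real log.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def index_by_basename(site_paths):
    """Map each file name to the set of directories that actually contain it."""
    index = defaultdict(set)
    for path in site_paths:
        p = PurePosixPath(path)
        index[p.name].add(str(p.parent))
    return index


def classify_404(requested_path, site_paths):
    """Label a requested path as 'exists', 'exists in lower dir', or 'bogus'."""
    if requested_path in site_paths:
        return "exists"
    req = PurePosixPath(requested_path)
    req_dir = str(req.parent)
    # A "lower" directory is any directory nested below the requested one.
    prefix = "/" if req_dir == "/" else req_dir + "/"
    index = index_by_basename(site_paths)
    for directory in index.get(req.name, ()):
        if directory != req_dir and directory.startswith(prefix):
            return "exists in lower dir"
    return "bogus"


if __name__ == "__main__":
    # Hypothetical site layout and requests, for illustration only.
    site = {"/index.html", "/ferry/schedule.html"}
    for request in ["/schedule.html", "/irted html", "/index.html"]:
        print(request, "->", classify_404(request, site))
```

This only matches on exact file names, so a mangled string such as "irted html" lands in the "bogus" bucket even when it is a corrupted form of a real file name.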

Anybody else seeing this?



incrediBILL (WebmasterWorld Administrator)

Msg#: 4319026 posted 8:54 pm on May 28, 2011 (gmt 0)

Just a guess, but I've seen similar stuff before and it's typically a badly built scraper page that links back to your site. The SEs crawl all of its malformed links, and those requests just 404 on your site.

However, I often wonder if feeding bad 404s to a site isn't some black hat attempt to make your site look bad in the eyes of the SE. Just a thought.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved