Yahoo crawler(s) requesting mangled URLs

   
11:36 pm on May 25, 2011 (gmt 0)



I have not seen this exact 'weirdness' discussed here before...

For some time now I've been seeing wildly bogus URLs in the crawling
of b3090809.crawl.yahoo.net and b3090944.crawl.yahoo.net
(and maybe others...) (67.195.115.___).
I'm seeing this across several different domains (all mingled below as /BASE_URL/).
Most, if not all, of the requested file names are badly mangled
versions of the names of existing HTML files.

/home/ftp/BASE_URL/public_html/iaciovties html
"activities.html"
but, exists in lower dir

/home/ftp/BASE_URL/public_html/PuebloMasters/Du
"/Dues.html"

/home/ftp/BASE_URL/public_html/W3DHJ/><img 4eght:=
? total crap

/home/ftp/BASE_URL/public_html/UBSC/inde_x.html"
"index.html" of course

/home/ftp/BASE_URL/public_html/irted html
?

/home/ftp/BASE_URL/public_html/iooms,html
"rooms.html"
but, exists in lower dir

/home/ftp/BASE_URL/public_html/imaphtml
?

/home/ftp/BASE_URL/public_html/PuebloMasters/Dues.htmk
"Dues.html"

/home/ftp/BASE_URL/public_html/UBSC/fubsc_mmorial
"ubsc_memorial.html"

/home/ftp/BASE_URL/public_html/UBSC/inde_x.html
"index.html" of course

/home/ftp/BASE_URL/public_html/itanvelhtml
"travel.html"
but, exists in lower dir

/home/ftp/BASE_URL/public_html/UBSC/fid-exhtml
? index.html ?

/home/ftp/BASE_URL/public_html/index.html The FerryHom Pg s
? appended, bogus,
blank-delimited garbage

Additionally, I see the same crawler(s) requesting valid html filenames
in the doc root that only exist and have ONLY EVER existed in lower
directories -- and getting 404's as a result.
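
For anyone wanting to sort requests like these into the two groups described above (right file name in the wrong directory vs. a mangled file name), here is a rough Python sketch. It is only an illustration under assumptions: `classify_request`, its parameters, and the 0.6 similarity cutoff are made up for this example, and the fuzzy matching uses the standard library's `difflib`, which may or may not resemble whatever actually produced the mangling.

```python
import difflib
import posixpath


def classify_request(requested, known_paths, cutoff=0.6):
    """Classify a 404'd request path against the site's real files.

    requested   -- path as requested, relative to the doc root
    known_paths -- real file paths, also relative to the doc root
    cutoff      -- similarity threshold (an arbitrary choice for this sketch)

    Returns (kind, candidates) where kind is 'misplaced', 'mangled'
    or 'unknown'.
    """
    known_paths = list(known_paths)
    req_dir, req_name = posixpath.split(requested.strip("/"))

    # Case 1: the file name is real, but it lives in a different directory.
    misplaced = [
        p for p in known_paths
        if posixpath.basename(p) == req_name
        and posixpath.dirname(p) != req_dir
    ]
    if req_name and misplaced:
        return "misplaced", misplaced

    # Case 2: the file name is a near miss of a real file name.
    names = sorted({posixpath.basename(p) for p in known_paths})
    close = difflib.get_close_matches(req_name, names, n=3, cutoff=cutoff)
    if close:
        return "mangled", close

    # Nothing recognizable.
    return "unknown", []
```

Feeding it the requested path plus a listing of the real files (for example, built with `os.walk` over the doc root) gives a quick way to see which 404s are near misses of real pages and which are pure garbage. The cutoff will likely need tuning per site.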

Anybody else seeing this?
Jonesy

8:54 pm on May 28, 2011 (gmt 0)

incrediBILL (WebmasterWorld Administrator)



Just a guess, but I've seen similar stuff before and it's typically a bad scraper page that links back to your site. The SEs will crawl all the malformed links and they just 404 on your site.

However, I often wonder if feeding bad 404s to a site isn't some black hat attempt to make your site look bad in the eyes of the SE. Just a thought.
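
To get a feel for how many of these 404s a crawler is actually racking up, something like the following Python sketch could tally them from an access log. It assumes the Apache "combined" log format, which the thread does not confirm, and the function name `crawler_404s` is made up for illustration. The default IP prefix is the 67.195.115.___ range mentioned in the first post.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern if the
# server logs in a different format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>.*?) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_404s(lines, ip_prefix="67.195.115."):
    """Count 404 responses per requested path for clients in ip_prefix."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        if m.group("status") == "404" and m.group("ip").startswith(ip_prefix):
            counts[m.group("path")] += 1
    return counts
```

Passing an open log file (`crawler_404s(open(path))`, with `path` being wherever the server keeps its access log) and calling `.most_common()` on the result lists the most frequently requested bogus URLs first.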