Yahoo crawler(s) requesting mangled URLs

Jonesy msg:4317695 11:36 pm on May 25, 2011 (gmt 0)

I have not seen this exact 'weirdness' discussed here before...

For some time now I've been seeing wildly bogus URLs in the crawling of b3090809.crawl.yahoo.net and b3090944.crawl.yahoo.net (and maybe others...) (67.195.115.___). I'm seeing this across several different domains (all mingled below as /BASE_URL/). Most, if not all, of the file name requests are badly mangled text strings of existing html files.

/home/ftp/BASE_URL/public_html/iaciovties html                    "activities.html" but exists in lower dir
/home/ftp/BASE_URL/public_html/PuebloMasters/Du                   "/Dues.html"
/home/ftp/BASE_URL/public_html/W3DHJ/><img 4eght:=                ? total crap
/home/ftp/BASE_URL/public_html/UBSC/inde_x.html"                  "index.html" of course
/home/ftp/BASE_URL/public_html/irted html                         ?
/home/ftp/BASE_URL/public_html/iooms,html                         "rooms.html" but exists in lower dir
/home/ftp/BASE_URL/public_html/imaphtml                           ?
/home/ftp/BASE_URL/public_html/PuebloMasters/Dues.htmk            "Dues.html"
/home/ftp/BASE_URL/public_html/UBSC/fubsc_mmorial                 "ubsc_memorial.html"
/home/ftp/BASE_URL/public_html/UBSC/inde_x.html                   "index.html" of course
/home/ftp/BASE_URL/public_html/itanvelhtml                        "travel.html" but exists in lower dir
/home/ftp/BASE_URL/public_html/UBSC/fid-exhtml                    ? index.html ?
/home/ftp/BASE_URL/public_html/index.html The FerryHom Pg s       ? appended, bogus, blank-delimited garbage

Additionally, I see the same crawler(s) requesting valid html filenames in the doc root that only exist and have ONLY EVER existed in lower directories -- and getting 404's as a result.

Anybody else seeing this?

Jonesy
incrediBILL msg:4319102 8:54 pm on May 28, 2011 (gmt 0)
Just a guess, but I've seen similar stuff before, and it's typically a bad scraper page that links back to your site. The SEs will crawl all the malformed links and they just 404 on your site. However, I often wonder if feeding bad 404s to a site isn't some black hat attempt to make your site look bad in the eyes of the SE. Just a thought.
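For anyone wanting to tally these requests on their own server, here is a rough Python sketch, not something taken from either poster's setup. It assumes an Apache-style common/combined access log, uses the 67.195.115. prefix mentioned in the first post as the default filter, and guesses which real file a mangled name was probably meant to be with the standard-library difflib module. The names crawler_404s and guess_intended, the regular expression, and the 0.6 cutoff are all made up for illustration and would need adjusting for a different log format.

```python
import difflib
import re

# Assumption: Apache common/combined log layout. Adjust for other formats.
# The path group is non-greedy and allows spaces, since some of the bogus
# requests contain blanks.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>.*?) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def crawler_404s(lines, ip_prefix="67.195.115."):
    """Yield request paths that got a 404 from clients whose IP starts with ip_prefix."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("ip").startswith(ip_prefix) and m.group("status") == "404":
            yield m.group("path")


def guess_intended(path, known_files, cutoff=0.6):
    """Return the known file name most similar to the last path segment, or None."""
    name = path.rsplit("/", 1)[-1]
    matches = difflib.get_close_matches(name, known_files, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Feed crawler_404s the lines of an access log and pass each resulting path to guess_intended along with a list of the html file names that really exist on the site. A None result means nothing was close enough, which is roughly the "total crap" category above.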