404's associated with filename misspellings.

Why are "bots" requesting these files?

Broadway

4:14 pm on Aug 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've noticed that a large percentage of my site's 404 errors for misspelled and nonsensical filenames are associated with "bots" (msn, slurp). The bot shows up out of the blue requesting a misspelled filename. It is served my 404 page and then just leaves. No further pages (or associated images or css) are requested.

Common errors are:
1) Replacing the # mark with "%23" (filename.htm#inpage.link becomes filename.htm%23inpage.link).
2) Tacking extra characters onto the end of the filename ("filename.htm:" or "filename.htm)").
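
For anyone who wants to 301 these patterns, here is a minimal Python sketch of how the two errors above could be mapped back to a real file. The function name `suggest_redirect`, the `TRAILING_JUNK` character list, and the `known_paths` set are all hypothetical; this is an illustration under those assumptions, not something taken from any server's configuration.

```python
from urllib.parse import unquote

# Characters that tend to get glued onto a URL when it is auto-linked
# from running text. This list is an assumption; extend it as needed.
TRAILING_JUNK = ":).,;"

def suggest_redirect(requested_path, known_paths):
    """Return the real path a malformed request probably meant, or None."""
    candidate = unquote(requested_path)      # turns "%23" back into "#"
    candidate = candidate.split("#", 1)[0]   # drop the in-page fragment
    candidate = candidate.rstrip(TRAILING_JUNK)
    # Only suggest a redirect if cleaning changed something and the
    # cleaned path is a file that really exists on the site.
    if candidate != requested_path and candidate in known_paths:
        return candidate
    return None

if __name__ == "__main__":
    pages = {"/filename.htm"}
    for bad in ("/filename.htm%23inpage.link", "/filename.htm:", "/filename.htm)"):
        print(bad, "->", suggest_redirect(bad, pages))
```

Checking against a list of real files keeps truly nonsensical requests on the normal 404 page instead of redirecting everything.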

The referrer is typically "-" so I can't really evaluate the source.
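
To put numbers on this, a rough Python sketch for tallying bot 404s from an access log follows. It assumes the Apache "combined" log format; the `bot_404s` name and the `BOT_TOKENS` user-agent substrings are illustrative guesses and should be adjusted to whatever the real logs contain.

```python
import re
from collections import Counter

# Assumes Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Lowercase substrings to look for in the user agent (assumed values).
BOT_TOKENS = ("msnbot", "slurp", "googlebot")

def bot_404s(lines):
    """Count 404 responses per (bot, path, referrer) for known bot agents."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "404":
            continue
        agent = m.group("agent").lower()
        bot = next((t for t in BOT_TOKENS if t in agent), None)
        if bot:
            hits[(bot, m.group("path"), m.group("referrer"))] += 1
    return hits
```

Keeping the referrer in the key makes the rare request that does carry one stand out from the "-" entries.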

It doesn't make sense that the misspelled file is already in the bot's index and they are just coming back to reindex and update because these filenames never existed.

Is this activity evidence that the bot is following a misspelled link found on some other site? (As in this bad filename actually does exist in a link somewhere and needs to be 301 redirected for the visitor and to capture link juice.) Or is there another explanation why a bot would be looking for a misspelled filename?

micklearn

5:09 am on Aug 24, 2010 (gmt 0)

10+ Year Member



Broadway, I have wondered about this issue for a long time, since I have seen many log errors similar to what you described over the last few years. Yet, I have not been able to determine where the inbound links are coming from, or if they even exist.

<tinfoilhaton>Are the major SE's/bots using this as a method to test/classify sites in some way, over coveted link juice?...And then later the algorithm determines that: "Site 'abc' fell for it and 301'd the fake link, while site 'xyz' did not. Therefore, site 'abc' is now on our radar..."</tinfoilhaton> Be honest, if that makes no sense, it's late here.

Broadway

7:02 pm on Aug 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If so, it's sad that they would put effort into this (essentially testing white-hat sites) as opposed to more effort on detecting link buying.

micklearn

4:27 am on Aug 26, 2010 (gmt 0)

10+ Year Member



I hear ya, just speculation on my part. I read a few articles and watched a video a while back about SE bots testing for "soft 404 errors" (which now appear in Google's WMT). After a while, I wound up with the "conclusion" that I mentioned above. Something like, "go ahead, 301 that 'link', and see what happens next". As for more effort on link buying detection, heck, one of us should start a new thread about that and get the media to pick up the story. Can that problem be solved?