Forum Moderators: open

Open-I-Search Unveiled

Interview with the BotMaster

yodokame

3:29 am on Apr 28, 2007 (gmt 0)

10+ Year Member



Older thread: [webmasterworld.com...]

This robot seems to be driving people nuts, based on previous threads. It hit us, and we went through the various stages: robots.txt, 403. It still came.

And then I hit upon a revolutionary idea, something that nobody has ever done before. I hesitate to publish the technique here, because I may be endangering my rights to patent it (or at least to sell a $4.95 e-book on it). But for the good of webmasters everywhere, here it is:

I contacted them and asked them to stop.

And they did.

And they engaged in a lengthy, detailed e-mail exchange with me in order to figure out what was going wrong.

I still don't know who the heck they are (other than that they appear to be native speakers of American English) or what they're doing. But I know a lot more about their robot. The gist of what I've learned is:

  • They have 10 to 20 servers controlled by a master server that handles scheduling

  • They won't hit you any more frequently than once every 5 to 10 seconds (it depends on the size of your site)

  • Each machine currently asks for robots.txt independently, once every other day, which accounts for the multiple requests per day for that file

  • If you're returning a 403 error to them, they don't (can't) read your robots.txt file, so they will never stop coming (they aren't programmed to interpret repeated 403s as the equivalent of a robots.txt block)

    If you're having trouble with this robot, you should contact them via the form on their Web site. They will respond and request a grep of your logs, along with the time zone of the timestamps in the logs, and they will follow up.
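Since the robot does re-read robots.txt regularly, a plain robots.txt disallow is the cleanest way to shut it out once it honors the file. A minimal example follows — but note the "Open-I-Search" user-agent token here is only a guess from the thread title, not something the bot is confirmed to send; check your logs for the exact token before relying on it:

```
User-agent: Open-I-Search
Disallow: /
```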

incrediBILL

4:48 pm on Apr 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's the problem with how most people implement their security: your robots.txt file should be freely viewable by everyone, while all other files return a 403 to blocked visitors.

Truth is, most things on the net don't care about robots.txt and don't have a phone number to call.

You got lucky.
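The pattern described above — exempt robots.txt from your blocking rules so a banned crawler can still read them, 403 everything else — can be sketched in a few lines. This is a minimal illustration, not anyone's actual server config; the blocked IP and page bodies are made-up placeholders:

```python
# Access-control sketch: robots.txt stays world-readable so blocked
# crawlers can still learn the rules, while every other path returns
# 403 to a blocked client. BLOCKED_IPS is a hypothetical example set.

BLOCKED_IPS = {"203.0.113.50"}  # example address from a documentation range

ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

def handle_request(client_ip: str, path: str) -> tuple[int, str]:
    """Return an (HTTP status, body) pair for a request."""
    if path == "/robots.txt":
        # Always answer 200 here, even for blocked clients, so a
        # well-behaved bot can read the rules instead of retrying forever.
        return 200, ROBOTS_TXT
    if client_ip in BLOCKED_IPS:
        return 403, "Forbidden"
    return 200, "page content"
```

The key design choice is checking the robots.txt path *before* the block list — reverse the order and you get exactly the failure mode the thread describes, where the bot 403s forever and never sees the disallow rules.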