Forum Moderators: mademetop

Design a site that can be expected to attract a substantial number of rude robots.

4:57 pm on Apr 14, 2001 (gmt 0)

New User

5+ Year Member

joined:Sept 12, 2012
votes: 0

This opens up the question of how to design a site that can be expected to attract a substantial number of rude robots. This is a real problem for a very specialized sort of site that has thousands of files in its archives, all linked and publicly available and searchable. An example of such a site, with over 6,000 files, is [cryptome.org...]

John Young, the administrator of Cryptome, has a constant problem with rude robots. He has blocked as many as 100 at a time. But the real problem is that all of his files are static, and he can only block robots manually. If he were to redesign his site for on-the-fly blocking, how would he go about this? Without automated, on-the-fly blocking, his server gets overloaded before he is able to take action. Moreover, he runs the site as a hobby, and is an architect, not a programmer. He shouldn't have to worry about rude robots at all.

Most of the participants in this forum work for commercial firms that are trying to generate MORE search engine activity. But if you are designing a site for a nonprofit that, for example, wants to digitize and place online about 25 years' worth of dead-tree articles from its specialized publication, you can pretty well plan on having rude robot problems. This sort of archive is something to "kill for" if you share this specialized interest; there's no question that many rude surfers will find the tool they need to "click for" it as well, and just plow through the entire site with their broadband connection.

The entire site has to be designed from scratch to provide maximum flexibility, and this generally means that all the pages have to be generated dynamically.

You need a compiled C program for speed, memory efficiency, and lowest CPU load. To block effectively, you have to be faster than the fastest rude robot while staying transparent to the bots you like, and you might be dealing with a mixture of both at the same time. This is no place for server-side Java or Perl scripts.
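To make the idea of automated, on-the-fly blocking concrete, here is a minimal sketch in C of a per-client request counter. It is only one possible approach (a fixed counting window per IPv4 address), not something any existing server is known to ship. The names `allow_request` and `find_slot`, and the limits `TABLE_BITS`, `WINDOW_SECS`, and `MAX_HITS`, are all hypothetical and picked for illustration; it does not cover whitelisting the bots you like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* All three limits are illustrative assumptions, not recommended values. */
#define TABLE_BITS  12                  /* 4096 client slots */
#define TABLE_SIZE  (1u << TABLE_BITS)
#define WINDOW_SECS 10                  /* length of one counting window */
#define MAX_HITS    20                  /* requests allowed per window */
#define MAX_PROBES  8                   /* slots examined per lookup */

typedef struct {
    uint32_t ip;            /* IPv4 address as an integer; 0 marks an empty slot */
    time_t   window_start;  /* when the current counting window began */
    unsigned hits;          /* requests seen in the current window */
} client_slot;

static client_slot table[TABLE_SIZE];

/* Return the slot already tracking this ip, else an empty or expired slot
 * that can be reused, else NULL if every probed slot is busy. */
static client_slot *find_slot(uint32_t ip, time_t now)
{
    uint32_t h = (uint32_t)(ip * UINT32_C(2654435761)) >> (32 - TABLE_BITS);
    client_slot *reusable = NULL;

    for (unsigned probe = 0; probe < MAX_PROBES; probe++) {
        client_slot *s = &table[(h + probe) & (TABLE_SIZE - 1)];
        if (s->ip == ip)
            return s;
        if (reusable == NULL &&
            (s->ip == 0 || now - s->window_start >= WINDOW_SECS))
            reusable = s;
    }
    return reusable;
}

/* Return 1 if the request should be served, 0 if the client has exceeded
 * MAX_HITS requests within the current WINDOW_SECS window. */
int allow_request(uint32_t ip, time_t now)
{
    assert(ip != 0);        /* 0 is reserved as the empty-slot marker */

    client_slot *s = find_slot(ip, now);
    if (s == NULL)
        return 1;           /* table crowded: fail open rather than block */

    if (s->ip != ip || now - s->window_start >= WINDOW_SECS) {
        s->ip = ip;         /* claim the slot or start a fresh window */
        s->window_start = now;
        s->hits = 0;
    }
    s->hits++;
    return s->hits <= MAX_HITS;
}
```

A page generator would call `allow_request(client_ip, time(NULL))` before doing any real work and refuse or delay the response when it returns 0. The table is a fixed-size array, so memory use stays constant no matter how many robots show up, which is the point of doing this in compiled code.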

Beyond that, I'd be interested in hearing from anyone who has given this problem further thought. It seems to me that a specialized Apache module or entire httpd package that is optimized for this is something that might already have a market. And this market will get bigger, because the rude robot business is getting bigger. The robots.txt standard presumes unflinching courtesy from everyone. Those days are long gone.

5:01 am on Apr 17, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 21, 1999
votes: 0

Welcome to WebmasterWorld Doofus! There's a thread developing at [webmasterworld.com...] on this very topic. Some background on that new thread is at [webmasterworld.com...]

You've brought up some interesting points. Perhaps you and Everyman can brainstorm the perfect "load thermostat!"

