Hiding robots.txt


dirkz - 11:51 am on Oct 29, 2003 (gmt 0)


"Does this all only work on Linux/Unix hosting?"

It's meant for Apache. There's also a Win32 version of it, but I don't know whether it's suitable for production servers.

"Some sites fall prey to constant file pilfering, leeching and unwanted mass downloads"

I have experienced both sides: the bot programmer and the site owner. A "good" leeching bot (in the eyes of the leecher) will disguise its UA and never obey a robots.txt. It's quite easy to modify existing bots in Perl and Python to do so. It's also very easy to write your own.

On the other side of the fence, as a site owner I strongly recommend real-time "traffic-shaping methods" that act on "offending" IPs, independent of the UA and robots.txt. This works like a firewall detecting intrusion attempts and DoS attacks, only at a higher level (HTTP).
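
To make the idea concrete, here is a minimal Python sketch of per-IP throttling with a sliding time window. It is only an illustration under my own assumptions: the class name IpRateLimiter, the limits (60 requests per 60 seconds) and the injectable clock are all hypothetical, and this is not the Apache module discussed earlier in the thread. It never looks at the UA or robots.txt; it only counts requests per address.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Sliding-window request counter keyed by client IP (hypothetical helper)."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        # Assumed defaults; tune them to what normal visitors of your site do.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip):
        """Return True if this request from `ip` is still within the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: the caller might delay or answer 429
        recent.append(now)
        return True
```

A request handler would call allow(client_ip) before serving anything and slow down or refuse the request when it returns False. Rejected requests are not counted, so an offender gets through again as soon as old hits expire. Entries for idle IPs are never purged in this sketch, so a long-running server would need some cleanup on top of it.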

Btw, in my experience a lot of leechers use sophisticated Perl/Python solutions. Sometimes I feel like telling them about wget and its mirroring options. A leecher's life could be so simple :)


Thread source: http://www.webmasterworld.com/robots_txt/74.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com