Forum Moderators: DixonJones
Did your logs turn anything up? There are plenty of site mirroring programs out there, and they may or may not respect your robots.txt file. First thing I'd do is block your images directory from all robots, or just selectively if you find one particular problem program in your logs.
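Before relying on a robots.txt rule, it's worth checking that it says what you think it says. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `/images/` path and the `BadMirrorBot` name are placeholders for illustration, not anything taken from this thread, and of course this only tells you what a well-behaved robot would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named program entirely and keep
# every other robot out of the images directory.
ROBOTS_TXT = """\
User-agent: BadMirrorBot
Disallow: /

User-agent: *
Disallow: /images/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robot honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("SomeCrawler", "/images/photo.jpg"),
        ("SomeCrawler", "/index.html"),
        ("BadMirrorBot", "/index.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```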
If the software doesn't obey robots.txt, you might be able to block by user agent [webmasterworld.com] using .htaccess.
If that still doesn't help, you'll probably have to look into banning the IP addresses of persistent offenders.
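To find the persistent offenders in the first place, you could total the bytes served to each client from the access log. This is only a rough sketch: it assumes the log is in Apache's Common or Combined Log Format, and the function names and threshold are made up for illustration.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
#   client ident user [timestamp] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-)')


def bytes_by_client(lines):
    """Sum the response sizes per client address; skip unparseable lines."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        client, size = match.groups()
        if size != "-":  # "-" means no body was sent (e.g. a 304)
            totals[client] += int(size)
    return totals


def heavy_clients(lines, threshold_bytes):
    """Return (client, total_bytes) pairs at or above the threshold, largest first."""
    return [
        (client, total)
        for client, total in bytes_by_client(lines).most_common()
        if total >= threshold_bytes
    ]
```

Usage would be something like `heavy_clients(open("access.log"), 500_000_000)`, where the file name and the cutoff are whatever suits your own server.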
1) Most spidering/copying programs let you set the user agent. It's easy to disguise yourself as Googlebot, for example (although the IP address would be wrong, of course), so filtering on user agent alone wouldn't work.
2) Use of an anonymous proxy would hide the IP address of a transgressor, so you couldn't filter on IP address.
3) Even if you detected an IP address that was downloading a lot of data, the culprit could still cycle through a list of anonymous proxies.
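Point 1 does hint at one check you can make: a fake Googlebot gives itself away by its IP address. A rough sketch of a forward-confirmed reverse DNS test follows. The hostname suffixes come from Google's published guidance rather than from this thread, the function name is made up, and the default lookups only handle IPv4. It does nothing about points 2 and 3.

```python
import socket

# Assumed suffixes for genuine Googlebot hostnames (per Google's guidance).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a client that claims to be Googlebot.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end in a Google suffix.
    3. Forward-resolve that hostname and require it to map back to the IP.

    The lookup functions are injectable so the logic can be tested offline.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward_lookup(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The forward lookup matters because whoever controls the reverse DNS for an address can make it return any hostname they like.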
The only way I can think of to prevent this is to force your users to register before downloading, and to make sure the login page can only be completed by a human, using some sort of dynamically created PIN (AltaVista uses this for URL submission).
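The server side of such a PIN check might look roughly like the sketch below. It only covers generating the PIN and verifying the answer; rendering the PIN as an image that is hard for software to read is the part that actually keeps robots out, and it is not shown. All names here are hypothetical, and a real version would also need an expiry time and protection against replaying an old token.

```python
import hashlib
import hmac
import secrets

# Per-process secret, used so the server need not store each PIN.
SECRET_KEY = secrets.token_bytes(32)


def _sign(pin, nonce):
    """Return an HMAC tying a PIN to a one-off nonce."""
    message = f"{nonce}:{pin}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def new_challenge(digits=6):
    """Create a random PIN plus the nonce and token to embed in the form.

    The PIN itself goes into the image shown to the user, never into the HTML.
    """
    pin = "".join(secrets.choice("0123456789") for _ in range(digits))
    nonce = secrets.token_hex(8)
    return pin, nonce, _sign(pin, nonce)


def check_answer(answer, nonce, token):
    """Return True if the typed answer matches the PIN behind the token."""
    return hmac.compare_digest(_sign(answer.strip(), nonce), token)
```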
Just an idea I had once!
PS: You could also hide a link somewhere, exclude the target URL using robots.txt, then ban any IP that tries to retrieve it, or maybe redirect it to some unsavoury website!
[edited by: incywincy at 3:24 pm (utc) on Nov. 5, 2002]
The most useful recent addition to my site's armament is a small script which automatically adds a "ban" to my .htaccess file. It does this whenever a bot attempts to fetch a page which is Disallowed in robots.txt.
A few "trap" links are scattered within the high-traffic pages of the site. These invisible links lead to pages which are disallowed by robots.txt. If a 'bot requests one of these pages, the script is invoked. The 'bot's IP address is then added to .htaccess, and further requests receive a 403-Forbidden response.
This script [webmasterworld.com] was originally posted here on this site by Key_Master, and another member and I have tweaked it a little, adding file-locking to avoid problems if the script is invoked from two or more requests at the same time. It works great, and I recommend it to anyone who is tired of manually adding unwelcome visitors' IP addresses to their ban list.
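For anyone who wants the general idea without following the link, here is a rough Python sketch of the ban step. It is not the script linked above, just an illustration under some assumptions: a Unix host (`fcntl` is Unix-only), the `Deny from` syntax of older Apache versions, and a .htaccess that already sets something like `Order Allow,Deny` and `Allow from all`. The function name is made up.

```python
import fcntl
import ipaddress


def ban_ip(htaccess_path, ip):
    """Append a 'Deny from <ip>' line to .htaccess under an exclusive lock.

    Returns True if a line was added, False if the IP was already banned.
    Raises ValueError if ip is not a valid address.
    """
    ip = str(ipaddress.ip_address(ip))  # refuse anything that is not an IP
    rule = f"Deny from {ip}"
    with open(htaccess_path, "a+") as f:
        # Block until no other request is editing the file.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            content = f.read()
            if rule in content.splitlines():
                return False
            if content and not content.endswith("\n"):
                f.write("\n")  # keep the new rule on its own line
            f.write(rule + "\n")
            f.flush()  # make the write visible before releasing the lock
            return True
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The trap page's handler would call something like `ban_ip("/path/to/.htaccess", remote_addr)` with the address of the requesting client. Without the lock, two trap hits arriving together could interleave their writes and leave a broken line in the file.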
Jim
We're moving from Verio to myacen, so all activities are frozen until we are up again... so no WebTrends until then...
BTW, thank you all for your posts. Don't know yet what's going on, though...
Right now (today's the 5th), we already have 5.9 GB of data transfer...
Verio is too expensive...
And... we're DOWN.
(Will keep you updated.)
Thanks again! and...
Thanks in advance!