|Should I ban this user agent?|
| 2:04 am on Mar 16, 2004 (gmt 0)|
A few days ago I noticed 16 MB being downloaded in about 20 minutes. The user agent was RPT-HTTPClient/0.3-3.
There wasn't much information about this agent, but I did find a page [URL removed] that
described the agent's behaviour as 'naughty'. Does anyone know what this means? Is the spider more of a web downloader or web grabber that should be banned anyway?
If I ban the agent in robots.txt, there is no guarantee that the agent will follow the rules, is there? That is, I cannot force exclusion that way, but maybe in .htaccess?
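To illustrate the point about robots.txt being advisory, here is a minimal Python sketch of how a well-behaved client would interpret a rule that excludes this agent. The robots.txt content and the `allowed_by_robots` helper are assumptions made up for the example; nothing here forces a crawler to run such a check, which is why a server-side block is needed for agents that ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not the poster's actual file).
ROBOTS_TXT = """\
User-agent: RPT-HTTPClient
Disallow: /
"""


def allowed_by_robots(user_agent, path="/index.html"):
    """Return True if a compliant client with this user agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A compliant client would see that it is excluded; a rude one never asks.
    print(allowed_by_robots("RPT-HTTPClient/0.3-3"))
    print(allowed_by_robots("Mozilla/5.0 (compatible)"))
```

The check only has an effect if the client chooses to perform it, so it documents intent rather than enforcing it.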
Also, here is another "strange" agent, as it appears in the web logs:
220.127.116.11 - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
18.104.22.168 - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
22.214.171.124 - - [08/Mar/2004:22:07:15 -0500] "GET /_cmdlogin?login=guest&version=enterprise HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
126.96.36.199 - - [08/Mar/2004:22:07:26 -0500] "GET /se/ HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
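The log entries above appear to be in Apache's combined log format. As a rough sketch of how to spot a heavy downloader, the Python below totals the response bytes per user agent. The regular expression and the `bytes_per_agent` helper are assumptions written for this example, and they only handle lines shaped like the ones quoted here.

```python
import re
from collections import Counter

# Assumed layout: ip, identd, user, [time], "request", status, size, "referer", "agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_per_agent(lines):
    """Sum the response sizes per user agent; a size of "-" counts as 0."""
    totals = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        size = match.group("size")
        totals[match.group("agent")] += int(size) if size.isdigit() else 0
    return totals


if __name__ == "__main__":
    sample = [
        '220.127.116.11 - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" '
        '200 54 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"',
        '18.104.22.168 - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" '
        '404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"',
    ]
    for agent, total in bytes_per_agent(sample).most_common():
        print(total, agent)
```

Sorting by total bytes puts agents like the 16 MB downloader at the top of the list.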
I did some searching on this site, and it appears the above IP/site was indicated as something that should be banned. I can use .htaccess to ban IP addresses, but it would make more sense to ban the agents I do not approve of, wouldn't it?
I could also put an array of banned agents in a PHP file that is executed from every page, but that may place a bit more load on the server and possibly affect response time. I don't know.
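The "array of banned agents" idea can be sketched as follows. The post proposes PHP; Python is used here only to illustrate the logic, and the names `BANNED_AGENT_SUBSTRINGS`, `is_banned`, and `status_for` are hypothetical. The two entries come from the agents named in this post; a real list would be longer.

```python
# Lowercase substrings to look for in the User-Agent header.
# Both entries are taken from the agents mentioned in this thread.
BANNED_AGENT_SUBSTRINGS = (
    "rpt-httpclient",
    "almaden.ibm.com/cs/crawler",
)


def is_banned(user_agent):
    """Return True if the User-Agent header contains any banned substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BANNED_AGENT_SUBSTRINGS)


def status_for(user_agent):
    """Return the HTTP status a page would send: 403 if banned, otherwise 200."""
    return 403 if is_banned(user_agent) else 200


if __name__ == "__main__":
    print(status_for("RPT-HTTPClient/0.3-3"))
    print(status_for("Mozilla/4.0 (compatible; MSIE 6.0)"))
```

Each request costs only a handful of substring comparisons, so the extra load from a short list should be small, although the script still has to start before it can refuse the request.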
Is there a "definitive" list of user agents that should be banned, please?
[edited by: DaveAtIFG at 4:12 am (utc) on Mar. 16, 2004]
[edit reason] Removed URL [/edit]
| 4:15 am on Mar 16, 2004 (gmt 0)|
The Close to perfect .htaccess ban list [webmasterworld.com] is a pretty comprehensive list of "bad bots."
| 5:27 am on Mar 16, 2004 (gmt 0)|
Thanks for the link to that (longish) thread, loads of information there alright. Btw, aren't we allowed to post URLs that are not 'webmasterworld' ones?
| 5:13 pm on Mar 16, 2004 (gmt 0)|
I wish you hadn't asked that! :) We're going way off topic and the answer isn't simple but...
|Btw, aren't we allowed to post URLs that are not 'webmasterworld' ones? |
First, review the TOS [webmasterworld.com], items 13, 20, and 25. Here are a few threads that discuss the issue.
Unfortunately there are no "hard and fast rules" that apply every time... except "Don't post URLs" and that's unrealistic. It basically comes down to a judgment call for each case.