Just a few days ago, I noticed 16 MB being downloaded in about 20 minutes. The user agent was RPT-HTTPClient/0.3-3.
There wasn't much information about this agent, but I did find something at:
snipped
which described the agent's behaviour as 'naughty'. Does anyone know what this means? Is the spider more of a web downloader or web grabber, and should it be banned anyway?
If I ban the agent in robots.txt, there is no guarantee that the agent will follow the rules, is there? That is, I cannot force exclusion that way, but maybe in .htaccess?
Also, another "strange" agent appears in the web logs as follows:
66.147.154.3 - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:15 -0500] "GET /_cmdlogin?login=guest&version=enterprise HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:26 -0500] "GET /se/ HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
I did do some searching on this site, and it appears the above IP/site was indicated as something that should be banned. I can use .htaccess to ban IP addresses, but it would make more sense to ban the agents I do not approve of, wouldn't it?
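For illustration, here is a rough sketch of what banning by user agent (and, as a fallback, by IP) could look like in .htaccess. It assumes Apache with mod_setenvif enabled and an AllowOverride setting that permits these directives (FileInfo and Limit); the variable name bad_bot is just a made-up label, and the patterns are taken from the two agents mentioned in this post. Treat it as a starting point rather than a tested rule set.

```apache
# Flag requests whose User-Agent matches one of the unwanted agents.
# "bad_bot" is an arbitrary environment variable name chosen for this sketch.
SetEnvIfNoCase User-Agent "RPT-HTTPClient" bad_bot
SetEnvIfNoCase User-Agent "almaden\.ibm\.com/cs/crawler" bad_bot

# Apache 1.3 / 2.0 / 2.2 style access control:
# allow everyone, then deny flagged agents and one specific IP.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Deny from 66.147.154.3

# On Apache 2.4 the equivalent would use Require instead, e.g.:
# <RequireAll>
#     Require all granted
#     Require not env bad_bot
#     Require not ip 66.147.154.3
# </RequireAll>
```

Matching on the user agent catches the crawler even if it moves to another IP, but an agent can send any User-Agent string it likes, so neither rule is a guarantee. Blocked requests receive a 403 response.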
I could also put an array of banned agents in a PHP file that is executed from every page, but that may place a bit more load on the server and possibly affect response time. I don't know.
Is there a "definitive" list of user agents that should be banned, please?
Thanks,
Peter
[edited by: DaveAtIFG at 4:12 am (utc) on Mar. 16, 2004]
[edit reason] Removed URL [/edit]
Btw, aren't we allowed to post URLs that are not 'webmasterworld' ones?
I wish you hadn't asked that! :) We're going way off topic and the answer isn't simple but...
First, review the TOS [webmasterworld.com], items 13, 20, and 25. Here are a few threads that discuss the issue.
[webmasterworld.com...]
[webmasterworld.com...]
Unfortunately there are no "hard and fast rules" that apply every time... except "Don't post URLs" and that's unrealistic. It basically comes down to a judgment call for each case.