Forum Moderators: goodroi

Should I ban this user agent?

   
2:04 am on Mar 16, 2004 (gmt 0)

10+ Year Member



Hi,

Just a few days ago, I noticed 16 MB being downloaded in about 20 minutes. The user agent was RPT-HTTPClient/0.3-3.

There wasn't much information about this agent, but I did find something at:

snipped

which described the behaviour of the agent as 'naughty'. Does anyone know what this means? Is the spider more of a web downloader or web grabber, and should it be banned anyway?

If I ban the agent in robots.txt, there is no guarantee that the agent will follow the rules, is there? That is, I cannot force exclusion that way, but maybe in .htaccess?
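
To illustrate the point about robots.txt, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are assumptions made up for this example, not rules taken from any real site. The parser only reports what a well-behaved crawler would conclude from the file; nothing in it stops an agent that never reads robots.txt or chooses to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one agent to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: RPT-HTTPClient
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, path: str = "/") -> bool:
    """Return what a compliant crawler would decide for this agent and path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

A compliant client sending `RPT-HTTPClient/0.3-3` would get `False` here and stay away; a non-compliant one simply never asks, which is why server-side blocking is the only way to enforce the exclusion.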

Also, there is another "strange" agent. The web logs show the following:

66.147.154.3 - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:15 -0500] "GET /_cmdlogin?login=guest&version=enterprise HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:26 -0500] "GET /se/ HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"

I did some searching on this site, and it appears the above IP/site was flagged as something that should be banned. I can use .htaccess to ban IP addresses, but it would make more sense to ban the agents I do not approve of, wouldn't it?
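
When weighing a ban by IP against a ban by agent, it can help to total the log by user agent first. The following is a rough Python sketch, assuming the log uses the combined format shown in the lines above; the pattern `LOG_LINE` and the function `summarize_by_agent` are names invented for this example.

```python
import re
from collections import defaultdict

# Assumed layout: ip, ident, user, [time], "request", status, size,
# "referer", "agent" (the combined log format seen in the lines above).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_by_agent(lines):
    """Group hits, bytes sent, and source IPs by user-agent string."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0, "ips": set()})
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        entry = totals[match["agent"]]
        entry["hits"] += 1
        size = match["size"]
        # A size of "-" means no body was sent; count it as zero bytes.
        entry["bytes"] += int(size) if size.isdigit() else 0
        entry["ips"].add(match["ip"])
    return dict(totals)
```

An agent that shows up from many IPs is a better candidate for an agent-based rule, while a single IP that sends several different agent strings is easier to block by address.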

I could also put an array of banned agents in a PHP file that is executed from every page, but that may place a bit more load on the server and possibly affect response time. I don't know.
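
The per-page check described above amounts to a short substring scan. Here is a minimal sketch of that idea, written in Python rather than PHP purely for illustration; the list contents and the names `BANNED_AGENT_SUBSTRINGS`, `is_banned`, and `status_for` are assumptions, not an established ban list.

```python
# Hypothetical entries, stored lowercase so matching is case-insensitive.
BANNED_AGENT_SUBSTRINGS = (
    "rpt-httpclient",
)


def is_banned(user_agent: str) -> bool:
    """Return True if the agent string contains any banned substring."""
    lowered = user_agent.lower()
    return any(token in lowered for token in BANNED_AGENT_SUBSTRINGS)


def status_for(user_agent: str) -> int:
    """Pick an HTTP status: 403 for banned agents, 200 otherwise."""
    return 403 if is_banned(user_agent) else 200
```

The work per request grows with the length of the list, so a short list is unlikely to be noticeable next to generating the page itself. The request still reaches the script before it is refused, though, which is one argument for doing the check in the web server configuration instead.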

Is there a "definitive" list of user agents that should be banned, please?

Thanks,

Peter

[edited by: DaveAtIFG at 4:12 am (utc) on Mar. 16, 2004]
[edit reason] Removed URL [/edit]

4:15 am on Mar 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The Close to perfect .htaccess ban list [webmasterworld.com] is a pretty comprehensive list of "bad bots."

5:27 am on Mar 16, 2004 (gmt 0)

10+ Year Member



Hi Dave,

Thanks for the link to that (longish) thread; loads of information there, all right. Btw, aren't we allowed to post URLs that are not 'webmasterworld' ones?

Peter

5:13 pm on Mar 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Btw, aren't we allowed to post URL's that are not 'webmasterworld' ones?"
I wish you hadn't asked that! :) We're going way off topic and the answer isn't simple but...

First, review the TOS [webmasterworld.com], items 13, 20, and 25. Here are a few threads that discuss the issue.
[webmasterworld.com...]
[webmasterworld.com...]

Unfortunately there are no "hard and fast rules" that apply every time... except "Don't post URLs" and that's unrealistic. It basically comes down to a judgment call for each case.

 
