
Forum Moderators: goodroi


Should I ban this user agent?

     

jehoshua

2:04 am on Mar 16, 2004 (gmt 0)

10+ Year Member



Hi,

A few days ago I noticed 16 MB being downloaded in about 20 minutes. The user agent was RPT-HTTPClient/0.3-3.

There wasn't much information about this agent, but I did find something at:

snipped

which described the agent's behaviour as 'naughty'. Does anyone know what that means? Is the spider more of a web downloader or web grabber, and should it be banned anyway?

If I ban the agent in robots.txt, there is no guarantee that it will follow the rules, is there? That is, I cannot force exclusion that way, but maybe I can in .htaccess?

Also, there is another "strange" agent. The web logs show the following:

66.147.154.3 - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:15 -0500] "GET /_cmdlogin?login=guest&version=enterprise HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
66.147.154.3 - - [08/Mar/2004:22:07:26 -0500] "GET /se/ HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"

I did some searching on this site, and it appears the above IP/site has been flagged as something that should be banned. I can use .htaccess to ban IP addresses, but it would make more sense to ban the agents I do not approve of, wouldn't it?
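For illustration, a user-agent ban in .htaccess might look something like the sketch below. It assumes the server is Apache with mod_rewrite enabled and that .htaccess overrides are permitted. The two patterns are taken from the agent strings quoted above and are not a vetted ban list.

```apache
# Sketch only: assumes Apache with mod_rewrite available and
# AllowOverride set so that .htaccess may use rewrite directives.
RewriteEngine On

# Match the two agents seen in the logs, case-insensitively.
# [OR] chains the conditions so that either one triggers the rule.
RewriteCond %{HTTP_USER_AGENT} ^RPT-HTTPClient [NC,OR]
RewriteCond %{HTTP_USER_AGENT} almaden\.ibm\.com/cs/crawler [NC]

# Answer every matching request with 403 Forbidden.
RewriteRule .* - [F]
```

Unlike a robots.txt entry, this is enforced by the server and does not depend on the agent's cooperation. A client that sends a different User-Agent string would still get through, though, so an agent ban and an IP ban cover different cases.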

I could also put an array of banned agents in a PHP file that is executed from every page, but that may place a bit more load on the server and possibly affect response time. I don't know.

Is there a "definitive" list of user agents that should be banned, please?

Thanks,

Peter

[edited by: DaveAtIFG at 4:12 am (utc) on Mar. 16, 2004]
[edit reason] Removed URL [/edit]

DaveAtIFG

4:15 am on Mar 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The Close to perfect .htaccess ban list [webmasterworld.com] is a pretty comprehensive list of "bad bots."

jehoshua

5:27 am on Mar 16, 2004 (gmt 0)

10+ Year Member



Hi Dave,

Thanks for the link to that (longish) thread, loads of information there all right. Btw, aren't we allowed to post URLs that are not 'webmasterworld' ones?

Peter

DaveAtIFG

5:13 pm on Mar 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Btw, aren't we allowed to post URLs that are not 'webmasterworld' ones?"

I wish you hadn't asked that! :) We're going way off topic and the answer isn't simple but...

First, review the TOS [webmasterworld.com], items 13, 20, and 25. Here are a few threads that discuss the issue.
[webmasterworld.com...]
[webmasterworld.com...]

Unfortunately there are no "hard and fast rules" that apply every time... except "Don't post URLs" and that's unrealistic. It basically comes down to a judgment call for each case.

 
