
Sitemaps, Meta Data, and robots.txt Forum

Should I ban this user agent?

 2:04 am on Mar 16, 2004 (gmt 0)


A few days ago I noticed 16 MB being downloaded in about 20 minutes by the user agent RPT-HTTPClient/0.3-3.

There wasn't much information about this agent, but I did find a page [URL removed] which described the agent's behaviour as 'naughty'. Does anyone know what that means? Is the spider more of a web downloader or web grabber, and should it be banned anyway?

If I ban the agent in robots.txt, there is no guarantee that it will obey, is there? In other words, robots.txt cannot enforce exclusion, but perhaps .htaccess can?
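Right: robots.txt is a request, not an enforcement mechanism. The check happens entirely on the crawler's side, as this sketch using Python's standard `urllib.robotparser` shows. The robots.txt content and the example.com URLs are made up for illustration. A compliant crawler runs something like `polite_can_fetch` before each request; a rude one simply skips it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: ask RPT-HTTPClient to stay out entirely,
# allow every other agent everywhere (an empty Disallow means "allow all").
ROBOTS_TXT = """\
User-agent: RPT-HTTPClient
Disallow: /

User-agent: *
Disallow:
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """The check a *well-behaved* crawler performs before fetching a URL.

    Nothing on the server forces a client to call this, which is why
    robots.txt cannot keep out a bot that chooses to ignore it.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(polite_can_fetch("RPT-HTTPClient/0.3-3", "http://example.com/index.html"))
    print(polite_can_fetch("Mozilla/5.0", "http://example.com/index.html"))
```

Actually refusing the request has to happen on the server (for example in .htaccess or in application code), because only the server controls the response.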

Also, here is another "strange" agent, as it appears in my web logs:

    - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
    - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
    - - [08/Mar/2004:22:07:15 -0500] "GET /_cmdlogin?login=guest&version=enterprise HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
    - - [08/Mar/2004:22:07:26 -0500] "GET /se/ HTTP/1.0" 404 - "-" "http://www.almaden.ibm.com/cs/crawler [c01]"
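Lines like these are in Apache's "combined" log format, so a short script can total bytes and 404s per user agent. That is one way to spot both the 16 MB download and the kind of probing for non-existent paths shown above. This is a rough sketch, not a full log parser: the regex assumes well-formed combined-format lines and skips anything else. The client IP was removed from the posted lines, so the sample below uses the documentation address 192.0.2.10 as a placeholder.

```python
import re
from collections import defaultdict

# Apache "combined" format:
# host ident authuser [date] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Return {user_agent: {"requests", "bytes", "not_found", "paths"}}."""
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0, "not_found": 0, "paths": []})
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not a combined-format line; ignore it
        s = stats[m["agent"]]
        s["requests"] += 1
        # Apache writes "-" instead of a byte count when no body was sent.
        s["bytes"] += 0 if m["size"] == "-" else int(m["size"])
        if m["status"] == "404":
            s["not_found"] += 1
            s["paths"].append(m["path"])  # what the agent was looking for
    return dict(stats)


# Placeholder host 192.0.2.10; the real IP was removed from the post.
UA = "http://www.almaden.ibm.com/cs/crawler [c01]"
SAMPLE = [
    f'192.0.2.10 - - [08/Mar/2004:22:06:57 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "{UA}"',
    f'192.0.2.10 - - [08/Mar/2004:22:07:04 -0500] "GET /index.html HTTP/1.0" 404 - "-" "{UA}"',
    f'192.0.2.10 - - [08/Mar/2004:22:07:15 -0500] "GET /_cmdlogin?login=guest&version=enterprise HTTP/1.0" 404 - "-" "{UA}"',
    f'192.0.2.10 - - [08/Mar/2004:22:07:26 -0500] "GET /se/ HTTP/1.0" 404 - "-" "{UA}"',
]

if __name__ == "__main__":
    for agent, s in summarize(SAMPLE).items():
        print(agent, s)
```

For this sample the crawler made 4 requests, received 54 bytes in total, and 3 of its requests were 404s.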

I searched this site, and it appears the above IP/site has been flagged as something that should be banned. I can use .htaccess to ban IP addresses, but wouldn't it make more sense to ban the agents I don't approve of?

I could also put an array of banned agents in a PHP file that is included on every page, but I don't know whether that would add noticeable load on the server or slow down response times.
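The "one list checked on every request" idea can be sketched as Python WSGI middleware. The post doesn't show its PHP, so this swaps in a different stack, and the ban list is hypothetical. The list is compiled once, at startup, into a single case-insensitive regex, so each request costs one search instead of a loop over the list; whether even that is noticeable on a given server is something to measure rather than assume. One caveat applies to any agent-based ban: the User-Agent header is supplied by the client, so a bot that lies about its name gets through, which is where IP bans still earn their keep.

```python
import re

# Hypothetical ban list: substrings of User-Agent headers to refuse.
BANNED_AGENTS = ["RPT-HTTPClient", "almaden.ibm.com/cs/crawler"]

# Compiled once at import time into one case-insensitive alternation.
BANNED_RE = re.compile("|".join(re.escape(a) for a in BANNED_AGENTS), re.IGNORECASE)


def is_banned(user_agent: str) -> bool:
    """True if the User-Agent contains any banned substring."""
    return bool(BANNED_RE.search(user_agent or ""))


class BanUserAgents:
    """WSGI middleware that answers 403 Forbidden to banned user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_banned(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        # Everyone else reaches the real application untouched.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI app is one line, e.g. `application = BanUserAgents(application)`.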

Is there a "definitive" list of user agents that should be banned?



[edited by: DaveAtIFG at 4:12 am (utc) on Mar. 16, 2004]
[edit reason] Removed URL [/edit]



 4:15 am on Mar 16, 2004 (gmt 0)

The thread "Close to perfect .htaccess ban list" [webmasterworld.com] has a fairly comprehensive list of "bad bots."


 5:27 am on Mar 16, 2004 (gmt 0)

Hi Dave,

Thanks for the link to that (longish) thread; there's loads of information there. Btw, aren't we allowed to post URLs other than WebmasterWorld ones?



 5:13 pm on Mar 16, 2004 (gmt 0)

Quote: "Btw, aren't we allowed to post URL's that are not 'webmasterworld' ones?"
I wish you hadn't asked that! :) We're going way off topic and the answer isn't simple but...

First, review the TOS [webmasterworld.com], items 13, 20, and 25. Here are a few threads that discuss the issue.

Unfortunately there are no "hard and fast rules" that apply every time... except "Don't post URLs" and that's unrealistic. It basically comes down to a judgment call for each case.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved