Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Webmasterworld's Robots.txt
A couple of questions for Brett

 2:13 pm on Dec 27, 2002 (gmt 0)

Brett, if you've got the time:

1. Why ban Googlebot-Image? Is it just to save time because this is a text-based site?

2. Why ban User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT) and other such browser identification strings? They aren't spidering, are they?

3. I can see why you disallow Forum 9 (Foo). But why Forums 19 (Community Center) and 29 (Commercial exchange)?

4. Are all the bots listed well-behaved or do you also have to ban some of them via rewrites?
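For readers who want to see how rules like the ones asked about above are interpreted, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule text and the `/forum9/`-style paths are assumptions made up for illustration; they are not copied from the actual WebmasterWorld file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modelled on the questions above.
# The paths are illustrative only, not the site's real robots.txt.
RULES = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /forum9/
Disallow: /forum19/
Disallow: /forum29/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# The image bot has its own record that disallows everything.
image_bot_allowed = parser.can_fetch("Googlebot-Image", "/forum5/123.htm")

# Any other agent falls through to the "*" record.
generic_forum9 = parser.can_fetch("SomeBot", "/forum9/1.htm")
generic_forum5 = parser.can_fetch("SomeBot", "/forum5/1.htm")
```

Note that this only shows what a compliant bot would conclude from the file. A bot that ignores robots.txt is not stopped by any of it.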




 3:10 pm on Dec 27, 2002 (gmt 0)

1- what possible use could an image bot be to us, other than using up bandwidth?

2- some bots do support robots.txt and do use standard agents. It catches a few.

3- off topic referrals and local related discussions.

4- I spend an hour or two a day tracking down bad bots and banning them, their IP, or their entire ISP. It isn't bad when you have a few dozen or a few hundred pages; but when you get into 70-80k pages, bots can tear a system up in no time flat without sufficient tracking and response systems. It is a very serious problem that threatens the entire system on a daily basis. We are having a discussion about moving to a subscription based system, and stopping rogue bots would be my #1 reason for doing so.
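As a rough illustration of the kind of "tracking and response" mentioned above, here is a small sketch of a per-IP sliding-window counter. It is not the system used on this site; the class name `RequestTracker`, the thresholds, and the sample IP address are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class RequestTracker:
    """Flag clients that make more than max_hits requests within window seconds.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """

    def __init__(self, max_hits: int = 30, window: float = 10.0) -> None:
        self.max_hits = max_hits
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip: str, now: float) -> bool:
        """Record one request at time `now`; return True if the IP is over the limit."""
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.max_hits


# Usage: four requests in three seconds against a limit of three.
tracker = RequestTracker(max_hits=3, window=10.0)
flags = [tracker.record("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
later = tracker.record("203.0.113.7", 20.0)
```

In practice the timestamps would come from the server log or the request handler, and a flagged IP would be passed to whatever ban mechanism is in place.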


 4:25 pm on Dec 27, 2002 (gmt 0)

Thanks for the info, Brett. It must be endlessly frustrating!

Community Center has quite a few on-topic threads. Maybe they've started in the wrong place and a moderator will move them to a more relevant forum.

Is subscription-only the best way to stop bots? (I'll be brief here as it is obviously being discussed at length elsewhere). A simple mechanism to trip them up at the doorway might be a sign-on screen with human-readable instructions on how to sign on as a guest -- such as typing in a displayed random string.
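To make the random-string idea concrete, here is a minimal sketch of generating a challenge and checking what the visitor typed. The function names are made up for this example, and it assumes the challenge is kept server-side (for instance in the visitor's session) between showing the page and receiving the answer.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length: int = 6) -> str:
    """Return a random string to display on the guest sign-on screen."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(challenge: str, typed: str) -> bool:
    """Compare the typed answer with the challenge, ignoring case and stray spaces."""
    normalized = typed.strip().upper()
    return hmac.compare_digest(challenge.encode(), normalized.encode())


# Usage: issue a challenge, then validate the visitor's input.
challenge = new_challenge()
accepted = check_response(challenge, challenge.lower() + " ")
rejected = check_response(challenge, challenge + "X")
```

A string shown as plain page text could still be read by a bot written for this particular screen, so this would only trip up generic crawlers.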

I was looking for a complete bot ban list (not that I have the problems you do -- I'm not that successful, and what you have should make you happy -- it's a problem of success) so I'm pleased to see that the comments in your robots.txt let other people use it freely.


 5:00 pm on Dec 27, 2002 (gmt 0)

Well, the local cat here does sometimes get personal stuff posted in it and I noticed some of the referrals were way off our topic area here.

Sure, you can use the robots.txt anywhere.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved