
Sitemaps, Meta Data, and robots.txt Forum

    
Which robots to exclude
biggles




msg:1526730
 12:21 am on Nov 4, 2002 (gmt 0)

Have been playing with the WebMaster World spider.txt checker and out of interest ran the webmasterworld.com/spider.txt through it. I was surprised at the number of excluded agents. Many seem to be email harvesters and site downloaders, which clearly makes sense.
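To see which agents a given file shuts out completely, a short script can do roughly what a checker does. This is only a minimal sketch: the function name and the sample agent names (`ExampleHarvester` and so on) are made up for illustration and are not taken from any real exclusion list, and it only looks for a blanket `Disallow: /`.

```python
def fully_excluded_agents(robots_txt):
    """Return the user agents whose record contains a blanket 'Disallow: /'."""
    excluded = []
    current_agents = []   # agents named in the record currently being read
    in_rules = False      # True once the rule lines of that record have started
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a User-agent after rules starts a new record
                current_agents = []
                in_rules = False
            current_agents.append(value)
        else:
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in current_agents:
                    if agent not in excluded:
                        excluded.append(agent)
    return excluded


# Hypothetical sample file; the agent names are placeholders.
SAMPLE = """\
User-agent: ExampleHarvester
Disallow: /

User-agent: ExampleSiteCopier
User-agent: ExampleGrabber
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

print(fully_excluded_agents(SAMPLE))
```

Agents that are only kept out of some directories, like the `*` record in the sample, are not reported.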

Do people have a list of "nuisance" agents they suggest should be excluded by default for most sites?

Thanks

 

GaryK




msg:1526731
 12:31 am on Nov 4, 2002 (gmt 0)

Check the site in my profile. One of the files I offer for download is a regularly updated robots.txt. For robots that don't obey robots.txt you can check the [Website Strippers] section of my browscap.ini file for the user agents I consider a nuisance.

[edited by: GaryK at 12:36 am (utc) on Nov. 4, 2002]

Macguru




msg:1526732
 12:32 am on Nov 4, 2002 (gmt 0)

The scripting guys did a very nice job here. Almost perfect! ;)

[webmasterworld.com...]

biggles




msg:1526733
 1:33 am on Nov 4, 2002 (gmt 0)

GaryK - thanks for the feedback, but there is no website listed in your profile.

Macguru - wow, [webmasterworld.com] what a thread. I'm not a code jockey and not confident about playing with the .htaccess file - am I right in thinking you use that for bots that don't respect robots.txt?

Thanks

Macguru




msg:1526734
 1:40 am on Nov 4, 2002 (gmt 0)

Unfortunately, yes. Bad bots don't care about the robots.txt file.

My robots.txt files are quite basic. I just tell all good bots ("*") which places not to go. For the rest, the creepy crawlers, an extra effort is required.
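As a rough illustration of what such a basic file means to a well-behaved bot, here is a sketch using Python's standard `urllib.robotparser`. The two disallowed paths and the bot name are assumptions chosen for the example, not anyone's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical basic file: one record for every agent, listing places not to go.
BASIC_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def allowed(user_agent, path, robots_txt=BASIC_ROBOTS_TXT):
    """Ask the way a polite crawler would: may this agent fetch this path?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(allowed("SomePoliteBot", "/index.html"))
print(allowed("SomePoliteBot", "/private/report.html"))
```

The check runs on the crawler's side, which is exactly why it does nothing against a bot that never bothers to ask.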

I guess if you try .htaccess on a test site, the scripting folks here will gladly help to set it up.

The local web hosts I recommend to my clients offer it as added value. They run bulletproof servers and regularly update the list of banned bots and IPs.
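For anyone who wants to see the idea behind server-side blocking without touching .htaccess, here is the same logic as a small Python WSGI wrapper. To be clear, this is not .htaccess syntax and not what those hosts run; it is a sketch under the assumption that the site is served by a WSGI application, and the blocked names are invented placeholders.

```python
# Hypothetical list of lowercase User-Agent fragments to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("exampleharvester", "examplesitecopier")


def is_blocked(user_agent, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """True if the User-Agent header contains any blocked fragment (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(fragment in ua for fragment in blocked)


def block_bad_bots(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so blocked agents get 403 and everyone else passes through."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", ""), blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Unlike robots.txt, this is enforced by the server, so it does not depend on the bot cooperating. It still trusts the User-Agent header, which a determined bot can fake, and that is one reason banned IP lists are kept alongside agent lists.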

biggles




msg:1526735
 1:53 am on Nov 4, 2002 (gmt 0)

GaryK - would you please send me a Sticky email with the URL.

Thanks

GaryK




msg:1526736
 1:55 am on Nov 4, 2002 (gmt 0)

Done. :)

biggles




msg:1526737
 1:56 am on Nov 4, 2002 (gmt 0)

Macguru - thanks for the advice. Guess I'll have to get my head around .htaccess, like it or not. :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved