Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt
Is this valid?
Alternative Future




msg:1527613
 3:49 pm on Apr 2, 2003 (gmt 0)

Hi guyz,

Sorry if this question has been asked many times before, but I honestly couldn't find a definite answer. I have checked it against the SEO validator, but I would prefer to hear it from another human.

Does this part of robots.txt conform to the standards:

User-agent: 216.167.97.169
Disallow: /blah
Disallow: /blah
Disallow: /blah

Many thanks,

-gs

 

Dreamquick




msg:1527614
 3:54 pm on Apr 2, 2003 (gmt 0)

You can't use an IP address in robots.txt - you need to know the name the robot will look for when it parses robots.txt. Also bear in mind that this file is advisory rather than server-enforced, so there's nothing there that will stop a bad crawler from crawling what it wants, regardless of whether it reads that file or not.

Apart from that it looks okay but you might want to throw it through the Robots.txt Validator [searchengineworld.com] just to make sure.
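To illustrate the point about names versus IP addresses, here is a minimal sketch using Python's standard `urllib.robotparser`, which is just one parser implementation and not necessarily how any particular crawler behaves. The user-agent string `LinkChecker/1.0` and the path `/blah/page.html` are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The record from the original question: an IP address as the User-agent.
BY_IP = """\
User-agent: 216.167.97.169
Disallow: /blah
"""

# The same record keyed on the robot's name instead.
BY_NAME = """\
User-agent: LinkChecker
Disallow: /blah
"""

# Hypothetical user-agent string; the version number is an assumption.
crawler_ua = "LinkChecker/1.0"

# The parser compares the User-agent line with the name the robot
# announces, so the IP-based record never matches and nothing is blocked.
ip_rule_allows = build_parser(BY_IP).can_fetch(crawler_ua, "/blah/page.html")

# The name-based record matches, so the path is disallowed.
name_rule_allows = build_parser(BY_NAME).can_fetch(crawler_ua, "/blah/page.html")

print("IP record allows fetch:  ", ip_rule_allows)
print("Name record allows fetch:", name_rule_allows)
```

Even with the name-based record, this only tells a well-behaved robot what it should skip; nothing here enforces it.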

- Tony

Alternative Future




msg:1527615
 3:57 pm on Apr 2, 2003 (gmt 0)

Thanks for the quick response Dreamquick,

I did have the name of the robot in there (no harm in saying its name: LinkChecker), but it just doesn't obey my rules! Last month it hogged nearly 1 GB of bandwidth! Any ideas on how to allow it access only to my links dir?

[added]Also, the validator said all was fine, hence the reason I wanted it validated by another human.[/added]

Thx again,

-gs

Dreamquick




msg:1527616
 4:16 pm on Apr 2, 2003 (gmt 0)

If you are on Apache, .htaccess will let you block the crawler by either UA or IP/IP range; blocking stuff on IIS is less easy but still possible.

Sorry if that sounds a little vague, but I'm an IIS + ASP person so Apache isn't my forte! If you'd like to know how to block at an ASP level, that I do know :) Sticky me if you'd like to chat about that...

In either case a site search for blocking and your server type will produce a list of possible answers.
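As a rough illustration of what blocking by UA or by IP/IP range amounts to, whatever the server, here is a small Python sketch of the decision itself. It is not Apache or ASP configuration; the function name `is_blocked`, the blocklists, and the idea of calling it from application code are all assumptions for the example. The IP is the one from the original question.

```python
import ipaddress

# Hypothetical blocklists: a lowercase user-agent token and one network.
BLOCKED_UA_TOKENS = ("linkchecker",)
BLOCKED_NETWORKS = (ipaddress.ip_network("216.167.97.169/32"),)


def is_blocked(remote_addr: str, user_agent: str) -> bool:
    """Return True if the request should be refused.

    A request is refused when its User-Agent header contains a blocked
    token (case-insensitive) or its address falls inside a blocked network.
    """
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return True
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)


# Example checks; 192.0.2.10 is a documentation-only address.
print(is_blocked("192.0.2.10", "LinkChecker/1.0"))
print(is_blocked("216.167.97.169", "Mozilla/5.0"))
print(is_blocked("192.0.2.10", "Mozilla/5.0"))
```

Matching on the UA is easy to evade, since a crawler can send any header it likes, so the IP check is the more dependable of the two as long as the crawler keeps coming from the same address.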

<added>
LinkChecker seems to be a link checker program (duh @ me), and the source is available via SourceForge. It claims to be robots.txt compliant, so you might just have someone faking that UA, or perhaps an older version.
</added>

- Tony

Alternative Future




msg:1527617
 9:52 am on Apr 3, 2003 (gmt 0)

Thanks Dreamquick,

Your pointers helped me out no end! My robots.txt now looks like:

User-agent: linksmanager
Disallow: /blah
Disallow: /blah
Disallow: /blah

Once again thanks for your help :)

KR,

-gs

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved