
Robots.txt

Is this valid?

     

Alternative Future

3:49 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi guys,

Sorry if this question has been asked many times before, but I honestly couldn't find a definitive answer. I have checked it against the SEO validator, but I'd prefer to hear it from another human.

Does this part of robots.txt conform to the standards:

User-agent: 216.167.97.169
Disallow: /blah
Disallow: /blah
Disallow: /blah

Many thanks,

-gs

Dreamquick

3:54 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can't use an IP address in robots.txt - you need to know the name the robot looks for when it parses robots.txt. Also bear in mind that this file is advisory rather than server-enforced, so nothing in it will stop a bad crawler from crawling whatever it wants, regardless of whether it reads the file or not.
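For example, an entry keyed to a robot's name would look like this (a sketch only - "LinkChecker" and the path are placeholders; substitute the UA token your logs actually show):

```
# Rules are keyed to the robot's self-reported name, not its IP.
# "LinkChecker" is just an illustration.
User-agent: LinkChecker
Disallow: /private/

# A catch-all entry for every other compliant robot:
User-agent: *
Disallow:
```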

Apart from that it looks okay but you might want to throw it through the Robots.txt Validator [searchengineworld.com] just to make sure.

- Tony

Alternative Future

3:57 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the quick response Dreamquick,

I did have the name of the robot in there - no harm in saying its name (LinkChecker) - but it just doesn't obey my rules! Last month it hogged nearly 1 GB of bandwidth! Any ideas on how to allow it access only to my links dir?

[added]Also, the validator said all was fine, hence the reason I wanted to validate it with another human.[/added]

Thx again,

-gs

Dreamquick

4:16 pm on Apr 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you are on Apache, .htaccess will let you block the crawler by either UA or IP/IP range; blocking things on IIS is less easy, but still possible.

Sorry if that sounds a little vague, but I'm an IIS + ASP person, so Apache isn't my forte! If you'd like to know how to block at the ASP level, that I do know :) sticky me if you'd like to chat about that...

In either case a site search for blocking and your server type will produce a list of possible answers.
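To give a rough sketch of the .htaccess route (assumptions: Apache with mod_rewrite enabled; "LinkChecker" is a placeholder UA token - match whatever string actually appears in your logs):

```
# .htaccess sketch - denies any request whose User-Agent
# contains "LinkChecker" (case-insensitive) with a 403.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} LinkChecker [NC]
RewriteRule .* - [F]
```

Unlike robots.txt, this is enforced by the server, so it works even against crawlers that ignore robots.txt entirely.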

<added>
LinkChecker seems to be a link-checking program (duh @ me) - the source is available via SourceForge. It claims to be robots.txt compliant, so you might just have someone faking that UA, or perhaps it's an older version.
</added>
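As a sanity check on what a genuinely compliant crawler should do with your rules, Python's standard-library robots.txt parser evaluates them the same way (a sketch - "LinkChecker" and the paths are placeholders):

```python
# Sketch: confirm what a robots.txt-compliant crawler should
# conclude from a given set of rules.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: LinkChecker",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# A compliant LinkChecker must skip /private/ but may fetch /links/.
print(rp.can_fetch("LinkChecker", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("LinkChecker", "http://example.com/links/index.html"))   # True
```

If the parser says a URL is disallowed and the bot fetches it anyway, the bot is simply not honouring robots.txt, and server-side blocking is the only real fix.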

- Tony

Alternative Future

9:52 am on Apr 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Dreamquick,

Your pointers helped me out no end! My robots.txt now looks like:

User-agent: linksmanager
Disallow: /blah
Disallow: /blah
Disallow: /blah

Once again thanks for your help :)

KR,

-gs

 
