
Forum Moderators: goodroi


How often should a robot check the robots.txt file?

Your opinions please


jrobbio

1:12 am on May 18, 2003 (gmt 0)

10+ Year Member



With the likes of Freshbot and Grub making site updates faster and faster, there is obviously the possibility that a robot will overstep the mark and inadvertently crawl banned material.

With sites expanding and changing, it's safe to say that many a webmaster will be unhappy to find unwanted material appearing in the SERPs, and will go hunting for the removal form or ban the bot altogether.

So my question to you is:
How often (in hours, days, bursts or whatever measurement) do you feel it would be appropriate for the robots.txt to be checked?

WarmGlow

1:39 am on May 18, 2003 (gmt 0)

10+ Year Member



How often (in hours, days, bursts or whatever measurement) do you feel it would be appropriate for the robots.txt to be checked?

I would be very happy with a robots.txt refresh in 24 hours or less.

SinclairUser

1:56 am on May 18, 2003 (gmt 0)

10+ Year Member



Jeeze,

I wish I had your problems.

Can't get googlebot to visit no matter what. Even when it comes, it crawls some obscure stuff I don't want crawled.

Send in the bots! - NOW!

jdMorgan

1:56 am on May 18, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



jrobbio,

If you want to make the robots check more often, set the server Expires header for robots.txt to a shorter time.
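For anyone wondering how to set that header, here's a rough sketch for Apache (assuming mod_expires is enabled; the 24-hour value is just an illustration, tune it to taste):

```apache
# Hypothetical Apache snippet - requires mod_expires to be loaded.
# Tells clients (and well-behaved robots) that robots.txt is stale
# after 24 hours, encouraging a re-fetch roughly once a day.
<Files "robots.txt">
    ExpiresActive On
    ExpiresDefault "access plus 24 hours"
</Files>
```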

I had mine set too short last year, and wondered why the robots checked it before each and every file they requested!

Jim

SinclairUser

1:58 am on May 18, 2003 (gmt 0)

10+ Year Member



JD,

What happens if you have no robots.txt?

Chris.

jrobbio

2:03 am on May 18, 2003 (gmt 0)

10+ Year Member



Thanks, jd, I didn't know you could do that. However, I would appreciate your input on the question at hand.

jdMorgan

2:07 am on May 18, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



SinclairUser,

You get a lot of 404 errors from 'bots trying to find it, cluttering up your error logs and hiding real errors!

Other than that, the lack of a robots.txt file is interpreted by robots to mean, "request anything you like."

A good default robots.txt file which allows unlimited access but prevents all those 404s is:


User-agent: *
Disallow:


Follow the "Disallow:" line with one blank line - some obscure old robots require it.
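As a side note, you can sanity-check that the permissive file above really does allow everything using Python's standard-library robots.txt parser (a quick sketch, with illustrative names):

```python
# Sketch: feed Jim's permissive robots.txt to Python's stdlib parser
# and confirm an empty "Disallow:" permits every path.
from urllib.robotparser import RobotFileParser

rules = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow line means any user-agent may fetch any path:
print(parser.can_fetch("Googlebot", "/any/path.html"))  # True
```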

Jim

SinclairUser

2:25 am on May 18, 2003 (gmt 0)

10+ Year Member



JD,

RE: no robots.txt.

Googlebot can crawl into the darkest recesses of my site - just so long as it crawls everything!
How long does it take just to get one decent crawl!

Paid inclusion and PPC looks pretty good from here - in comparison to waiting forever to get indexed!

Chris.
