
Sitemaps, Meta Data, and robots.txt Forum

    
How often should a robot check the robots.txt file?
Your opinions please
jrobbio




msg:1525656
 1:12 am on May 18, 2003 (gmt 0)

With the likes of Freshbot and Grub updating sites faster and faster, there is an obvious possibility that a robot will overstep the mark and inadvertently crawl banned material.

With sites expanding and changing, many a webmaster will be unhappy to find unwanted material appearing in the SERPs, and will be scrambling to find the removal form or ban the bot altogether.

So my question to you is:
How often (in hours, days, bursts or whatever measurement) do you feel it would be appropriate for the robots.txt to be checked?

 

WarmGlow




msg:1525657
 1:39 am on May 18, 2003 (gmt 0)

"How often (in hours, days, bursts or whatever measurement) do you feel it would be appropriate for the robots.txt to be checked?"

I would be very happy with a robots.txt refresh in 24 hours or less.

SinclairUser




msg:1525658
 1:56 am on May 18, 2003 (gmt 0)

Jeez,

I wish I had your problems.

Can't get googlebot to visit no matter what. Even when it comes, it crawls some obscure stuff I don't want crawled.

Send in the bots! - NOW!

jdMorgan




msg:1525659
 1:56 am on May 18, 2003 (gmt 0)

jrobbio,

If you want to make the robots check more often, set the server Expires header for robots.txt to a shorter time.

I had mine set too short last year, and wondered why the robots checked it before each and every file they requested!
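To make the Expires idea concrete, here is a minimal sketch of how a well-behaved robot might decide when to re-fetch a cached robots.txt. It is an illustration only, not how any particular crawler works: the function name robots_recheck_time and the 24-hour fallback (borrowed from the figure suggested earlier in this thread) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed fallback when the server sends no usable Expires header.
# 24 hours is the figure floated in this thread, not a standard.
DEFAULT_TTL = timedelta(hours=24)


def robots_recheck_time(fetched_at, expires_header=None, default_ttl=DEFAULT_TTL):
    """Return the time at which a cached robots.txt should be fetched again.

    fetched_at must be a timezone-aware datetime.
    """
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            # Unparseable header: this sketch simply falls back to the default.
            expires = None
        if expires is not None:
            if expires.tzinfo is None:
                # Treat a zone-less date as UTC so the comparison below works.
                expires = expires.replace(tzinfo=timezone.utc)
            # An Expires time in the past means "stale immediately",
            # so the robot would re-check before its very next request.
            return max(expires, fetched_at)
    return fetched_at + default_ttl


fetched = datetime(2003, 5, 18, 1, 56, tzinfo=timezone.utc)
print(robots_recheck_time(fetched, "Sun, 18 May 2003 02:56:00 GMT"))
print(robots_recheck_time(fetched))
```

With an Expires time that is already in the past, the function returns the fetch time itself, which matches the behaviour described above: the robot asks for robots.txt again before every file.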

Jim

SinclairUser




msg:1525660
 1:58 am on May 18, 2003 (gmt 0)

JD,

What happens if you have no robots.txt?

Chris.

jrobbio




msg:1525661
 2:03 am on May 18, 2003 (gmt 0)

Thanks, jd. I didn't know you could do that. However, I would appreciate your input on the question at hand.

jdMorgan




msg:1525662
 2:07 am on May 18, 2003 (gmt 0)

SinclairUser,

You get a lot of 404 errors from 'bots trying to find it, cluttering up your error logs and hiding real errors!

Other than that, the lack of a robots.txt file is interpreted by robots to mean, "request anything you like."
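As a rough sketch of that interpretation, a robot could map the HTTP status of its robots.txt request to a rule set like this. It uses Python's standard urllib.robotparser; the helper name rules_from_response is hypothetical, and only the two cases discussed here (file found, file missing) are handled.

```python
from urllib.robotparser import RobotFileParser


def rules_from_response(status, body=""):
    """Build a rule set from the result of requesting /robots.txt."""
    rp = RobotFileParser()
    if status == 404:
        # No robots.txt: treated as "request anything you like".
        rp.parse(["User-agent: *", "Disallow:", ""])
    elif status == 200:
        rp.parse(body.splitlines())
    else:
        # Other statuses are outside the scope of this sketch.
        raise ValueError(f"unhandled status {status}")
    return rp


missing = rules_from_response(404)
found = rules_from_response(200, "User-agent: *\nDisallow: /private/\n")

url = "http://example.com/private/page.html"
print(missing.can_fetch("ExampleBot", url))  # True
print(found.can_fetch("ExampleBot", url))    # False
```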

A good default robots.txt file which allows unlimited access but prevents all those 404s is:

User-agent: *
Disallow:


Follow the "Disallow:" line with one blank line - some obscure old robots require it.
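If you want to check how a parser reads that file, a quick test with Python's standard urllib.robotparser looks like this. The bot name and URL are made up for the example, and the "Disallow: /" file is included only as a contrast.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt content held in a string."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


# The default file from this post: an empty Disallow allows everything.
allow_all = parse_rules("User-agent: *\nDisallow:\n\n")

# For contrast: a single slash blocks the whole site.
block_all = parse_rules("User-agent: *\nDisallow: /\n\n")

url = "http://example.com/any/page.html"
print(allow_all.can_fetch("ExampleBot", url))  # True
print(block_all.can_fetch("ExampleBot", url))  # False
```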

Jim

SinclairUser




msg:1525663
 2:25 am on May 18, 2003 (gmt 0)

JD,

RE: no robots.txt.

Googlebot can crawl into the darkest recesses of my site - just so long as it crawls everything!
How long does it take just to get one decent crawl?

Paid inclusion and PPC look pretty good from here - in comparison to waiting forever to get indexed!

Chris.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved