
Sitemaps, Meta Data, and robots.txt Forum

Robots.txt Question?
How often does Ink check for the file??

WebmasterWorld Senior Member 10+ Year Member

Msg#: 69 posted 12:05 am on Mar 10, 2002 (gmt 0)

I noticed a few weeks ago that Inktomi was trying to spider /_vti_cnf/ directories within one of our domains.

We have never used FrontPage extensions, but a previous web host put these extensions on their server and somehow added them to everyone's account. At that time, we had open sub-directories (no index page), and I assume that Inktomi picked up these FrontPage extensions by spidering those open directories and finding the FrontPage directories within our regular directories. When this previous host ran a backup, they backed up ALL of our current files and added all of our files and sub-directories to their FrontPage extensions. This created a spidering nightmare...

About one week ago, I wrote a robots.txt file to ask ALL robots to quit trying to find files in those directories. All search engines except Inktomi have stopped trying to spider pages within these FrontPage directories.
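For illustration only, here is a minimal sketch of what such a rule set might look like, checked with Python's standard urllib.robotparser. The actual file is not shown in this thread, so the rules, the "products" directory, and the "ExampleBot" agent name are all assumptions. Under the original robots.txt convention a Disallow line matches by path prefix, so a /_vti_cnf/ directory nested inside another directory would need its own line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set; the poster's real robots.txt is not shown in the thread.
# "products" is an assumed parent directory used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /_vti_cnf/
Disallow: /products/_vti_cnf/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if the assumed rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/products/index.htm", "/products/_vti_cnf/index.htm"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

A check like this only shows how a well-behaved robot should read the file; it says nothing about when a given crawler will actually re-read it.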

I went through my server logs, and it appears that each IP that Inktomi uses gets its own copy of a web site's robots.txt file. Several of their IPs have requested and received the robots.txt file and have stopped trying to spider those directories. However, some Inktomi IPs haven't checked for my robots.txt file in several days, and they continue to try to spider those non-existent directories.
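The log check described above could be sketched roughly as follows. This assumes the access log is in Common Log Format with English month abbreviations; the sample lines and IP addresses are made up for illustration and are not taken from the poster's logs.

```python
import re
from datetime import datetime

# Assumes Common Log Format: ip ident user [timestamp] "METHOD path PROTO" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+)')


def last_robots_fetch(lines):
    """Map each client IP to the time of its most recent /robots.txt request."""
    latest = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group(3) != "/robots.txt":
            continue
        ip = match.group(1)
        when = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        if ip not in latest or when > latest[ip]:
            latest[ip] = when
    return latest


# Made-up sample lines using documentation-only IP addresses.
SAMPLE = [
    '192.0.2.10 - - [08/Mar/2002:04:12:01 +0000] "GET /robots.txt HTTP/1.0" 200 120',
    '192.0.2.11 - - [03/Mar/2002:09:30:45 +0000] "GET /robots.txt HTTP/1.0" 200 120',
    '192.0.2.11 - - [09/Mar/2002:10:02:13 +0000] "GET /products/_vti_cnf/index.htm HTTP/1.0" 404 210',
    '192.0.2.10 - - [09/Mar/2002:11:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120',
]

if __name__ == "__main__":
    for ip, when in sorted(last_robots_fetch(SAMPLE).items()):
        print(ip, when.isoformat())
```

Comparing each IP's last robots.txt fetch with its later page requests shows which crawler addresses are still working from an old copy of the file.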

Does anyone out there know the default number of days an Inktomi IP waits before it requests another copy of a robots.txt file? It seems Google, AltaVista, Ask Jeeves and most of the others check for the robots.txt file at least once per day or before they start spidering. However, Inktomi doesn't seem to follow that same pattern. Anyone with any useful information, please post a reply and let me know the Inktomi schedule for updating their robots.txt information. Thank you...

(edited by: MarkHutch at 6:54 pm (utc) on Mar. 26, 2002)



WebmasterWorld Administrator brett_tabke

Msg#: 69 posted 1:13 pm on Mar 11, 2002 (gmt 0)

It usually takes 7 to 31 days for Ink to recognize a robots.txt update. Why it's that long isn't known (it sure shouldn't be).


10+ Year Member

Msg#: 69 posted 1:15 pm on Mar 11, 2002 (gmt 0)

Like all search engines, when it recrawls your site it will check for a new robots.txt file. Otherwise it will use the last recorded one.
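A rough sketch of that behavior, for illustration: a crawler keeps its last copy of robots.txt and only fetches a fresh one once the copy is older than some maximum age. This is not Inktomi's actual implementation, which is not documented in this thread; the CachedRobots class, the 7-day window, and the "ExampleBot" agent name are all assumptions.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Hypothetical crawler-side cache that reuses robots.txt until it expires."""

    def __init__(self, fetch_lines, max_age_seconds):
        self._fetch_lines = fetch_lines  # callable returning the current robots.txt lines
        self._max_age = max_age_seconds
        self._parser = None
        self._fetched_at = None
        self.fetch_count = 0

    def can_fetch(self, agent, path, now=None):
        now = time.time() if now is None else now
        # Re-read the file only on first use or once the cached copy is too old.
        if self._parser is None or now - self._fetched_at >= self._max_age:
            parser = RobotFileParser()
            parser.parse(self._fetch_lines())
            self._parser, self._fetched_at = parser, now
            self.fetch_count += 1
        return self._parser.can_fetch(agent, path)


if __name__ == "__main__":
    DAY = 86400
    site = {"lines": ["User-agent: *", "Disallow: /private/"]}
    cache = CachedRobots(lambda: site["lines"], max_age_seconds=7 * DAY)
    print(cache.can_fetch("ExampleBot", "/_vti_cnf/a.htm", now=0))
    site["lines"] = ["User-agent: *", "Disallow: /private/", "Disallow: /_vti_cnf/"]
    print(cache.can_fetch("ExampleBot", "/_vti_cnf/a.htm", now=1 * DAY))
    print(cache.can_fetch("ExampleBot", "/_vti_cnf/a.htm", now=8 * DAY))
```

With a cache like this, a rule added today is ignored until the old copy expires, which would match the symptom of a crawler requesting disallowed paths for days after the file changed.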


WebmasterWorld Senior Member 10+ Year Member

Msg#: 69 posted 4:54 pm on Mar 11, 2002 (gmt 0)

Thanks for the replies. It sure seems like they are spending a bunch of time and bandwidth trying to spider pages that are not there anymore. However, maybe they have their reasons for not checking more often for a robots.txt file. Maybe most people don't use one and they just don't want to waste their time checking too often or something like that...


10+ Year Member

Msg#: 69 posted 4:57 pm on Mar 11, 2002 (gmt 0)

What do you mean by 'checking for pages which are no longer there'? A robots.txt file doesn't contain a list of the pages on your site. It just tells the spider which pages you don't want crawled.


WebmasterWorld Senior Member 10+ Year Member

Msg#: 69 posted 5:11 pm on Mar 11, 2002 (gmt 0)

If you'll read my original post, you'll see that I have added pages that no longer exist on my server to the robots.txt file. Why waste the spider's time trying to re-crawl pages and only get 404 errors on them? It's working great for all search engines except for Inktomi. Thanks for the reply...
