Robots.txt Question?

How often does Ink check for the file??

   
12:05 am on Mar 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I noticed a few weeks ago that Inktomi was trying to spider /_vti_cnf/ directories within one of our domains.

We have never used FrontPage extensions, but a previous web host put these extensions on their server and somehow added them to everyone's account. At that time, we had open sub-directories (no index page), and I assume that Inktomi picked up these FrontPage extensions by spidering those open directories and finding the FrontPage directories within our regular directories. When this previous host ran a backup, they backed up ALL of our current files and added all of our files and sub-directories to their FrontPage extensions. This created a spidering nightmare...

About one week ago, I wrote a robots.txt file to ask ALL robots to quit trying to find files in those directories. All search engines have stopped trying to spider pages within these FrontPage directories except Inktomi.
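
For reference, a minimal robots.txt along those lines might look like the one below. The /_vti_cnf/ path is the one from this thread; /_vti_bin/ is another directory FrontPage typically creates, shown here only as an illustration:

    # Ask every robot to stay out of the FrontPage directories
    User-agent: *
    Disallow: /_vti_cnf/
    Disallow: /_vti_bin/   # illustrative; list each unwanted directory on its own line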

I went through my server logs, and it appears that each IP that Inktomi uses gets its own copy of a web site's robots.txt file. Several of their IPs have requested and received the robots.txt file and have stopped trying to spider those directories. However, some Inktomi IPs haven't checked for my robots.txt file in several days, and they continue to try to spider those non-existent directories.
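
One way to verify this is to pull the last robots.txt request per client IP out of the access log. A minimal sketch in Python, where both the filename access.log and the Apache-style log format are assumptions:

    # Report the most recent robots.txt request seen for each client IP.
    # Assumes an Apache common/combined log format in a file named access.log.
    import re

    last_fetch = {}
    pattern = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "GET /robots\.txt')

    with open("access.log") as log:
        for line in log:
            m = pattern.match(line)
            if m:
                ip, stamp = m.groups()
                last_fetch[ip] = stamp  # later entries overwrite earlier ones

    for ip, stamp in sorted(last_fetch.items()):
        print(ip, "last requested robots.txt at", stamp)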

Does anyone out there know the default number of days an Inktomi IP waits before it requests another copy of a robots.txt file? It seems Google, AltaVista, Ask Jeeves and most of the others check for the robots.txt file at least once per day or before they start spidering. However, Inktomi doesn't seem to follow that same pattern. Anyone with any useful information, please post a reply and let me know the Inktomi schedule for updating their robots.txt information. Thank you...

(edited by: MarkHutch at 6:54 pm (utc) on Mar. 26, 2002)

1:13 pm on Mar 11, 2002 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time, 10+ Year Member, and Top Contributor of the Month



It usually takes 7 to 31 days for Ink to recognize a robots.txt update. Why it's that long isn't known (it sure shouldn't be).
1:15 pm on Mar 11, 2002 (gmt 0)

10+ Year Member



Like all search engines, when it recrawls your site it will check for a new robots.txt file. Otherwise it will use the last recorded one.
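
That "last recorded one" behavior amounts to a cache with a re-fetch interval. A rough sketch of the idea using Python's standard robotparser, where the 7-day TTL and the Slurp user-agent default are illustrative assumptions, not Inktomi's documented values:

    # Sketch of a crawler-side robots.txt cache with a re-fetch interval.
    # The 7-day TTL is an assumption for illustration only.
    import time
    from urllib.robotparser import RobotFileParser

    TTL = 7 * 24 * 3600      # seconds before a cached copy is considered stale
    _cache = {}              # host -> RobotFileParser

    def can_fetch(host, path, agent="Slurp"):
        parser = _cache.get(host)
        if parser is None or time.time() - parser.mtime() > TTL:
            parser = RobotFileParser("http://%s/robots.txt" % host)
            parser.read()        # fetch a fresh copy from the site
            parser.modified()    # record the fetch time for the TTL check
            _cache[host] = parser
        return parser.can_fetch(agent, path)

Under a scheme like this, a new disallow rule isn't seen until the cached copy expires, which would match the 7-to-31-day lag described above.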
4:54 pm on Mar 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the replies. It sure seems like they are spending a bunch of time and bandwidth trying to spider pages that are not there anymore. However, maybe they have their reasons for not checking more often for a robots.txt file. Maybe most people don't use one and they just don't want to waste their time checking too often or something like that...
4:57 pm on Mar 11, 2002 (gmt 0)

10+ Year Member



What do you mean by 'checking for pages which are no longer there'? A robots.txt file doesn't contain a list of the pages on your site. It just tells the spider which pages you don't want crawled.
5:11 pm on Mar 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you'll read my original post, you'll see that I have added pages that no longer exist on my server to the robots.txt file. Why waste the spider's time trying to re-crawl pages and only get 404 errors on them? It's working great for all search engines except for Inktomi. Thanks for the reply...
 
