My robots.txt looks like this:
quote:
--------------------------------------------------------------------------------
User-agent: *
Disallow: /cgi-bin/
Disallow: /gallery/
Disallow: /images/
Disallow: /stat_www/
Disallow: /stat_www_old/
Disallow: /survey/
Disallow: /templates/
--------------------------------------------------------------------------------
When I look at the logfile I see strange things: Googlebot is regularly visiting and indexing pages, but
quote:
--------------------------------------------------------------------------------
this one: 66.196.65.36 - Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
--------------------------------------------------------------------------------
only comes to my site, reads the robots.txt all the time, and then leaves again. Today it read it 10 times and nothing else.
Btw, which chmod permissions do I have to set on the robots.txt?
And how do some people manage to get a 404 error when retrieving my robots.txt?
Am I doing something wrong?
[nameofyoursite...]
as in the root of your site. You should not be getting a 404 error via the browser if the file is in this directory.
Are you seeing a 404 error when Googlebot and Slurp request the file?
The file must be uploaded via FTP in ASCII mode, not binary.
hope this helps
ncw164x
Welcome to WebmasterWorld [webmasterworld.com]!
To second ncw164x, your robots.txt looks fine. For added reassurance, run it through this robots.txt validator [searchengineworld.com].
Note that robots.txt Disallow patterns are prefix-matched. That is, the robot will not fetch anything whose path begins with the string you specify after Disallow. Therefore, you can disallow both "/stat_www/" and "/stat_www_old/" using the single directive:
Disallow: /stat_www
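The prefix matching described above can be checked locally. Here is a minimal sketch using Python's standard-library urllib.robotparser. It is only a stand-in for a real crawler, which may differ in details, and the sample paths (index.html, report.html, about.html, stat_www.html) are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened rule set using the single prefix suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /stat_www
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return rp.can_fetch(agent, path)


# Hypothetical paths, for illustration only.
for path in (
    "/stat_www/index.html",
    "/stat_www_old/report.html",
    "/stat_www.html",
    "/about.html",
):
    print(path, "allowed" if allowed(path) else "blocked")
```

Both stat directories come out blocked because each path begins with "/stat_www". The same goes for a file such as "/stat_www.html", so keep the trailing slash on a pattern when you only mean the directory.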
Inktomi's Slurp is notoriously slow about digging deeply into sites, so you may just have to wait a while. If your site is commercial and you want it spidered soon and frequently, consider the paid inclusion option.
chmod 644 should be fine - robots.txt is fetched just like any other text file or html page.
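For anyone unsure what 644 means in practice, here is a small sketch that sets and reads back the mode. It assumes a POSIX system, and it uses a throwaway file in a temporary directory as a hypothetical stand-in for the real robots.txt in your web root.

```python
import os
import stat
import tempfile

# Hypothetical stand-in for the real file in the web root.
path = os.path.join(tempfile.mkdtemp(), "robots.txt")
with open(path, "w") as f:
    f.write("User-agent: *\nDisallow: /cgi-bin/\n")

# 0o644: owner may read and write; group and others may only read.
os.chmod(path, 0o644)

mode_string = stat.filemode(os.stat(path).st_mode)
print(mode_string)
```

On a POSIX system this prints -rw-r--r--, which is enough for the web server to read and serve the file.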
Jim
I just got an email from Inktomi's tech support telling me that they retrieve the robots.txt from time to time to check that the pages are still up. They also told me that they have thousands of indexed pages from my cgi-bin, which I banned them from recently because that was a mistake on my side. Well, actually they said they have a few pages from my site in their index, and if I have more I should link to them.
I checked again, and they only have 3 pages and 1000 old ones from my cgi-bin...
I will just wait a little longer...
66.77.73.162 - FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; [alltheweb.com...]
Date         Page         Status  Referer
01/03 15:01  /robots.txt  404     -
This is the first time FAST has visited me in a long time, and it did not find my robots.txt?
Of course I have one, and it is OK.