|Y! and robots.txt|
| 1:23 pm on May 6, 2008 (gmt 0)|
My robots.txt file is more than a year old and has always been respected.
Lately I added some *.swf files to a disallowed directory and Y! goes after them. This directory also holds *.jpg and *.gif files which it ignores.
What does Y! not understand in:
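The original snippet was not preserved here; judging from the /images/ directory named later in the thread, the record in question presumably looked something like this (illustrative, not the poster's exact file):

```
User-agent: *
Disallow: /images/
```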
Anyone else seen this before?
| 1:52 pm on May 6, 2008 (gmt 0)|
The code is fine... so what's wrong with Y!? Hey sfaffa, I've been caught in the same situation and found zilch after experimenting with my robots.txt file... these days the crawlers seem to make no sense of robots.txt...
| 2:09 pm on May 6, 2008 (gmt 0)|
Slurp, like all robots, will obey the first robots.txt record that applies to its User-agent string. So be sure that you don't have any other records specific to Slurp, because if you do, it will honor only the first one.
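Python's standard-library robots.txt parser happens to implement exactly this first-match behavior, so it can be used to sanity-check a file. A minimal sketch (the duplicate-record file below is invented for illustration):

```python
from urllib import robotparser

# Two records that both match "Slurp": the first disallows /images/,
# the second tries to allow everything. A first-match parser never
# reaches the second record.
rules = """\
User-agent: Slurp
Disallow: /images/

User-agent: Slurp
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Slurp", "http://example.com/images/movie.swf"))  # False
print(rp.can_fetch("Slurp", "http://example.com/pages/foo.html"))    # True
```

The second record is silently ignored, which is why a stray earlier record for the same User-agent can make later rules appear to "not work".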
Also, make sure the syntax of your file is 100% correct: all comments on separate lines starting with "#", and one and only one blank line after each record (including the last one).
Also, it's possible that Slurp has not yet processed your new robots.txt file. I prefer to post a new robots.txt file at least 24 hours before adding any content that I don't want spidered.
None of this may be applicable -- Just taking some guesses based on what you posted.
I have noticed that Slurp tries to fetch indexes for directories in which it has found content. That is, if it finds a link to /pages/foo.html, it may try to fetch /pages/. On Apache servers with "Options -Indexes" set, this results in a 403 (Forbidden) response. Similarly-configured IIS servers probably do the same. However, Slurp does seem to honor robots.txt even when doing this -- I've only seen it when fetching pages in that directory is allowed, and the only strange thing about it is that it's trying to fetch a directory index which is not linked to anywhere on the Web.
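For reference, a minimal sketch of the Apache setting mentioned above, in server-config context (the directory path is illustrative):

```
# With no index file present, "Options -Indexes" makes Apache answer a
# request for the bare directory (e.g. /pages/) with 403 Forbidden
# instead of an auto-generated file listing.
<Directory "/var/www/html/pages">
    Options -Indexes
</Directory>
```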
| 4:16 pm on May 6, 2008 (gmt 0)|
Thank you Jim, as always, a most explicit reply.
|...you don't have any other records specific to Slurp, because if you do, it will honor only the first one. |
No mention of Slurp in the entire robots.txt file
|the syntax of your file is 100% correct |
checked and correct
|...Slurp has not yet processed your new robots.txt file |
File date: # Last Updated: 19/05/2007
Since then no new rules added to the robots.txt file, only new *.swf files added to the restricted directory
|...if it finds a link to /pages/foo.html, it may try to fetch /pages/ |
The directory /images/ contains no "text" files (html, asp, etc.), only image files: jpg, gif, and swf. These are displayed on pages outside that directory, and it's only the *.swf files that Y! has fetched, several times now.
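One way to confirm exactly which files Slurp is pulling from /images/ is to grep the access log. A sketch assuming the Apache Combined Log Format (the sample lines below are made up so the example is self-contained):

```shell
# Two made-up Combined Log Format lines standing in for a real access log.
cat > access.log <<'EOF'
1.2.3.4 - - [06/May/2008:10:00:00 +0000] "GET /images/movie.swf HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
1.2.3.4 - - [06/May/2008:10:00:01 +0000] "GET /images/photo.jpg HTTP/1.0" 200 1024 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

# Which /images/ paths has Slurp requested? Field 7 is the request path.
grep -i "slurp" access.log | awk '{print $7}' | grep "^/images/"
```

Run against a real log, this shows whether the fetches are really limited to *.swf files.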
| 6:36 pm on May 6, 2008 (gmt 0)|
Have you checked your robots.txt file using Google Webmaster Tools? Another thing to try: create a new robots.txt file and place it in the root. Wait a day and see the results.