
Googlebot ignored robots.txt entry

Still the trailing slash problem from last year?

         

Yidaki

2:18 pm on Aug 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot never had a problem understanding my robots.txt files and obeying the rules ... until today. Yes, they validate [searchengineworld.com] 100%, and they have been running unchanged for more than a year. Today Googlebot disobeyed the robots.txt files on three different servers and tried to crawl roughly 10 disallowed pages per server:

64.68.83.130 - - [12/Aug/2004:13:48:28] "GET http://www.example.com/trap/timestamp.htm" 403 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

IPs:

64.68.83.22
64.68.83.72
64.68.83.130
64.68.83.131
64.68.83.135
64.68.83.160
64.68.83.175

Robots.txt:

User-agent: *
Disallow: /trap/
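For what it's worth, Python's standard-library robots.txt parser reads the rule the same way — a quick sanity check (example.com stands in for the real host, as in the log line above):

```python
# Sketch: confirm that the robots.txt rules above disallow /trap/ for
# every user-agent, Googlebot included. The host name is the same
# placeholder used in the log line; the rules are exactly those quoted.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /trap/",
])

# One of the URLs Googlebot fetched despite the rule:
print(rp.can_fetch("Googlebot/2.1", "http://www.example.com/trap/timestamp.htm"))  # False

# A path outside the trap is still allowed:
print(rp.can_fetch("Googlebot/2.1", "http://www.example.com/index.htm"))  # True
```

So there is no ambiguity in the file itself; any compliant crawler should skip /trap/.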

The fetched URLs are from September 2003 - February 2004 (dynamic timestamp.htm).

Still the trailing slash problem from last year [webmasterworld.com]?

Brett_Tabke

11:20 am on Aug 13, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google interprets the robots.txt to mean: do not INDEX the files. Most webmasters interpret it to mean: do not CRAWL them.

Google routinely crawls pages blocked by robots.txt, presumably looking for links.

If they index them and show them in the SERPs, then there is a problem.
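The "trap" technique in the first post can be sketched as a simple log check: flag any Googlebot request whose path falls under a disallowed prefix. This is an illustrative sketch, not the poster's actual setup — the log line is the one quoted above, and the regex assumes a request field of the form `"GET <url>"`:

```python
# Sketch: scan Apache-style access-log lines for Googlebot requests to
# paths disallowed in robots.txt. DISALLOWED mirrors the robots.txt
# quoted earlier; the sample line is the one from the first post.
import re

DISALLOWED = ("/trap/",)

sample = ('64.68.83.130 - - [12/Aug/2004:13:48:28] '
          '"GET http://www.example.com/trap/timestamp.htm" 403 0 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)"')

def is_violation(log_line):
    """True if the line is a Googlebot request for a disallowed path."""
    if "Googlebot" not in log_line:
        return False
    m = re.search(r'"GET (\S+)', log_line)
    if not m:
        return False
    # Some log setups record the full URL; strip scheme and host if present.
    path = re.sub(r'^https?://[^/]+', '', m.group(1))
    return any(path.startswith(prefix) for prefix in DISALLOWED)

print(is_violation(sample))  # True
```

Serving the trap with a 403, as in the log above, at least keeps the disobeyed fetches from returning content.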