Forum Moderators: Robert Charlton & goodroi

Googlebot re-requesting robots.txt a lot

         

devil_dog

3:03 am on Nov 14, 2008 (gmt 0)

10+ Year Member



Today I've seen Googlebot activity a little lower than usual (> 60 requests a minute); it's a big site.

However, take a look at this:


[root@some-server ~]# grep Googlebot /var/log/#*$!xx/access.log | grep robots | tail
66.249.71.233 - - [13/Nov/2008:20:54:43 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:54:47 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:01 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:08 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:13 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:18 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:18 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:20 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:21 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.71.233 - - [13/Nov/2008:20:55:24 -0600] GET /robots.txt HTTP/1.1 "200" 269 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
[root@some-server ~]#
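
To put a number on how often this is happening, the same log can be bucketed by minute. This is only a minimal sketch, assuming the log format shown above (timestamp in the fourth whitespace-separated field). The function name `robots_hits_per_minute` is made up for illustration, and the log path in the usage line is a placeholder.

```shell
# Count Googlebot requests for /robots.txt per minute.
# Assumption: the access log uses the format shown above, so the fourth
# whitespace-separated field looks like "[13/Nov/2008:20:54:43".
# robots_hits_per_minute is a hypothetical helper name.
robots_hits_per_minute() {
    grep 'Googlebot' "$1" \
        | grep 'GET /robots.txt' \
        | awk '{ minute = substr($4, 2, 17); hits[minute]++ }
               END { for (minute in hits) print hits[minute], minute }' \
        | sort -k2
}

# Usage: robots_hits_per_minute /path/to/access.log
# Note: the sort is lexical, so the order is only chronological within one day.
```

Fed only the ten lines shown above, this would print 2 hits for 13/Nov/2008:20:54 and 8 hits for 13/Nov/2008:20:55.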

I haven't made any changes to robots.txt for ages. The file is normally fetchable and doesn't throw any errors.

Is this something I should be worried about?

What could possibly make Googlebot keep refetching robots.txt again and again?

grippo

3:34 am on Nov 14, 2008 (gmt 0)

10+ Year Member



I'm seeing a lot like this in my logs. In my case, I've 301'ed some duplicate URLs, and now I see loops from three different IPs GETting those very same 301s again and again. A bug was the first thing that came to mind, but I can't really believe it is one. After thinking about it a lot, my guess is that it's good, not bad. If it isn't a bug, the obvious reason is to make sure your robots.txt hasn't changed.

devil_dog

3:54 am on Nov 14, 2008 (gmt 0)

10+ Year Member



Well, my robots.txt is a static file that last changed over two weeks ago (I added some Sitemaps).

In my case Googlebot is getting a 200 response, not a 301, so this looks strange.

301 loops are common for a while after you set up new redirects, but this is a 200 loop.

kylee

2:16 am on Nov 15, 2008 (gmt 0)

10+ Year Member



Is it possible that the HTTP headers for robots.txt are not configured properly?


jdMorgan

2:45 am on Nov 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good point... You should check the Last-Modified, Cache-Control, and Expires headers sent by your server along with the robots.txt file, as well as the Content-Type and other headers.

Use the Live HTTP Headers add-on for Firefox/Mozilla, or one of the many online header checkers.

kylee's post reminded me of the last time I saw this same thing (several years ago). The cause was a server configuration error: my server's response headers were telling Google not to cache the robots.txt file for more than a few minutes, and they complied... :o
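
The same check can also be done from a shell. This is just a minimal sketch, assuming curl is installed; `cache_headers` is a hypothetical helper name and the URL in the usage line is a placeholder, not the poster's site.

```shell
# Print only the response headers that influence caching.
# Reads raw HTTP response headers on stdin. cache_headers is a
# hypothetical helper name, not part of any existing tool.
cache_headers() {
    tr -d '\r' \
        | grep -i -E '^(Cache-Control|Expires|Pragma|Age|ETag|Last-Modified):'
}

# Usage (assumes curl is available; replace the placeholder URL):
#   curl -sI http://www.example.com/robots.txt | cache_headers
```

A short max-age in Cache-Control, or an Expires time only a few minutes ahead of Date, would be consistent with the kind of misconfiguration described above.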

Jim


kylee

4:12 am on Nov 15, 2008 (gmt 0)

10+ Year Member



Another thing you can do is send a message via the Google Webmaster Console to describe the robots.txt fetching problem. Hopefully the engineers will investigate and identify the real issue.

devil_dog

4:39 am on Nov 15, 2008 (gmt 0)

10+ Year Member



Hmm, I just rechecked after a while, and it's still happening.

Here are the HTTP headers:


me@my-laptop:~$ HEAD http://sub.example.com/robots.txt
200 OK
Connection: close
Date: Sat, 15 Nov 2008 04:33:39 GMT
Accept-Ranges: bytes
ETag: "57e00e-118-4549545197840"
Server: nginx/0.6.32
Content-Length: 280
Content-Type: text/plain; charset=UTF-8
Last-Modified: Sat, 16 Aug 2008 15:19:53 GMT
Client-Date: Sat, 15 Nov 2008 04:33:39 GMT
Client-Peer: xx.xx.xx.xx:80
Client-Response-Num: 1
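
As shown, this response carries validators (ETag, Last-Modified) but no Cache-Control or Expires header. A minimal sketch for flagging that case automatically follows; `has_explicit_freshness` is a hypothetical helper name, and how Googlebot treats a response without an explicit lifetime is an open question here, not something this check can answer.

```shell
# Report whether a set of response headers (on stdin) states an explicit
# cache lifetime. has_explicit_freshness is a hypothetical helper name.
has_explicit_freshness() {
    if tr -d '\r' \
        | grep -i -E '^(Cache-Control:.*(max-age|s-maxage)=|Expires:)' >/dev/null
    then
        echo "explicit lifetime present"
    else
        echo "no explicit lifetime (validators only, or nothing)"
    fi
}

# Usage (assumes the HEAD command used above):
#   HEAD http://sub.example.com/robots.txt | has_explicit_freshness
```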

I'll try notifying Google via Webmaster Central. Thanks for the advice.
