Googlebot Confused?

requesting robots.txt repeatedly

         

Ally_Cat

6:14 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



64.68.87.14 - - [03/Aug/2003:01:58:06 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.87.43 - - [03/Aug/2003:02:00:48 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.86.79 - - [03/Aug/2003:02:02:23 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.86.9 - - [03/Aug/2003:02:05:55 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.86.54 - - [03/Aug/2003:02:06:53 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.87.78 - - [03/Aug/2003:02:08:45 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

I've never seen this before - anyone experience similar requests?

ciml

6:26 pm on Aug 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google have many Googlebot machines, so this looks like a bunch of different Googlebots each being polite and checking /robots.txt before crawling your site.
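To check whether the requests really are spread across many crawler machines, one option is to tally robots.txt hits per client IP. The following is only a rough sketch: it assumes the log is in the Apache combined format shown in the first post, and the function name `robots_hits_by_ip` is made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def robots_hits_by_ip(lines, agent_substring="Googlebot"):
    """Count /robots.txt requests per client IP for a given user-agent substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if match.group("path") == "/robots.txt" and agent_substring in match.group("agent"):
            counts[match.group("ip")] += 1
    return counts

# Three of the log lines quoted in the first post.
SAMPLE_LINES = [
    '64.68.87.14 - - [03/Aug/2003:01:58:06 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.87.43 - - [03/Aug/2003:02:00:48 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.86.79 - - [03/Aug/2003:02:02:23 -0700] "GET /robots.txt HTTP/1.0" 200 259 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for ip, hits in sorted(robots_hits_by_ip(SAMPLE_LINES).items()):
    print(ip, hits)
```

If each IP shows only a handful of hits, the "many polite machines" explanation fits; a very large count from one or two IPs points to something else.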

JonR28

6:26 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



I'm not an expert, but I think each different IP that hits you needs to fetch robots.txt for itself, because they are different machines. Notice that each request comes from a different IP. That's my guess.

Ally_Cat

6:33 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



This happened over and over again all weekend. I wonder why I was getting hit so heavily, by so many different bots? I had about 6 times as many robots.txt requests as requests for any other page.

JonR28

6:39 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



Maybe Google is doing some research on robots.txt. Is anyone else seeing similar symptoms?

dragonlady7

7:05 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



Seen it all over the place. Lots of people wondering why Googlebot was requesting robots.txt over and over again. Often that was the *only* file it was requesting. I finally added one, and then it spidered my site. Others reported similar experiences.
This was a month ago, though.
*shrug*

Friday

7:12 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



For the past two days Googlebot has visited me repeatedly, grabbing "robots.txt" _249 TIMES!_, once every 2-10 minutes for several hours straight -- and only from TWO different IPs.

This is a new site, recently submitted. Googlebot hasn't yet visited any other pages.

BTW: Here's what my "robots.txt" file looks like. Anyone see an error I'm missing?

User-agent: *
Disallow: /inc/
Disallow: /images/
Disallow: /.status/
Disallow: /reports/
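One way to sanity-check a file like this is to run it through a standard parser and ask which paths a crawler may fetch. This is a small sketch using Python's `urllib.robotparser`. The test paths (`/index.html`, `/inc/header.php`, `/images/logo.gif`) are hypothetical, and this parser's reading of the rules is not guaranteed to match Googlebot's.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /inc/
Disallow: /images/
Disallow: /.status/
Disallow: /reports/
"""

def check_paths(robots_txt, agent, paths):
    """Return {path: allowed} for the given user agent, parsing the rules locally."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}

# Hypothetical paths, chosen only to exercise the rules.
results = check_paths(
    ROBOTS_TXT,
    "Googlebot",
    ["/", "/index.html", "/inc/header.php", "/images/logo.gif"],
)
for path, allowed in results.items():
    print(path, "allowed" if allowed else "disallowed")
```

If the home page and ordinary pages come back as allowed, the file itself is probably not what is keeping the spider from crawling further.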

bcolflesh

7:16 pm on Aug 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had exactly the same behavior with the site in my profile, but I had just changed the robots.txt, so that's what I attributed it to.

jdMorgan

7:40 pm on Aug 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This sounds like it's probably a spider bug, but if you are getting multiple robots.txt requests from the *same* IP address(es) it would be a good idea to check [webmasterworld.com] to see if your server is properly configured to return correct Cache-Control, Expires and Last-Modified headers in the server response.

They should look something like this:

Cache-Control: must-revalidate, max-age=7200 
Expires: Tue, 05 Aug 2003 21:37:58 GMT
Last-Modified: Sat, 02 Aug 2003 05:40:41 GMT
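If a header-checking tool is not handy, a HEAD request shows the same information. Below is a minimal sketch using Python's standard `http.client`. The function name `fetch_cache_headers` and its defaults are assumptions made for illustration, and it covers plain HTTP only.

```python
import http.client

def fetch_cache_headers(host, path="/robots.txt", port=80, timeout=10):
    """Send a HEAD request and return (status, caching-related headers).

    Headers the server did not send are reported as None.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        wanted = ("Cache-Control", "Expires", "Last-Modified", "ETag")
        return response.status, {name: response.getheader(name) for name in wanted}
    finally:
        conn.close()
```

Calling `fetch_cache_headers("www.example.com")` (with your own host name in place of the placeholder) returns the status code and a dictionary of the four headers; a value of `None` means the server left that header out.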

Jim

Friday

8:39 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



jdMorgan,
Thanks for the reply.
But how do I check this?
I'm on a shared virtual host.

ADDED: Never mind, I saw your link.
8-P
Thanks Again,
Friday

Friday

8:47 pm on Aug 5, 2003 (gmt 0)

10+ Year Member



jdMorgan,

Here are the responses I get when testing index.html
and robots.txt, respectively:

HTTP/1.1 200 OK
Date: Tue, 05 Aug 2003 20:44:05 GMT
Server: Apache/1.3.26 (Unix) AuthMySQL/2.20 PHP/4.1.2 mod_gzip/1.3.19.1a mod_ssl/2.8.9 OpenSSL/0.9.6g
X-Powered-By: PHP/4.1.2
Connection: close
Content-Type: text/html

HTTP/1.1 200 OK
Date: Tue, 05 Aug 2003 20:46:14 GMT
Server: Apache/1.3.26 (Unix) AuthMySQL/2.20 PHP/4.1.2 mod_gzip/1.3.19.1a mod_ssl/2.8.9 OpenSSL/0.9.6g
Last-Modified: Tue, 05 Aug 2003 20:23:29 GMT
ETag: "9284f-55-3f301241"
Accept-Ranges: bytes
Content-Length: 85
Connection: close
Content-Type: text/plain

What does this mean?

Thanks,
Friday

Jenstar

8:58 pm on Aug 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Friday, are you running AdSense? The AdSense bots seem to be in the 64.68.87.x range, and their user agent identifies itself as Mediapartners-Google. I thought I'd mention it in case some of the requests are also coming from Mediapartners-Google, since you only listed a few of the accesses.

I find that this one requests robots.txt nearly every time, and there are often several on my site at a time.

jdMorgan

9:10 pm on Aug 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Friday

> What does this mean?

Sticking to robots.txt, it means that Google is free to apply a default max-age and Expires date to your robots.txt file, because you have not specified otherwise. This is usually not a problem. The problem arises when the webmaster specifies an unusually short max-age, or when the Expires date is only a short time in the future. In that case, Google detects that the robots.txt it cached earlier has expired, and so needs to re-fetch it before it proceeds with further spidering.

Since you didn't specify these times, you don't need to worry about it unless you change your robots.txt a lot.
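As a rough illustration of the expiry logic, here is a sketch that works out how long a cache may reuse a response from its headers. It follows the usual HTTP precedence (max-age first, then Expires minus Date) and returns None when neither is present, meaning the client picks its own default. How Googlebot actually chooses that default is not known here, and the helper name `freshness_lifetime` is made up.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    """Return the explicit freshness lifetime as a timedelta, or None if unspecified.

    `headers` is a plain dict keyed by the exact header names used below.
    """
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""))
    if match:
        # max-age takes precedence over Expires.
        return timedelta(seconds=int(match.group(1)))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(expires - date, timedelta(0))
    return None  # nothing specified: the client applies its own default

# The example headers suggested earlier in the thread.
suggested = {
    "Cache-Control": "must-revalidate, max-age=7200",
    "Expires": "Tue, 05 Aug 2003 21:37:58 GMT",
    "Last-Modified": "Sat, 02 Aug 2003 05:40:41 GMT",
}
# The robots.txt response headers Friday posted (no Cache-Control, no Expires).
posted = {
    "Date": "Tue, 05 Aug 2003 20:46:14 GMT",
    "Last-Modified": "Tue, 05 Aug 2003 20:23:29 GMT",
    "ETag": '"9284f-55-3f301241"',
}

print(freshness_lifetime(suggested))
print(freshness_lifetime(posted))
```

The first call yields a two-hour lifetime from max-age=7200; the second yields None, which matches the "Google is free to apply a default" case described above.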

Jim

Friday

12:52 am on Aug 6, 2003 (gmt 0)

10+ Year Member



jdMorgan:
Thanks!

jenstar:
No AdSense.
These are the Googlebot IPs that have visited:

64.68.86.9
64.68.86.54
64.68.86.79
64.68.86.153
64.68.86.161
64.68.87.14
64.68.87.43
64.68.87.78

and visited... and visited...