googlebot and log files problem - Crawler, Spider, and User Agent ID forum at WebmasterWorld - WebmasterWorld

googlebot and log files problem

what do you think about these logs?

macdar

5:22 pm on May 23, 2005 (gmt 0)

10+ Year Member

Hi,
I've got two websites. One of them is pretty well established, but it is not in Google :(
Googlebot visits me quite frequently, yet I can't find "myself" in the SERPs.
I've taken a closer look at my logs. Here's a sample of the Googlebot activity:

66.249.64.58 - - [17/May/2005:11:12:59 -0400] "GET /robots.txt HTTP/1.0" 403 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.58 - - [17/May/2005:11:12:59 -0400] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.79 - - [17/May/2005:11:13:11 -0400] "GET /robots.txt HTTP/1.0" 403 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.79 - - [17/May/2005:11:13:12 -0400] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.79 - - [17/May/2005:11:13:12 -0400] "GET / HTTP/1.0" 200 5971 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.58 - - [17/May/2005:11:22:55 -0400] "GET /robots.txt HTTP/1.0" 403 952 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.58 - - [17/May/2005:11:22:55 -0400] "GET / HTTP/1.0" 200 7035 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

and so on..

I'm new to all that stuff, but I can read that googlebot is asking for a robots.txt file, which is not on my server. First question: does it hurt me? Do I need to create one, at least an empty file?

And a second thing: the part "GET / HTTP/1.0" means that it's asking for the root directory "/", right? Why isn't it going deeper, like "GET /mysite.html HTTP/1.0" or something like that? Am I missing something here?

And my primary website is at subdomain.domain.com (that address resides in DMoz for about a year) - can I somehow "encourage" googlebot to crawl that page?

How can I check whether the "If-Modified-Since" header is supported on my server?

What do you think about these logs?

Thanks.
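
On the If-Modified-Since question, the 304 lines in the log sample already suggest the server answers conditional requests. A minimal sketch of checking it by hand, assuming Python 3 and a placeholder URL (the function names here are made up for illustration): fetch the page once, read its Last-Modified header, then repeat the request with If-Modified-Since set to that value and see whether a 304 comes back.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified):
    """Build a GET request that carries an If-Modified-Since header."""
    return urllib.request.Request(
        url, headers={"If-Modified-Since": last_modified}
    )


def supports_conditional_get(url):
    """Return True if the server answers 304 to a conditional request."""
    # First request: a plain GET to learn the Last-Modified value.
    with urllib.request.urlopen(url) as resp:
        last_modified = resp.headers.get("Last-Modified")
    if last_modified is None:
        # No Last-Modified header, so there is nothing to compare against.
        return False
    # Second request: same URL, now conditional.
    request = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status == 304
    except urllib.error.HTTPError as err:
        # urllib reports non-2xx responses, including 304, as HTTPError.
        return err.code == 304
```

Calling `supports_conditional_get("http://www.example.com/")` with your own address in place of the placeholder would return True when the second request gets a 304.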

Span

6:27 pm on May 23, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

.. GET /robots.txt HTTP/1.0" 403 ..

403 means Forbidden. That's not good. If you don't have a robots.txt a 404 Not Found should be served. Why is your server returning a 403?
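
A quick way to see what the server returns for robots.txt without waiting for the next crawl is to request it yourself. This is only a sketch, assuming Python 3; the helper names and the example address are placeholders, not anything from the site in question.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def robots_url(site_url):
    """Return the robots.txt URL at the root of the given site."""
    return urljoin(site_url, "/robots.txt")


def robots_status(site_url):
    """Return the HTTP status code the server sends for /robots.txt."""
    try:
        with urllib.request.urlopen(robots_url(site_url)) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 403, 404 and other error statuses arrive as HTTPError.
        return err.code
```

With no robots.txt on the server, `robots_status("http://www.example.com/")` should come back as 404; a 403 points to an access rule somewhere in the server configuration.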

macdar

6:34 pm on May 23, 2005 (gmt 0)

10+ Year Member

Good point. I have no idea why it says Forbidden when I don't even have that file.
I'm gonna ask my hosting company..

What do you think about the fact that it crawls just the root dir?

Thanks.

macdar

6:45 pm on May 23, 2005 (gmt 0)

10+ Year Member

I had something like this in my httpd.conf file:

<Files ~ "^\.robots\.txt">
Order allow,deny
Deny from all
Satisfy All
</Files>
I set it up to prevent people from viewing my robots.txt file.
I've removed it already. Thanks for pointing that out.

Coming back to my question:
is it regular Googlebot activity to ask only for the root directory "/" and the robots.txt file?

Span

7:03 pm on May 23, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Ah.. okay..
If there are enough sites linking to yours, Googlebot will crawl all your pages. Now that the 403 is gone, just wait and see.

macdar

7:30 pm on May 23, 2005 (gmt 0)

10+ Year Member

Ok, Thanks.

Chico_Loco

2:01 am on May 24, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

66.249.64.58 - - [17/May/2005:11:12:59 -0400] "GET /robots.txt HTTP/1.0" 403 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.58 - - [17/May/2005:11:12:59 -0400] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.79 - - [17/May/2005:11:13:11 -0400] "GET /robots.txt HTTP/1.0" 403 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.79 - - [17/May/2005:11:13:12 -0400] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.79 - - [17/May/2005:11:13:12 -0400] "GET / HTTP/1.0" 200 5971 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.58 - - [17/May/2005:11:22:55 -0400] "GET /robots.txt HTTP/1.0" 403 952 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

66.249.64.58 - - [17/May/2005:11:22:55 -0400] "GET / HTTP/1.0" 200 7035 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

These are fine.

Response codes:

200: OK. A fresh copy of the page was sent.
304: Not Modified. The page has not changed since the last visit, so no body was sent.
403: Forbidden. The server refused to serve the request.
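
To see at a glance which paths Googlebot asked for and what it got back, the log lines can be tallied by path and status code. A rough sketch, assuming Python 3 and the combined log format shown in the sample; the regular expression and function name are illustrative, and the three sample lines are copied from the log above.

```python
import re
from collections import Counter

# One line of the combined log format, as in the sample above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_googlebot(lines):
    """Count Googlebot requests per (path, status) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        # Skip lines that do not parse or come from other user agents.
        if match and "Googlebot" in match.group("agent"):
            key = (match.group("path"), int(match.group("status")))
            counts[key] += 1
    return counts


SAMPLE = [
    '66.249.64.58 - - [17/May/2005:11:12:59 -0400] "GET /robots.txt HTTP/1.0" 403 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.64.58 - - [17/May/2005:11:12:59 -0400] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.64.58 - - [17/May/2005:11:22:55 -0400] "GET / HTTP/1.0" 200 7035 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for (path, status), hits in sorted(summarize_googlebot(SAMPLE).items()):
        print(path, status, hits)
```

Run against a full access log (for example by passing an open file object instead of `SAMPLE`), the same tally shows whether any path other than "/" and "/robots.txt" was ever requested.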

macdar

7:47 pm on May 24, 2005 (gmt 0)

10+ Year Member

Thanks Chico_Loco for explaining those server responses.

However, I'm still a bit concerned. Don't you see anything weird in the fact that it doesn't crawl beyond the root directory ("GET / HTTP/1.0")?
Why isn't it hitting other pages?

guitaristinus

11:45 pm on May 24, 2005 (gmt 0)

10+ Year Member

macdar,

It is not weird. I've seen Googlebot do this to my sites. It's as if it's scouting to see whether the site is really there before it comes out in force. But I've never given Googlebot a 403.