Googlebot Spider


Jill

12:58 am on Nov 28, 2001 (gmt 0)

We are seeing requests from crawl2.googlebot.com on some of our sites. The IP we're seeing is 216.239.46.43. I looked up the IPs listed in SEW and did not see this one listed. Just trying to track the little booger! Is this the Google spider?

Key_Master

1:02 am on Nov 28, 2001 (gmt 0)

Yep, it's the real thing.

Jill

1:15 am on Nov 28, 2001 (gmt 0)

Thanks! Here's another stupid question: I assume that I should be able to track its wanderings by the IP and what it requests? It doesn't look like it's making a full crawl, but rather just requesting sporadic pages. Is this common?

mivox

1:17 am on Nov 28, 2001 (gmt 0)

Generally, I see a group of Googlebots, all requesting different seemingly random pages, but when you add all the Googlebots together, it adds up to a pretty full crawl... today I've seen crawl2 through crawl9.

Jill

2:02 am on Nov 28, 2001 (gmt 0)

Thanks, mivox. We have seen a few of them too on different domains, and more than one on a couple. From our logs, it looks like they are only requesting the robots.txt (which we do not have) and nothing else. That's bad, right? :(

Key_Master

2:23 am on Nov 28, 2001 (gmt 0)

Google doesn't mind if you don't have a robots.txt. Googlebot is just being polite (cough) and is checking to see if there are any files or directories you don't want to have spidered. It does this to comply with the Robots Exclusion Standard [searchengineworld.com].
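To illustrate what that robots.txt check amounts to, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the paths below are hypothetical, made up for illustration, and this is not a description of how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not taken from any site in this thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A compliant crawler would fetch the first path and skip the second.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/index.html"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/private/page.html"))
```

A crawler that follows the standard runs a check like this before requesting each page, which is why the robots.txt request shows up first in the logs.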

Jill

1:06 pm on Nov 28, 2001 (gmt 0)

Key Master:

I'm sorry I wasn't clear on what I was asking. I know it's not a bad thing not to have the robots.txt file. I just meant I thought it was bad that THAT was all Google seemed to be requesting. I don't see in my logs (access logs, I assume) where Google is crawling the rest of my pages.

starec

1:53 pm on Nov 28, 2001 (gmt 0)

Jill, it will probably come back and spider the whole site soon. It always starts the crawl by reading the robots.txt file and then (immediately or some time later) continues with the rest of the pages. It's a hungry little fellow...

Jill

4:51 pm on Dec 3, 2001 (gmt 0)

Okay, I'm seeing various Googlebots strolling by my sites at this time. The only thing is, this is all I can find that they're asking for:
216.239.46.58 - - [02/Dec/2001:17:00:17 -0500] "GET /robots.txt HTTP/1.0" 302 217
216.239.46.58 - - [02/Dec/2001:17:00:17 -0500] "GET / HTTP/1.0" 200 3968

So I take this to mean they're only requesting the robots.txt file and leaving, correct?

Air

4:56 pm on Dec 3, 2001 (gmt 0)

Not quite. The second request, "GET /", is for your index page, and it returned 3968 bytes.
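For anyone who wants to read those fields out of the log programmatically, here is a rough sketch that splits the two lines Jill posted into their parts. It assumes the server writes the Common Log Format (host, ident, user, time, request, status, bytes); the regular expression and the `parse_line` helper are illustrative names, not part of any tool mentioned in this thread.

```python
import re

# Common Log Format: host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size column means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# The two lines quoted earlier in this thread.
LINES = [
    '216.239.46.58 - - [02/Dec/2001:17:00:17 -0500] "GET /robots.txt HTTP/1.0" 302 217',
    '216.239.46.58 - - [02/Dec/2001:17:00:17 -0500] "GET / HTTP/1.0" 200 3968',
]

if __name__ == "__main__":
    for line in LINES:
        entry = parse_line(line)
        print(entry["path"], entry["status"], entry["size"])
    # prints:
    # /robots.txt 302 217
    # / 200 3968
```

The status column is worth a look too: the robots.txt request was answered with a 302 (a redirect), while the index page came back with a 200 (success).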

Jill

7:08 pm on Dec 3, 2001 (gmt 0)

Google does this on this particular site and a few others on my server about twice a week. I'd assume that I would see all the other pages in that form if they actually crawl the whole site?
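Each page a crawler fetches is normally written to the access log as its own GET line, so one way to check is to collect every path requested by the crawler's addresses. The sketch below is only an illustration: the `216.239.46.` prefix is an assumption based on the two addresses quoted in this thread, not a confirmed list of Googlebot IPs, and `paths_by_host` is a made-up helper name.

```python
from collections import defaultdict

# Assumed prefix, based only on the two addresses mentioned in this thread.
ASSUMED_PREFIX = "216.239.46."


def paths_by_host(lines, ip_prefix):
    """Map each client IP starting with ip_prefix to the set of paths it requested."""
    seen = defaultdict(set)
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request section, skip the line
        host_fields = parts[0].split()
        request_fields = parts[1].split()
        if not host_fields or len(request_fields) < 2:
            continue  # malformed line
        host, path = host_fields[0], request_fields[1]
        if host.startswith(ip_prefix):
            seen[host].add(path)
    return dict(seen)


# The two lines quoted earlier in this thread.
SAMPLE = [
    '216.239.46.58 - - [02/Dec/2001:17:00:17 -0500] "GET /robots.txt HTTP/1.0" 302 217',
    '216.239.46.58 - - [02/Dec/2001:17:00:17 -0500] "GET / HTTP/1.0" 200 3968',
]

if __name__ == "__main__":
    for host, paths in paths_by_host(SAMPLE, ASSUMED_PREFIX).items():
        print(host, sorted(paths))
```

Run over a full access log, a complete crawl would show many more paths than just `/robots.txt` and `/`, possibly spread across several crawler addresses as mivox described.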