Msg#: 3328528 posted 5:50 pm on May 2, 2007 (gmt 0)
You can verify if a googlebot user agent is really from Google - see How to verify Googlebot is Googlebot [webmasterworld.com].
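The usual check for this is a reverse DNS lookup of the visiting IP, a test that the hostname falls under googlebot.com or google.com, and then a forward lookup of that hostname to confirm it maps back to the same IP. The forward step matters because anyone controlling their own PTR records can make an IP claim a googlebot.com name. Below is a minimal Python sketch of that idea, assuming those two domain suffixes; the function name `verify_crawler_ip` is hypothetical, and the resolver functions are injectable so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Reverse-DNS domains assumed to belong to Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]

def verify_crawler_ip(
    ip: str,
    suffixes: Iterable[str] = GOOGLE_SUFFIXES,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Hypothetical helper: True if `ip` passes a reverse-then-forward DNS check."""
    try:
        # Step 1: reverse (PTR) lookup of the IP that made the request.
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror subclass OSError
        return False  # no PTR record: treat as unverified
    # Step 2: the name must sit inside one of the crawler's domains.
    # The leading dot in each suffix rejects look-alikes such as "evilgooglebot.com".
    if not hostname.lower().rstrip(".").endswith(tuple(suffixes)):
        return False
    try:
        # Step 3: forward lookup of that name; it must resolve back to the same IP.
        _name, _aliases, addrs = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addrs
```

In practice you would call `verify_crawler_ip(remote_ip)` once per new IP and cache the answer, since two DNS round trips per request would be slow. Other crawlers can be checked the same way by passing their own domain suffixes.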
A second factoid in this mix: all of the Google spiders now share a crawl cache, which the algorithm then uses to score the SERPs. So the spider and the algorithm are two separate steps. The spider just retrieves the pages and stores them in Google's shared crawl cache, and then the algorithm ranks them.
Msg#: 3328528 posted 12:58 am on May 3, 2007 (gmt 0)
You can see some "interesting" patterns of fetch if you look at your logs over a period of time.
Msg#: 3328528 posted 12:26 am on May 4, 2007 (gmt 0)
Thanks tedster -- I figured that the bot was only retrieving data, but wanted to confirm from someone with expertise in the field.
g1smd -- I'm not sure what you mean by "interesting patterns of fetch" -- are you referring to the way the bot moves through a site? That level of analysis is something I've not done yet, so to be honest I may not fully grasp what the pattern would be telling me.
Msg#: 3328528 posted 12:38 am on May 4, 2007 (gmt 0)
Several things: how they request URLs across a site, how often stuff is fetched, how some stuff is fetched more often than others, and so on.
The pattern reveals little or nothing about how things work, but sometimes you can attribute a change of pattern to something that you did to the site content, internal navigation, or linking pattern.
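To look at fetch patterns like these, one option is to tally Googlebot requests per URL per day from the raw access log. The Python sketch below assumes Apache/NGINX "combined" log format, and the name `googlebot_fetches` is hypothetical. It filters on the user-agent string only, which impostors can fake, so the counts include spoofed bots unless you also verify the IPs.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumed "combined" log format: ip ident user [date:time zone] "request" status size "referer" "ua"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_fetches(lines: Iterable[str]) -> Dict[str, Counter]:
    """Hypothetical helper: map each day to a Counter of URL paths fetched by 'Googlebot'."""
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue  # skip unparseable lines and other user agents
        per_day[m.group("day")][m.group("path")] += 1
    return dict(per_day)
```

Comparing the per-day Counters shows how often each URL is fetched and which URLs are fetched more often than others; summing them (`sum(result.values(), Counter())`) gives totals for the whole period, which you can line up against the dates of site changes.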
Msg#: 3328528 posted 12:51 am on May 4, 2007 (gmt 0)
I helped out on another forum that was under attack about six months back. The attacking bot was called something like "GOOGLEBOTRUSSIA". Of course I realised immediately that it didn't belong to Google and blocked it to recover the site, but there are spam bots masquerading as Google. It's well worth checking.
Msg#: 3328528 posted 4:12 am on May 4, 2007 (gmt 0)
|but there are spam bots masquerading as google. It's well worth checking. |
Tedster told me in another thread that since Google can and does change its IPs, a "white" list is not dependable. But to date -- for me at least -- all the Googlebots I see are in the ranges 66.249.65.xx, 66.249.66.xx, and 66.249.72.xx. So if I see anything calling itself Googlebot from outside 66.249, I will be suspicious.
Msg#: 3328528 posted 4:57 am on May 4, 2007 (gmt 0)
If you can employ the method described in this link, you don't need to be suspicious - you can just plain-old know for sure. It also works for Slurp (Yahoo's crawler), by the way.
How to verify Googlebot is Googlebot [webmasterworld.com].
[edited by: tedster at 11:40 pm (utc) on May 4, 2007]
Msg#: 3328528 posted 11:36 pm on May 4, 2007 (gmt 0)
Google changes IP addresses from time to time, but the range of IP addresses available to them is well known and in the public domain. They have several very large blocks and a number of smaller allocations.
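If you do keep a list of blocks as a first-pass suspicion filter, Python's standard `ipaddress` module handles the CIDR membership test. The sketch below is hypothetical: its allow-list holds only the three /24 ranges observed earlier in this thread, not Google's actual allocations, which you would need to fill in from the published records (for example, the registry's WHOIS data) and keep current. Because the ranges change, a miss here is a reason for suspicion, not proof; the DNS check is what actually confirms a crawler.

```python
import ipaddress
from typing import Iterable

# Hypothetical allow-list: only the /24s one poster happened to observe, NOT Google's full allocation.
OBSERVED_BLOCKS = ("66.249.65.0/24", "66.249.66.0/24", "66.249.72.0/24")

def in_blocks(ip: str, blocks: Iterable[str] = OBSERVED_BLOCKS) -> bool:
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # `addr in network` is False (not an error) when IPv4/IPv6 versions differ.
    return any(addr in ipaddress.ip_network(block) for block in blocks)
```

Swapping the three /24s for a few large blocks and a number of smaller ones only changes the tuple; the membership logic stays the same.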