Forum Moderators: open
Until today, I lived by the gospel of the 216s. Now, however, I'd like to formally call it into question based on three things I have observed at my site this morning. First, though, a bit of background.
My site is of the bibliographic variety and has millions of dynamically generated pages. It has a PR of 7 and, on average, gets about 100,000 pages read during the Deep Crawl and 50,000 pages read each month by the Fresh Bots. There was a time when I'd check the IP addresses to establish the difference between the two types of reads but, over time, it became clear that the domain names for the 216s took the form of crawl##.googlebot.com, while the domain names of the 64s took the form of crawler##.googlebot.com. Thus, I stopped paying attention to the IP addresses and took the domain names as a reliable indicator of what was happening.
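For anyone who wants to check the same thing against their own logs, a rough sketch along these lines is what I have in mind. The crawl##/crawler## hostname patterns are only my observation, not anything Google documents, and the sample IP is purely illustrative:

import socket

# Rough sketch: classify a visiting IP by its reverse DNS name instead of
# eyeballing raw addresses. The crawl##/crawler## patterns are just what
# I have seen in my own logs, not an official naming scheme.
def classify_googlebot(ip):
    try:
        host = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
    except socket.herror:
        return "unknown (no reverse DNS for %s)" % ip
    if host.endswith(".googlebot.com"):
        if host.startswith("crawler"):
            return "freshbot-style hostname: " + host
        if host.startswith("crawl"):
            return "deepbot-style hostname: " + host
    return "not a googlebot hostname: " + host

print(classify_googlebot("64.68.82.1"))  # sample IP, purely illustrative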
With that said, let me now share with you some interesting observations from this morning. It began when I noticed my site being hammered early this morning. A quick investigation revealed that it was googlebots of the crawl## variety, thus leading me to conclude that the Deep Crawl had begun. After reporting this here and having my claim questioned, I looked deeper and discovered that the IP addresses were of the 64.* variety, which could only mean one of two things: Google had changed its naming convention, or Google was now using the 64s to assist with the Deep Crawl.
I am inclined to choose the latter of these two possibilities for three reasons. The first is that the timing is right for a Deep Crawl. The second has to do with the intensity of the crawl: whereas the Fresh Bots have traditionally maxed out at 1000 pages/hour at my site, the current crawl is running at 3000+ pages/hour. The third has to do with the length of the crawl: the Fresh Bots have almost always left after an hour, whereas the current crawl of my site has been going on for several hours now.
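In case anyone wants to reproduce the pages/hour figure, this is roughly how I tally it. A quick sketch, assuming an Apache-style combined log; the access.log file name and the simple "Googlebot" substring match are assumptions you would adjust for your own setup:

import re
from collections import Counter

# Rough sketch: count Googlebot hits per hour from an Apache combined log.
hits_per_hour = Counter()
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = re.search(r"\[(\d{2}/\w{3}/\d{4}:\d{2})", line)  # e.g. [12/Jun/2003:06
        if m:
            hits_per_hour[m.group(1)] += 1  # bucket by day and hour

for hour, count in sorted(hits_per_hour.items()):
    print(hour, count)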
In light of the foregoing, I am willing to risk being a heretic by saying that the gospel of the 216s may be false. That said, I will now sit back and wait for all of you to provide me with 101 reasons why I am a fool to say this. Can't wait!
I wondered the same thing a couple of months ago. My site was then being "deepcrawled" by the freshbot (64.*) - as is happening again as I speak - which made me wonder if the freshbot was taking over some of the tasks of the deepbot.
However, soon after (week or so?), my site was deepcrawled by the 216.* bot.
Beth
Freshbot only crawls pages linked from high PR pages.
Deepbot crawls all pages.
Just look at the pages that are being crawled. That will tell you which bot it is. Two different behaviors.
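If you want to put a number on that, something like this quick sketch will do it (same caveats as above: access.log is a placeholder name and an Apache combined format is assumed). It shows how many distinct URLs the bot has touched and how deep into the site they sit - a freshbot pass stays shallow, a deep crawl wanders everywhere:

from collections import Counter

# Rough sketch: distinct URLs requested by Googlebot and how deep they go.
urls = set()
depth = Counter()
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        try:
            path = line.split('"')[1].split()[1]  # request line: GET /some/path HTTP/1.0
        except IndexError:
            continue
        urls.add(path)
        depth[path.count("/")] += 1

print(len(urls), "distinct URLs")
for d, n in sorted(depth.items()):
    print("depth", d, ":", n, "requests")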
I've only seen deep crawl behavior with 216. Deep crawl usually starts 4 or 5 days after the update begins.
It seems to me that this would be the next logical step for Google - real-time updates - and it would take quite a lot of testing and tweaking. It would also seem to require that freshbot take a more aggressive role in crawling and indexing.
Simply adding the ability to compute PageRank on the fly would make freshbot the primary crawler for Google and could make the deep crawl obsolete.
Just my suspicions...
Of course Google could use their 64.68.* datacentre for the deep crawl. If they ever do, I guess that Google engineers will take bets on how long it takes us all to figure it out.
They are all generic machines that just boot off different images on the network. Just because we label them as being for one purpose or another doesn't mean that Google can't use them for whatever they want.
See also kendos' post in
[webmasterworld.com...]
or do a search for '64'.
Where do you guys see "crawler99.googlebot.com" vs. "crawl99.googlebot.com"? Is it in your log files?
What I see in my log is "Googlebot/2.1+(+http://www.googlebot.com/bot.html)"
Am I looking in the wrong place?
Beth