I ask the foregoing questions because I operate a bibliographic site with millions of unique pages and, of course, would love to have each and every one of them indexed by google. The site's been a PR7 at least as long as I've known about PR -- about six months -- but the number of pages varies quite a bit from month to month. And whereas the site used to be subject to a couple of days of mind-numbing attention from googlebot, it now gets three or four days of less intense attention. Thus, I'm beginning to wonder whether googlebot's algorithm involves reading in as many pages as possible during a window of time determined by PR.
If what I'm proposing is crazy, please let me know, as I'm contemplating a major hardware upgrade to accommodate googlebot. That said, let me finish this off by saying how incredibly helpful this forum has been for me!
Static links and high PR with well-spread internal linking seem most important:
[webmasterworld.com...]
I would say a lot of deep linking from external sites to different internal pages within your site would help more than having all your external inbound links point to your index page.
I'm interested because, over the last couple of months in particular, I've encountered a "kinder, gentler" googlebot. It still read about the same number of pages as its "more vicious" predecessor (i.e., about 125,000), but took about twice as long to do so. In response, you might say, who cares so long as the same number of pages are being read? And I might agree, were it not for the fact that, until now, the number of pages read had been increasing each month. Thus, before I go out and spend money on new hardware to serve up pages to googlebot more quickly, I wanted to check whether the googlebot algorithm has changed and the old, more vicious googlebot has been replaced.
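For anyone who wants to check the same thing on their own site, here's a rough sketch of how you might count googlebot requests per day from a standard Apache combined-format access log. The log path, and the assumption that the bot identifies itself with "Googlebot" in the user-agent, are just placeholders for whatever your own setup uses:

import re
from collections import Counter
from datetime import datetime

# Hypothetical log location -- adjust for your own server.
LOG_PATH = "/var/log/apache/access.log"

# Pull the date out of the [10/Oct/2003:13:55:36 -0700] timestamp field.
date_re = re.compile(r"\[(\d{2}/\w{3}/\d{4})")

hits_per_day = Counter()
with open(LOG_PATH) as log:
    for line in log:
        # Assumes the crawler's user-agent string contains "Googlebot".
        if "Googlebot" not in line:
            continue
        match = date_re.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            hits_per_day[day] += 1

for day in sorted(hits_per_day):
    print(day, hits_per_day[day])

Run it over a few months' worth of logs and you can see at a glance whether the crawl is getting shallower or just slower.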
Thanks in advance for any replies!
vitaplease's comments are, as we would expect, right on the button. Google change how they spider from month to month, so a change in the spidering pattern of your site may be due to them, not you.
I think this is probably one of the most overlooked factors, especially if you have a large site.
The faster you can serve pages to the bot, the more pages the bot can collect in the amount of time allotted for a given site.
I worked on a project recently where we moved a large site from an average box to a new box running dual Xeons. The number of pages Google indexed doubled.
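To put rough numbers on that, here's a back-of-the-envelope sketch of the arithmetic; every figure is an illustrative assumption, not a measurement from that project:

# Back-of-the-envelope crawl maths. All numbers here are
# illustrative assumptions, not measured figures.
crawl_window_hours = 72        # say googlebot gives the site a three-day visit
slow_seconds_per_page = 2.0    # older box: ~2 seconds to serve a page
fast_seconds_per_page = 1.0    # faster box: ~1 second to serve a page

def pages_crawled(window_hours, seconds_per_page):
    # Pages the bot can fetch if it requests them one after another
    # for the whole window.
    return int(window_hours * 3600 / seconds_per_page)

print(pages_crawled(crawl_window_hours, slow_seconds_per_page))   # 129600
print(pages_crawled(crawl_window_hours, fast_seconds_per_page))   # 259200

If the crawl window really is fixed and the bot fetches pages back to back, halving your response time roughly doubles the number of pages it can pick up, which is consistent with the doubling described above.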