mbennie - 11:16 am on Sep 11, 2007 (gmt 0)
I have been helping a friend with a new site that has 6.8 million URIs of reasonably original content.

The site went live 3 weeks ago. G-Bot started grabbing about 1K pages/day after 10 days. It kept to that schedule until a few days ago, when it increased to 30K pages/day. I expect the speed will increase again before too long. The fact is that G-Bot could crawl the entire site in less than a day if it wanted to. G-Bot is very conscious of whether or not it is going to crash a server, and from what I have seen it won't take more than about 150 pages/second at peak, but right now it's throttled down to 1 page/2 seconds. I suspect it also grabs a small set of pages and checks them for spam/duplicate content before deciding to crawl a site more aggressively.

One interesting observation as G-Bot crawls the site: Sitemaps were submitted with all URIs as well as update frequency and priority. G-Bot has focused primarily on 2 types of pages thus far. One is the only page that is updated daily and thus has unique, fresh content. But the page it seems to be most interested in is a map page with a very low priority and an update frequency of never. Is it possible G is using these pages to harvest latitude and longitude data for the subject of the content? G-Bot seems relatively uninterested in the pages that would be searched for most frequently.

[edited by: mbennie at 11:20 am (utc) on Sep. 11, 2007]
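
For a rough sense of scale, the crawl rates quoted in this post can be turned into full-site crawl times. This is only a back-of-the-envelope sketch in Python: it assumes a constant rate and a single pass over all 6.8 million URIs, and the names `days_to_crawl` and `rates_per_day` are made up for illustration, not taken from any tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TOTAL_URIS = 6_800_000  # site size quoted in the post


def days_to_crawl(total_pages: int, pages_per_day: float) -> float:
    """Days needed to fetch every page once at a constant daily rate."""
    return total_pages / pages_per_day


# Rates mentioned in the post, converted to pages per day.
rates_per_day = {
    "1K pages/day": 1_000,
    "30K pages/day": 30_000,
    "1 page per 2 seconds": SECONDS_PER_DAY / 2,
    "150 pages/second": 150 * SECONDS_PER_DAY,
}

for label, rate in rates_per_day.items():
    print(f"{label}: {days_to_crawl(TOTAL_URIS, rate):,.2f} days")
```

At 150 pages/second the whole site works out to roughly half a day, which fits the "less than a day" estimate. At 30K pages/day it is closer to 227 days. Note that 1 page per 2 seconds is 43,200 pages/day, in the same ballpark as the observed 30K pages/day.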