I run a website with about 4 million pages <snip>. Although the Google spiders are very active and have been pulling 100,000+ pages per day for the last 3 months, few pages show up on Google. See <snip>. Google indexing of this site essentially collapsed in January 2005 when the number of pages was increased from about 1 million to 4 million.
AskJeeves, on the other hand, indexes 95% of the site.
My current working hypothesis for why these pages don't show up on Google centers on Google's repeated pulling of pages to test stability and refresh its index. Suppose Google has to pull the same page twice within a two-week period before it posts that page to the index. Suppose also that Google has a maximum pull rate per site, and that it expires pages after a month. At roughly 100,000 pulls per day, a single pass over 4 million pages takes about 40 days, so Google cannot do the repeat pulls fast enough either to get pages into the index or to keep them there.
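To make the arithmetic behind this concrete, here is a rough back-of-the-envelope sketch. Every number in it (100,000 pulls/day, a two-week re-pull window, a one-month expiry, 4 million pages) is an assumption taken from this post, not anything Google has published:

```python
# Back-of-the-envelope sketch of the hypothesis above.
# All figures are assumptions from this thread, not published Google behavior.

pages = 4_000_000            # total pages on the site
pulls_per_day = 100_000      # observed spider activity
repull_window_days = 14      # assumed: second pull must land within 2 weeks
expiry_days = 30             # assumed: a page drops out if not re-pulled in a month

# How long one complete pass over the site takes at the observed pull rate.
days_per_full_pass = pages / pulls_per_day
print(f"One full pass over the site: {days_per_full_pass:.0f} days")

# Pages that could be pulled twice inside the assumed two-week window.
indexable = pulls_per_day * repull_window_days / 2
print(f"Pages that fit two pulls in {repull_window_days} days: {indexable:,.0f}")

# Pages that can be re-pulled before the assumed one-month expiry.
sustainable = pulls_per_day * expiry_days
print(f"Pages refreshable within {expiry_days} days: {sustainable:,.0f}")
```

Under these assumptions the crawler can only get about 700,000 pages through the two-pull gate in any two-week window, and can keep at most about 3 million pages refreshed per month, both short of 4 million, which would explain the collapse when the site grew past 1 million pages.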
Does this make sense to anyone intimately familiar with Google indexing? If this hypothesis is correct, is there a way to get Google to ease the repeatability requirements?