We have a main sitemap index file with 26 individual sitemap pages. After analyzing our log files, I've noticed a pattern with Google crawling 5 random pages from each of the sitemap pages. Is this normal or does Google normally crawl more pages from each sitemap page?
Other than making each individual page listed within the sitemap pages have unique content and making sure there are no problematic errors on the pages, are there any other factors we should account for which would help us with getting the remaining URLs indexed?
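For anyone wanting to reproduce the "pages crawled per sitemap" figure from their own logs, here is a rough Python sketch. It assumes a combined-format access log and standard sitemap XML; the function names (`sitemap_paths`, `googlebot_paths`, `crawl_coverage`) are made up for illustration, and matching on the user-agent string alone does not prove a request really came from Google.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Matches the request part of a combined-format log line, e.g. "GET /page HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[^"]*"')


def sitemap_paths(xml_text):
    """Return the set of URL paths listed in one sitemap file."""
    root = ET.fromstring(xml_text)
    return {
        urlparse(loc.text.strip()).path
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    }


def googlebot_paths(log_lines):
    """Return the set of paths requested by a client claiming to be Googlebot."""
    paths = set()
    for line in log_lines:
        # Assumption: user-agent match only; it can be spoofed.
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match:
            paths.add(urlparse(match.group(1)).path)
    return paths


def crawl_coverage(sitemaps, log_lines):
    """Map each sitemap name to (crawled_count, listed_count).

    `sitemaps` is a dict of {sitemap name: sitemap XML text}.
    """
    crawled = googlebot_paths(log_lines)
    coverage = {}
    for name, xml_text in sitemaps.items():
        listed = sitemap_paths(xml_text)
        coverage[name] = (len(listed & crawled), len(listed))
    return coverage
```

Running `crawl_coverage` over all 26 sitemap files and a few weeks of logs would show whether the "5 pages per sitemap" pattern holds over time or is just a snapshot.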
Submit a Sitemap to tell Google about pages on your site we might not otherwise discover.
Getting indexed is straightforward - unless there are serious underlying issues. A single, lowly link is all that's required for Google to find a page, and if it doesn't have that, then there's next to zero chance anyone but the most specific searcher is ever going to find it in results.
To put that another way, my rule of thumb is that if you're struggling to get pages linked internally on your site then the last thing you want is a sitemap to disguise the underlying issues. You need links of value to get Google to crawl your site.
My experience is that Google will not just grab all the content on a sitemap - in fact it seems to have a tendency to apply similar rules to those it uses during the main web crawl. I think this is because most people repeatedly submit every URL on their site via a sitemap - something I believe is wholly unnecessary.
For me, the advantage of a sitemap is usually for a larger, frequently updated site, where you can submit a partial sitemap of pages as soon as they are created, and "shape" Google's spidering and indexing of new content. But I think that only works effectively on sites with well-established crawling patterns.
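As a minimal sketch of what a "partial sitemap" could look like in practice, the Python below builds a sitemap containing only newly created pages. The function name `build_partial_sitemap` and the example URL are hypothetical; the `urlset`, `url`, `loc` and `lastmod` elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_partial_sitemap(pages):
    """Build sitemap XML for a small batch of new pages.

    `pages` is an iterable of (url, lastmod) pairs, with lastmod
    given as a YYYY-MM-DD string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    # ElementTree escapes characters such as "&" in URLs for us.
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: only the pages published today.
    print(build_partial_sitemap([("https://example.com/new-page", "2024-01-15")]))
```

The point of keeping the file small is that it lists only what has changed, rather than resubmitting every URL on the site.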
I can't disguise that I have anti-sitemap leanings ;)
Most of the sitemaps I've come across contain errors or links to content that should never be indexed - or alternatively are such a maintenance overhead that it would be difficult to justify their use when Google frequently indexes content with even low-quality links in under an hour.