Our site's sitemaps are split up by section of the site. In total we have submitted close to 300,000 URLs.
Most of the sitemaps show about 80% of their URLs indexed. I assume that's fairly normal for this many URLs.
But one sitemap has 21,000 URLs submitted and only 1,200 of them indexed. This sitemap contains user profiles. There are half a million profiles in total, so this is a very small subset: only the profiles that have shown a reasonable amount of activity. That also means these profiles are all linked from various other areas of the site, so it's somewhat perplexing that they aren't being indexed.
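To put the gap in perspective, the figures above can be turned into coverage ratios. This is a trivial sketch; `index_coverage` is just an illustrative helper, and the 0.80 baseline is the approximate rate quoted for the other sitemaps, not a measured value.

```python
def index_coverage(indexed: int, submitted: int) -> float:
    """Fraction of submitted sitemap URLs reported as indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return indexed / submitted


# Figures quoted in the question.
profiles_submitted = 21_000
profiles_indexed = 1_200
total_profiles = 500_000
TYPICAL_COVERAGE = 0.80  # approximate rate for the site's other sitemaps

profile_coverage = index_coverage(profiles_indexed, profiles_submitted)
share_of_all_profiles = profiles_submitted / total_profiles

print(f"profile sitemap coverage: {profile_coverage:.1%}")
print(f"profiles in the sitemap: {share_of_all_profiles:.1%} of all profiles")
print(f"profile URLs not indexed: {profiles_submitted - profiles_indexed:,}")
print(f"other sitemaps are ~{TYPICAL_COVERAGE / profile_coverage:.0f}x better covered")
```

So the profile sitemap sits at under 6% coverage against roughly 80% elsewhere, about a fourteenfold difference.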
What could explain such poor indexing of these URLs? I'm not seeing any related crawl errors in WMT.
To be honest, they don't matter that much to me, because I'm more interested in getting other areas of the site indexed. But the ratio is puzzling, and I'm wondering whether I should open only an even smaller subset to crawling. Is Google just totally uninterested in indexing user profile pages and flat out ignoring them if they don't have lots and lots of links / content on them?
Is Google just totally uninterested in indexing user profile pages and flat out ignoring them if they don't have lots and lots of links / content on them?
I'm going to agree with you there. Google can probably recognize user profiles, and those pages are probably very similar in content, assuming your users' profiles are much like the ones on the rest of our sites, without anything truly original on them.
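One rough way to sanity-check the "very similar in content" idea on your own pages is to compare the visible text of two profiles with Jaccard similarity over word shingles. This is a generic near-duplicate measure, not a claim about how Google evaluates pages; the helper names and the sample profile text below are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 1.0 means identical shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical profile text: a shared template that differs only in the username.
template = (
    "joined the community in 2010 and has posted answers questions "
    "and comments across the site view all activity"
)
profile_a = "alice " + template
profile_b = "bob " + template

similarity = jaccard(shingles(profile_a), shingles(profile_b))
print(f"shingle similarity: {similarity:.2f}")
```

Profiles that differ only in a name and a few stats score close to 1.0; pages with substantial unique text score much lower. Where any "too similar" line would fall for a search engine is unknown.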
I've looked at a good sample of the indexed ones, and they all belong to very active members and are packed with content.
The roughly 19,800 that weren't indexed should also have a reasonable amount of content on them, since they belong to the most active 5% of our members. I think the threshold for these pages is very high, certainly not the "2 unique sentences" that Matt Cutts spoke of.
Would it be better to trim that sitemap down to only the top 2,000 profiles? Or should I just let Google decide whether it wants to index them?
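If you do decide to trim, the sitemap can be regenerated from an activity ranking. The sketch below assumes a hypothetical `Profile` record with a site-specific `activity_score`; `top_profiles_sitemap` and the example.com URLs are illustrative placeholders. It writes a minimal sitemaps.org `urlset` containing only `<loc>` entries, and caps `n` at the protocol's 50,000-URLs-per-file limit. For the top 2,000 profiles you would call it with `n=2000`.

```python
import heapq
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


@dataclass
class Profile:
    url: str
    activity_score: float  # hypothetical: whatever "activity" metric the site tracks


def top_profiles_sitemap(profiles: Iterable[Profile], n: int) -> bytes:
    """Return sitemap XML listing only the n profiles with the highest activity."""
    if not 0 < n <= MAX_URLS_PER_SITEMAP:
        raise ValueError(f"n must be between 1 and {MAX_URLS_PER_SITEMAP}")
    # nlargest avoids fully sorting all ~500,000 profiles just to keep the top n.
    top = heapq.nlargest(n, profiles, key=lambda p: p.activity_score)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for profile in top:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = profile.url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Usage with made-up profiles (example.com URLs are placeholders).
demo_profiles = [
    Profile(f"https://example.com/users/{i}", score)
    for i, score in enumerate([5, 90, 40, 70])
]
sitemap_xml = top_profiles_sitemap(demo_profiles, n=2)
print(sitemap_xml.decode("utf-8"))
```

Keep in mind that Google treats a sitemap as a hint rather than an instruction, so trimming mainly changes which URLs you point it at. It probably won't make the remaining profiles more likely to be judged worth indexing.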