The number of indexed pages in WMT has been significantly higher than expected for years.
One year ago we deleted some spam and created a sitemap using a crawler. This sitemap is still in place.
After that we deleted some of our own pages. At that time 99% of the pages in our sitemap were indexed.
WMT stats today:
Indexed pages: 130k (max 155k in February)
Sitemap: 135k
Indexed pages in sitemap: 115k
Crawling is more than healthy. Google crawls 10k pages every day.
That leaves me with 15k indexed pages which are not in our sitemap and therefore couldn't be found by our crawler one year ago.
New pages are part of this difference, but in the last year we created fewer than 5k pages, so there are at least 10k indexed pages I can't explain. The problem has persisted for years anyway.
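To spell out the arithmetic, here is a quick sketch in Python using the WMT figures above. The variable names are my own, and the 5k figure is only an upper bound on the pages we created in the last year, so the result is a lower bound:

```python
# Figures from WMT (rounded, as reported above).
indexed_total = 130_000        # "Indexed pages"
indexed_in_sitemap = 115_000   # "Indexed pages in sitemap"

# Upper bound on pages created in the last year (assumed not in the old sitemap).
new_pages_max = 5_000

# Indexed pages that the sitemap does not account for.
indexed_outside_sitemap = indexed_total - indexed_in_sitemap

# Even if every new page is indexed, this many remain unexplained.
unexplained_min = indexed_outside_sitemap - new_pages_max

print(indexed_outside_sitemap, unexplained_min)  # prints "15000 10000"
```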
More Details:
One year ago 10k spam pages were injected into our site and we removed them.
We also had a problem on Google's side with the URL parameter handling. It took them 1.5 years to discover and remove 500k (sic) URLs we used to track our ads, although the correct policy was always in place.
URL parameters account for only 2.5k pages according to WMT and are included in our sitemap anyway.
I can't find any traces of unwanted tracking URLs in Google search.
One year ago we tried to wipe out all traces of old URLs and spam. We returned HTTP 410 for them and also applied the URL removal tool to all relevant folders, even removing good content to get rid of all remaining unwanted pages.
I can't find any traces of the old and spam pages we removed in Google search.
I can't find any new spam using site:example.com viagra etc.
Widgets on other websites account for only a handful of indexed pages in the SERPs (I should work on that). Even in theory it can't be more than a few hundred.
Is this normal? What am I missing?