aakk9999 - 7:13 pm on Jan 8, 2013 (gmt 0)
I suppose the 50.025 pages are due to the 404 page.
I think not. It more likely indicates that some pages are not returning a proper 404/410 response, that too many pages are noindexed or redirecting, or that you have a duplicate content / thin content issue.
Right now I am in the process of tidying up a site that should have roughly 2,000 URLs indexed, where WMT reported over 80,000 "Not selected" URLs. These were due to:
- server previously returning 200 OK for pages Not Found
- having many URLs with dates in the URL that should not have been allowed to be indexed in the first place.
At the beginning of December we asked for a change to be implemented to return 410 for all pages that should not have been indexed owing to dates in the URL, and to return a proper 404 response when a page is genuinely not found.
This resulted in "Not selected" initially dropping at a rate of approximately 500 URLs/day, and then last week WMT recorded a big drop of almost 40,000 URLs in the "Not selected" chart. After 5 weeks the site is now down to 20,000 "Not selected" URLs.
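As a rough illustration of that rule (not the actual change made on the site), the decision could be sketched like this in Python. The date pattern, the `status_for` function and the `known_paths` set are all hypothetical; the real URLs may carry dates in a different form.

```python
import re

# Assumption: dated URLs contain a /YYYY/MM/DD segment. Adjust to the real pattern.
DATED_URL = re.compile(r"/\d{4}/\d{2}/\d{2}(/|$)")

def status_for(path, known_paths):
    """Pick the HTTP status code to return for a requested path."""
    # Dated URLs should never have been indexed: tell crawlers they are gone.
    if DATED_URL.search(path):
        return 410
    # A page that really exists is served normally.
    if path in known_paths:
        return 200
    # Everything else is genuinely not found: a real 404, not a 200 error page.
    return 404
```

The dated check comes first on purpose, so that a dated URL returns 410 even if the page still exists on the server.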
From what we can see, it seems that URLs returning 410 are dropped from "Not selected" quicker than URLs returning 404.
I would therefore carefully inspect your URLs, perhaps using the "site:" command narrowed down with "inurl:" filters, to see where these "Not selected" URLs are coming from. I don't think they are caused by 404 errors.
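If you have a lot of URL patterns to check, a small helper can generate the search strings for you to paste into Google. This is only a sketch: the `site_queries` name, the domain and the fragments below are made-up examples, so substitute whatever patterns look suspicious on your own site.

```python
def site_queries(domain, fragments):
    """Build one 'site:' query per URL fragment, narrowed with 'inurl:'."""
    return [f"site:{domain} inurl:{fragment}" for fragment in fragments]

# Hypothetical fragments: a year, a print view and a session parameter.
for query in site_queries("example.com", ["2012", "print", "sessionid"]):
    print(query)
```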