Hi,
I manage a website with 676k URLs in its sitemaps. On Feb 21, 671k URLs were indexed (according to Google Search Console > Page indexing) and 937k URLs were not indexed. Most of these 937k URLs were old pages that I had been removing from the index, either by consolidating their content (681k redirected URLs) or by removing it (127k 'noindex' URLs). Most of the rest were 'Crawled - currently not indexed' (43k) or 'Discovered - currently not indexed' (78k).
The number of indexed pages decreased gradually until May 2 (646k). Then, on May 3, it dropped to 612k, and it is now 583k. Over the same period:
- The number of redirected pages has increased slightly (from 681k to 688k)
- The number of 'noindex' pages has decreased slightly (from 127k to 122k)
- The number of 'Crawled - currently not indexed' pages has jumped: on May 2 it went from 43k to 80k, and it is now 99k
- The number of 'Discovered - currently not indexed' pages has increased significantly (from 78k to 91k)
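To make the comparison easier, here is a quick Python tally of the changes between Feb 21 and now. It uses only the figures above (in thousands of URLs), and the category labels are my own shorthand, not the exact Search Console wording. It only compares totals, so it cannot tell whether the URLs that left the index are the same ones that appear in the other categories.

```python
# Page indexing figures (thousands of URLs) copied from the numbers above.
# The labels are my own shorthand, not the exact Search Console names.
feb21 = {
    "Indexed": 671,
    "Redirected": 681,
    "noindex": 127,
    "Crawled - currently not indexed": 43,
    "Discovered - currently not indexed": 78,
}
now = {
    "Indexed": 583,
    "Redirected": 688,
    "noindex": 122,
    "Crawled - currently not indexed": 99,
    "Discovered - currently not indexed": 91,
}


def changes(before, after):
    """Return the change (after - before) for every category."""
    return {name: after[name] - before[name] for name in before}


delta = changes(feb21, now)
for name, d in delta.items():
    print(f"{name:36} {d:+d}k")

# Net change across the four not-indexed categories listed above
not_indexed_change = sum(d for name, d in delta.items() if name != "Indexed")
print(f"Indexed: {delta['Indexed']:+d}k, listed not-indexed categories: {not_indexed_change:+d}k")
```

By these totals, the growth of 'Crawled - currently not indexed' (+56k) is the largest movement, compared with the 88k drop in indexed pages.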
As far as I understand, Google penalized my website on May 2 because it considers my content to be low quality. Is that right?
My website is generated from a large database (600k records) of commercial products, so it may be considered 'thin content'. Over the last few years, I've been noindexing low-quality content (products with short alphanumeric data, which generated short pages with duplicate content) and consolidating, through redirects, different pages about the same products. I have taken care to avoid duplicate content and to generate quality content for users from the database records. The site has hundreds of links from top domains, including nytimes.com and huffpost.com, and thousands of links from Wikipedia.
I would appreciate any tips on how to deal with this issue. Thank you very much.