Like many other sites, one of my client’s websites was devastated by Panda 4.0. Since the update the site’s traffic has dropped by over 60%, which has thrown my client into a panic. The site lets users upload pictures, and each picture gets its own page with a unique URL. Ideally all of these picture pages would work as landing pages and generate traffic, but even before the update the main page and the category pages were responsible for the overwhelming majority of organic traffic.
Duplicate or thin content was never a concern of his prior to the update, but a Google site search reveals a serious problem. While he has approximately 25,000 pages, Google has over 60,000 in the index. The problem appears to be that users often upload the same image multiple times, or upload images that contain nudity, and the client subsequently deletes these posts. Depending on the timing of the deletion, Google may already have crawled these pages and added them to the index. The question I have is whether anyone has a suggestion to expedite the de-indexing process. The deleted posts are not linked to from anywhere on the website and no longer appear in the sitemap. Consequently, it will be quite a while before Google re-crawls these nearly 40,000 pages. Given the number of pages, submitting individual removal requests through Webmaster Tools is not a feasible option. I was thinking I could create a sitemap that includes these deleted posts, which now carry a meta NOINDEX tag, but short of clicking on every single result in the site search, I do not know a way to obtain all the deleted post URLs.
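For what it is worth, generating the sitemap itself looks like the easy part. Below is a minimal Python sketch of what I have in mind, assuming the deleted URLs can be recovered somehow, for example by comparing an older list of known URLs (an old sitemap, server logs) against the posts that are still live. That source list is purely an assumption on my part, and `deleted_urls`, `build_sitemap`, and the example.com URLs are hypothetical names for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # per-file limit in the sitemap protocol


def deleted_urls(all_known_urls, live_urls):
    """Return URLs that were once known but are no longer live, sorted."""
    return sorted(set(all_known_urls) - set(live_urls))


def build_sitemap(urls):
    """Serialize a list of URLs into a sitemap XML string."""
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("split the list across several sitemap files")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # ElementTree escapes characters such as '&' in the URL text.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical inputs: an older URL list versus the posts still live.
    known = ["https://example.com/p/a1", "https://example.com/p/b2"]
    live = ["https://example.com/p/b2"]
    print(build_sitemap(deleted_urls(known, live)))
```

At roughly 40,000 URLs the list would fit in a single sitemap file, so the hard part remains finding the URLs, not writing the file.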
I was also contemplating de-indexing the entire site apart from the main and category pages, as the individual picture pages are created with a random URL, few have titles, and apart from the picture itself, the page contains no text description. Furthermore, as the pictures are uploaded by users, most do not have descriptive file names or alt text. The client has gone through thousands of these pages and added descriptive title tags, but even these pages would likely be considered thin content, correct?