First, thank you so much in advance for any assistance you can provide.
I have attempted to pick the most appropriate forum for this assistance request.
We are currently experiencing significantly reduced search engine positioning for most (if not all) of the search terms people use to reach us. We believe there are a few factors at play, but this whole thing started with a spammer-infested forum, so some background history may be beneficial for solving this problem.
Some time ago we delinked our support forum due to inactivity and spammers (in retrospect we should have just deleted it). Unfortunately neither the spammers nor Google forgot about it, and over time it filled up with an assortment of risqué links. We only noticed months ago, when we saw our search positioning had been significantly penalized and, upon logging in to Google Webmaster, found that our top keywords were things like ‘sex’ and ‘video’ (delightful). To solve this problem we removed the forum, ensured all its contents were returning 404 errors, and waited. About a month passed with little result and our top search terms remained, so we blocked the whole forum through robots.txt (hoping that would drop its contents from Google). After some time that seemed to work: we started ranking well in search engine results again, and the risqué keywords vanished (although the replacement keywords were a little unusual, at least they were normal words).
Another month passed with little concern, until again, rather quickly, our search engine results plummeted even deeper than before. Again we visited Google Webmaster, checked every cause we could imagine, and concluded that only one cause really made sense. Our website contains about 500 valid documents, and Google Webmaster (thanks to this forum) reports over 2000 404 errors and nearly 5000 pages blocked by robots.txt (all contents of the removed forum). Nothing else on the website has really changed (I’ve been working in the background on some big projects, so the site’s actual content has remained quite steady for over a year now). We thought it was a little unusual that all those 404 errors were still being retained, and we also noticed that all those pages (returning 404 errors since February, and since blocked by robots.txt) were still in the Google index. It made no sense; we could only imagine that blocking them with robots.txt had somehow caused them to be retained for so long.
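To make that suspicion concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/forum/` path, the `example.com` URLs, and the `may_fetch` helper are stand-ins for illustration, not our real setup. It only shows that a crawler obeying a `Disallow` rule is not permitted to request the blocked URLs at all, which would mean it never gets to see the 404 responses behind them; whether that is why Google kept the pages is exactly what we are unsure about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; "/forum/" is an assumed path for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""


def may_fetch(url, user_agent="Googlebot"):
    """Return True if a robots.txt-compliant crawler is allowed to request url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/forum/topic-123",  # removed forum page (404 on the server)
        "https://www.example.com/products.html",    # ordinary content page
    ):
        # A blocked URL is never requested, so its 404 status is never observed.
        print(url, "crawlable" if may_fetch(url) else "blocked by robots.txt")
```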
Fast forward to two days ago. Our next idea was to unblock the forum and 301 redirect its entire contents back to the home page, in the hope of purging it and reining it back in. After a few more hours of research we decided on a different approach. We followed Google’s guidelines, 404ed the pages again, re-blocked them with robots.txt, and used the Google URL Removal Tool to remove the entire directory. As of today, Google’s index entries for those pages have vanished, although all the associated 404/blocked notifications remain in Google Webmaster (perhaps for historical reference?). Our search engine ranking has not recovered. We also noticed something else that is curious: our keywords, as reported in Google Webmaster, are derived in strong majority from manufacturer PDF manuals we host on the website rather than from actual content pages.
We know it takes time for some of these things to sort out, but as the current circumstance has a significant impact on our business we want to make sure we’re doing the smartest and most efficient thing we can. I would very much love to hear feedback and thoughts on this matter, our approach, and any other possible approach from folks in this forum.
We have never lost PageRank.
Google has not associated our site with malware.
We originally submitted a reconsideration request and were told we are not being penalized, and that whatever is happening right now is algorithmic.
What do you think about our current circumstance, solution, and options?
Why are PDF documents determining 90% of our keywords in Google Webmaster?
If relevant, should we block the PDF directory (robots.txt)?
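For reference, this is roughly the rule set we have in mind, checked with Python's standard `urllib.robotparser`. The `/pdf/` and `/forum/` paths, the sample file names, and the `crawlable` helper are assumptions for illustration. The sketch only shows which paths such a rule would cover; it says nothing about whether blocking the manuals would help or hurt our ranking, which is the part we are asking about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set; "/pdf/" stands in for wherever the manuals actually live.
PROPOSED_RULES = [
    "User-agent: *",
    "Disallow: /forum/",
    "Disallow: /pdf/",
]


def crawlable(paths, rules=PROPOSED_RULES, user_agent="Googlebot"):
    """Map each path to True if the rules let a compliant crawler fetch it."""
    parser = RobotFileParser()
    parser.parse(rules)
    return {path: parser.can_fetch(user_agent, path) for path in paths}


if __name__ == "__main__":
    # Sample paths are made up; substitute real ones before relying on the result.
    for path, allowed in crawlable(["/pdf/manual.pdf", "/index.html"]).items():
        print(path, "crawlable" if allowed else "blocked")
```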