Forum Moderators: Robert Charlton & goodroi
I noticed the spam a day ago, after it had been in place for several days (less than a week; Goog has not canned my traffic yet).
I have killed the pages on my site (initially by putting the affected area into "maintenance mode", so the spam pages are no longer being served), and I have rebuilt the site essentially from scratch on a completely new account with the latest version of everything; it will be back up in another 24 hours. I have also temporarily stopped pinging Google with updates.
My question is this: what is the best method of getting the spam pages out of the Google cache and index?
There are probably too many to do it manually (1000+ generated spam pages out of perhaps 6000 pages on the whole site), but they all have a similar naming format:
www.<domain>.com/?<standard string><random digits>
so I could filter for the ? and the standard string.
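For instance, with a list of paths exported from logs or a crawl, one grep isolates them. A sketch, where "spamstring" stands in for the real standard string:

```shell
# urls.txt: one request path per line (sample data for the example)
cat > urls.txt <<'EOF'
/?spamstring12345
/about.html
/?spamstring98
EOF

# keep only the spam pattern: "?", the standard string, then random digits
grep -E '^/\?spamstring[0-9]+$' urls.txt
```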
They are also distinctive in being cached by Goog, while my policy is to run the site with a robots meta of index, noarchive.
Any suggestions will be most welcome.
Rgds
Ferdinand
[edited by: tedster at 6:32 am (utc) on Sep. 3, 2008]
[edit reason] moved from another location [/edit]
These threads may have useful tips to prevent future spam exploitation:
[webmasterworld.com...]
[webmasterworld.com...]
If googlebot continues to request those urls, you could put in a robots.txt rule to stop future requests for those urls. Fortunately you've got a pattern, so one Disallow rule will do it:
User-agent: *
Disallow: /?<standard string>
The only reason I don't say use robots.txt right away is that the spammer may have placed backlinks pointing to those urls from another domain. So you want Google to see that those urls are no longer there. If they were blocked by your robots.txt, then they could hang around for a long time as url-only listings because of backlinks.
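One thing to watch: since the spam urls are just the homepage plus a query string, the rebuilt site might answer them with a 200 instead of a 404. If so, you can force the 404 explicitly. A sketch for Apache with mod_rewrite (an assumption about your server; "spamstring" stands in for your standard string):

```apache
# .htaccess: return 404 for the spam query-string pattern
RewriteEngine On
# the query string is the standard string followed by random digits
RewriteCond %{QUERY_STRING} ^spamstring[0-9]+$
# with a status outside 300-399, Apache drops the substitution and returns that status
RewriteRule ^$ - [R=404,L]
```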
But once you see googlebot is getting the 404 responses, then you can place the robots.txt rule and I think Google will sort it all out pretty fast for you.
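One way to watch for that is to grep the access log for Googlebot hits that got a 404. This assumes a combined-format log with the status code in field 9 (the log path and lines below are sample data):

```shell
# access.log: sample combined-format lines (stand-ins for your real log)
cat > access.log <<'EOF'
66.249.66.1 - - [03/Sep/2008:06:32:00 +0000] "GET /?spamstring123 HTTP/1.1" 404 512 "-" "Googlebot/2.1"
192.0.2.7 - - [03/Sep/2008:06:33:00 +0000] "GET /about.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF

# count Googlebot requests that received a 404
grep -i googlebot access.log | awk '$9 == 404' | wc -l
```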
Maybe someone else can see a way to make a url removal request in a case like this - I'm kind of stumped on that one.
Using the blunt instrument that is the removal tool in this case might be risky, anyway.
Maybe a good time to check that your noarchive function is working, however!
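A quick sanity check on the rebuilt pages: save one locally and grep its robots meta tag for the noarchive directive (the HTML below is a sample stand-in for a real page):

```shell
# page.html: a saved copy of one rebuilt page (sample content for the example)
cat > page.html <<'EOF'
<html><head><meta name="robots" content="index, noarchive"></head><body>ok</body></html>
EOF

# the robots meta should carry noarchive so Google will not show a cached copy
grep -io '<meta name="robots" content="[^"]*"' page.html
```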