Forum Moderators: Robert Charlton & goodroi


Removing Spam Pages from Google Cache/Index


ferdinand2000

11:29 pm on Sep 2, 2008 (gmt 0)

10+ Year Member



I have a problem with spam pages, which look as if they have been added to one of my sites through an exploit in a Wordpress install that I was not quick enough to update to the latest version.

I noticed it a day ago, after the spam had been in place for several days (less than a week; Goog has not canned my traffic yet).

I have killed the pages on my site (initially by putting the affected area into "maintenance mode", so the spam pages are no longer being served), and I have rebuilt the site essentially from scratch on a completely new account with the latest version of everything; it will be back up in another 24 hours. I have also temporarily stopped pinging Google with updates.

My question is this: what is the best method of getting the Spam pages out of the Google cache and index?

There are probably too many to do it manually (1000+ generated spam pages out of perhaps 6000 pages on the whole site), but they all have a similar naming format:

www.<domain>.com/?<standard string><random digits>

so I could filter for the ? and the standard string.
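The filter idea above can be sketched with a short regex; "spamstring" here is a placeholder for the actual standard string, which isn't given in the thread:

```python
import re

# Matches the spam URL shape described above:
# /?<standard string><random digits>
# "spamstring" is a hypothetical stand-in for the real string.
SPAM_URL = re.compile(r"^/\?spamstring\d+$")

def is_spam_url(path_and_query):
    """Return True if a path+query string matches the spam pattern."""
    return bool(SPAM_URL.match(path_and_query))
```

Running this over a list of indexed URLs (e.g. from a sitemap or server logs) would separate the 1000+ generated pages from the legitimate ones.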

They are also distinctive in being cached by Goog, while my policy is to run the site with index, noarchive (i.e. no cache).

Any suggestions will be most welcome.

Rgds

Ferdinand

[edited by: tedster at 6:32 am (utc) on Sep. 3, 2008]
[edit reason] moved from another location [/edit]

phranque

6:56 am on Sep 3, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], Ferdinand!

these threads may have useful tips to prevent future spam exploitation:
[webmasterworld.com...]
[webmasterworld.com...]

tedster

6:59 am on Sep 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Ferdinand, and welcome to the forums.
I understand that there's no easy way to submit all those urls for removal, since they're not in one directory but are distinguished only by the query string. If you make sure that those urls all return a 404 or 410 http status, Google should drop them pretty fast.
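Since the spam urls differ from the homepage only in their query string, one way to serve that 404/410 is a rewrite rule in the web server. A sketch for Apache (assuming mod_rewrite is available; "spamstring" is a placeholder for the actual standard string):

```apache
# .htaccess sketch -- assumes Apache with mod_rewrite enabled.
# Returns 410 Gone for any request whose query string starts
# with the spam string ([G] sends 410, [L] stops processing).
RewriteEngine On
RewriteCond %{QUERY_STRING} ^spamstring[0-9]+ [NC]
RewriteRule ^$ - [G,L]
```

410 is slightly preferable to 404 here, since it tells Google the pages are gone deliberately rather than missing by accident.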

If googlebot continues to request those urls, you could put in a robots.txt rule to stop future requests for those urls. Fortunately you've got a pattern, so one Disallow rule will do it:

User-agent: *
Disallow: /?<standard string>

The only reason I don't say use robots.txt right away is that the spammer may have placed backlinks pointing to those urls from another domain. So you want Google to see that those urls are no longer there. If they were blocked by your robots.txt, then they could hang around for a long time as url-only listings because of backlinks.

But once you see googlebot is getting the 404 responses, then you can place the robots.txt rule and I think Google will sort it all out pretty fast for you.
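Checking for those googlebot 404/410 responses can be done by scanning the access log. A minimal sketch, assuming a common-log-format log and using "spamstring" as a stand-in for the real string:

```python
import re

# Pulls (url, status) out of a common-log-format line for requests
# to the spam URLs; "spamstring" is a hypothetical placeholder.
LINE = re.compile(r'"GET (/\?spamstring\S*) HTTP/1\.[01]" (\d{3})')

def googlebot_spam_hits(log_lines):
    """Yield (url, status) for googlebot requests to spam URLs."""
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = LINE.search(line)
        if m:
            yield m.group(1), int(m.group(2))
```

Once every yielded status is 404 or 410 (and no more 200s appear), it should be safe to add the robots.txt rule.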

Maybe someone else can see a way to make a url removal request in a case like this - I'm kind of stumped on that one.

Quadrille

11:23 am on Sep 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd not make a removal request, mainly because you shouldn't need to. The chances of your potential visitors finding those pages are small, and so long as you have a user-friendly 404, you've covered all bases if you've followed the advice above.

Using the blunt instrument that is the removal tool in this case might be risky, anyway.

Maybe a good time to check your no cache function, however!