Forum Moderators: phranque
My site is an .asp site and has Google Site Search installed for internal searching. A few weeks ago I got an email from Google Webmaster Tools saying "Googlebot found an extremely high number of URLs on your site."
So I checked Webmaster Tools and found over 6,000 "Not Found" URLs, all originating from my search URL, e.g. mysite.com/search?q=torrent+keywords and mysite.com/Search/?q=more+torrent+keywords.
Now, I don't see how this would benefit the attacker in any way. However, it has damaged my site: if you search Google for something like "mysearch #*$!", one of these search result URLs comes up first!
I have tried adding a noindex tag and removing the affected URLs from the Google index, because that's the least I can do about it.
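For reference, Google reads a noindex directive either from a robots meta tag in the page's head (<meta name="robots" content="noindex">) or from an X-Robots-Tag HTTP response header. The site itself runs ASP, so the Python below is only a sketch of the logic a search-results handler could apply; the function name robots_headers and the list of search paths are hypothetical. The prefix match is made case-insensitive so that /search and /Search/ are both covered.

```python
# Hypothetical list of internal-search paths; adjust to the real site.
SEARCH_PREFIXES = ("/search", "/advanced_search")


def robots_headers(path: str) -> dict[str, str]:
    """Return extra response headers for the given request path.

    Internal search result pages get "X-Robots-Tag: noindex" so that
    search engines drop them from the index when they crawl them.
    The prefix match is case-insensitive because /search and /Search/
    are different URLs to a crawler but may reach the same handler.
    """
    if path.lower().startswith(SEARCH_PREFIXES):
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    for p in ("/search", "/Search/", "/products/widget"):
        print(p, robots_headers(p))
```

A crawler only sees this header if it is allowed to fetch the page, so a URL that is also blocked in robots.txt will not have its noindex read.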
But just last week the attack happened again, and there are now over 2,000 "Not Found" URLs. This time the attacker is targeting my root domain! For example: mysite.com/advanced_search?q=even+more+torrent+keywords and mysite.com/preferences?q=borat+britney+whatnot
Those URLs don't physically exist, but Google Webmaster Tools still records them as URLs on my site.
So my questions are:
1. What on earth has happened? What is it that the attacker is trying to do?
2. What is the worst thing that can happen to my site?
3. What is the best way to stop and prevent this attack for good?
4. What settings might I have configured incorrectly on my site that allowed this to happen?
Thanks guys.
User-agent: *
Disallow: /search
Disallow: /advanced_search
Quoting robotstxt.org (http://www.robotstxt.org/orig.html#formatDisallow) on the Disallow field:
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
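The same prefix rule can be checked with Python's standard-library urllib.robotparser. The sketch below feeds it the two Disallow lines suggested above and tests the URL paths from the original post (with mysite.com dropped). Paths are matched case-sensitively, so the capitalised /Search/ variant is not covered by Disallow: /search.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /advanced_search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# URL paths taken from the original post.
for path in (
    "/search?q=torrent+keywords",
    "/Search/?q=more+torrent+keywords",
    "/advanced_search?q=even+more+torrent+keywords",
    "/preferences?q=borat+britney+whatnot",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{verdict:8} {path}")
```

As written, these rules block /search and /advanced_search but leave /Search/ and invented paths such as /preferences crawlable.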
Kaled.
However, I'm more interested in preventing similar attacks in the future. The IT team is not the most agile (big corporate wheel...), so I need prevention tactics rather than after-the-fact countermeasures.
Even though I've noindexed the known URLs and removed them from the Google index, they can attack again with new variations of the URL, e.g. mysite.com/blabla2?q=random+keywords, mysite.com/blabla3?q=random+keywords, etc.
How can I prevent this from happening?
In this case it is reasonable to use cloaking, so that you deliver one error page to search engine spiders and another to users. However, how you detect whether a URL is genuine depends on how your site works; I can't help you there.
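One possible shape for this, sketched in Python under explicit assumptions: the set of genuine paths (KNOWN_PATHS) is hypothetical and would have to come from the site's own routing, and crawler detection is a naive User-Agent substring check, which can be spoofed (Google documents reverse-DNS verification for Googlebot). Both audiences get a 404 status; only the body and headers differ, with spiders also getting an X-Robots-Tag: noindex header.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical allowlist of real pages; on a live site this would come
# from the site's own routing or sitemap.
KNOWN_PATHS = {"/", "/search", "/products", "/contact"}

# Naive crawler detection by User-Agent substring (spoofable; assumption).
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


@dataclass
class Response:
    status: int
    body: str
    headers: dict[str, str] = field(default_factory=dict)


def is_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def error_response(path: str, user_agent: str) -> Optional[Response]:
    """Return an error response for a non-genuine path, else None.

    `path` is the URL path without the query string. It is compared
    case-insensitively, with any trailing slash removed.
    """
    normalized = path.lower().rstrip("/") or "/"
    if normalized in KNOWN_PATHS:
        return None  # genuine URL: let the normal page handler run
    if is_crawler(user_agent):
        # Bare 404 for spiders: nothing indexable, plus an explicit noindex.
        return Response(404, "Not Found", {"X-Robots-Tag": "noindex"})
    # Friendlier 404 for human visitors.
    return Response(404, "Sorry, that page does not exist. Try the site search.")
```

A request handler would call error_response(request_path, user_agent) before rendering and return the result whenever it is not None.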
Kaled.