Googlebot scanning script links I don't have

         

Doood

9:12 pm on Jun 27, 2009 (gmt 0)

10+ Year Member



I cannot figure out why Googlebot is constantly requesting a script on my site that redirects to another site, with random search terms in the other site's URL. Cuil is doing exactly the same thing.

The other site is apparently an attack site (according to Firefox) and has popups for about five more attack sites.

I get maybe 500 of these every day:

66.249.**.** - - [21/Jun/2009:09:02:26 -0400] "GET /cgi-bin/script.cgi?u=http://othersite/?q=Bad+Adult+Words HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The /cgi-bin/script.cgi is a script I use for tracking clicks on links, and I definitely do not have any links to this site. I can't even find any info about it on Google.

The other site is a .co.cc and has illegal adult content for sale on it.

For now, all I know to do is block this Googlebot IP and hope it doesn't ruin my SERP rankings like it did last week, when I accidentally blocked it because my GeoIP script said the IP was from Vietnam.

[edited by: tedster at 1:52 am (utc) on June 28, 2009]
[edit reason] obscure the specifics [/edit]

wilderness

10:35 pm on Jun 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A requesting bot or visitor does NOT redirect; rather, your site and/or server does.

Possible solution:

# Turn on RewriteEngine (unless already enabled)
RewriteEngine on
# Deny access from Googlebot when the referer is "othersite" and the referer contains bad words
RewriteCond %{REMOTE_ADDR} ^66\.249\.
RewriteCond %{HTTP_REFERER} othersite
RewriteCond %{HTTP_REFERER} (Bad|Adult|Words)
RewriteRule .* - [F]

Doood

10:59 pm on Jun 27, 2009 (gmt 0)

10+ Year Member



Googlebot is hitting a script on my site that DOES redirect, because it's a script that counts the clicks on my links and that is what it is supposed to do.

Somehow Googlebot is placing URLs not found on my site in my script and redirecting to those sites.

The problem is that there is no referring URL. The hits come in with no referrer, cookies are disabled for it, and I can't figure out how to block these clicks to sites I do not link to.

wilderness

11:32 pm on Jun 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My apologies.
try this:

# Turn on RewriteEngine (unless already enabled)
RewriteEngine on
# Deny access from Googlebot when the request is for "othersite" and contains bad words
# Note: in mod_rewrite, REQUEST_URI holds only the path; the u= target arrives in the
# query string, so these conditions may need %{QUERY_STRING} or %{THE_REQUEST} instead
RewriteCond %{REMOTE_ADDR} ^66\.249\.
RewriteCond %{REQUEST_URI} othersite
RewriteCond %{REQUEST_URI} (Bad|Adult|Words)
RewriteRule .* - [F]

Doood

11:49 pm on Jun 27, 2009 (gmt 0)

10+ Year Member



Thanks Wilderness.

I may need to do some more research on the rewrite condition request_uri. I hope it's as easy as this rewrite.

g1smd

11:55 pm on Jun 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** Somehow Googlebot is placing URLs not found on my site in my script and redirecting to those sites. ***

You need to re-design your script to remove that security flaw.
Your site is being exploited.

Doood

12:04 am on Jun 28, 2009 (gmt 0)

10+ Year Member



Yeah, but how do you remove a flaw like this?

Almost all click tracking or ad scripts use this to track clicks.

rainborick

12:42 am on Jun 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One solution is to validate the target URL as being one to which you really intend to send users. Then, if the requested URL is not valid, return a 404 error. This requires maintaining a database of valid URLs, if you aren't doing so already. An even better solution would be to use an ID number in the query string sent to the script instead of a URL, so that outsiders wouldn't be tempted to route traffic through your site.
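As a rough illustration of the ID-number idea, here is a minimal Python sketch. It is not the original click-tracking script (which is a compiled C program); the `LINKS` table, the example destinations, and the function names are all hypothetical. The point is only that the query string carries an ID, so the script can never be handed a destination it does not already know about.

```python
from typing import Optional, Tuple

# Hypothetical table of link IDs and destinations. In practice this would
# live in whatever database already stores the tracked links.
LINKS = {
    "1": "https://example.com/",
    "2": "https://example.org/products",
}


def resolve(link_id: str) -> Optional[str]:
    """Return the destination for a known link ID, or None if it is unknown."""
    return LINKS.get(link_id)


def handle_click(link_id: str) -> Tuple[int, Optional[str]]:
    """Return (status, location): 302 plus the target for known IDs, else 404."""
    target = resolve(link_id)
    if target is None:
        return 404, None  # unknown ID: no redirect is issued at all
    return 302, target
```

With this shape, a request such as `script.cgi?id=2` redirects normally, while anything else in the parameter, including a full URL, simply falls through to the 404 branch.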

If you don't mind if the search engines don't follow the links that use this script, you could always block the script in your robots.txt file. This wouldn't stop all of the attacks on the script, though.

dstiles

9:26 pm on Jun 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Redirect exploits are common. If your script is open to third-party abuse, you will probably find your site in one of the open server blacklists used for anti-spam resolution, etc. Check out robtex.

The obvious immediate solution is to rename something so the script cannot be abused. It WILL be found again if you do not add other protection to it soon. Insisting on a correct referer (i.e. your site/page) is an obvious one, but perhaps better would be an additional header code in the calling script, preferably one that changes with every use in some way that the redirect script can verify.

Also, check whether the visitor is a real browser and, if not, terminate the page with a suitable message (e.g. 403 go away).
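One possible reading of a "code that changes and that the redirect script can verify" is a signed, time-limited token attached to each link when the page is generated. The Python sketch below is an assumption about how that could look, not something taken from this thread: the `SECRET` value, the `MAX_AGE` window, the parameter names `u`, `t`, and `sig`, and the helper names are all hypothetical. It uses an HMAC so that only links produced by your own pages carry a valid signature.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical secret, known only to your own site
MAX_AGE = 3600         # hypothetical lifetime of a link, in seconds


def sign(url: str, issued: int) -> str:
    """HMAC-SHA256 over the issue time and the target URL."""
    msg = f"{issued}:{url}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_link(url: str, now: int = None) -> dict:
    """Build the query parameters the calling page would put in a link."""
    issued = int(time.time()) if now is None else now
    return {"u": url, "t": str(issued), "sig": sign(url, issued)}


def verify(params: dict, now: int = None) -> bool:
    """True only for an unexpired link whose signature matches its URL."""
    now = int(time.time()) if now is None else now
    try:
        issued = int(params["t"])
    except (KeyError, TypeError, ValueError):
        return False
    if not 0 <= now - issued <= MAX_AGE:
        return False
    expected = sign(str(params.get("u", "")), issued)
    supplied = str(params.get("sig", ""))
    # Constant-time comparison on bytes avoids leaking how much matched.
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))
```

The token here changes each time a page is generated rather than on every single click. Unlike a referer check, it does not depend on a header the client can omit or fake.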

Doood

10:41 pm on Jul 1, 2009 (gmt 0)

10+ Year Member



Thanks for all the helpful info. This script is written in C and is encoded, so I can't edit it in any way.

It tracks all the redirects and clicks going through the script, and they're all going to just one domain, so I blocked that domain like Wilderness suggested, and it works.

RewriteCond %{QUERY_STRING} ^u=http://othersite
RewriteCond %{REQUEST_URI} !^/403forbidden\.php$
RewriteRule .* - [F]

The way I see it, even though my script is being exploited to redirect to this bad site, Google is being exploited too, since these "attacks" are coming directly from Google and they continue to allow it to happen.

Rosalind

10:21 am on Jul 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The trouble with the .htaccess fix is that new bad domains are registered all the time, and you can't block all of them. The best thing you can do is remove the script immediately, until you can replace it with one that's more secure.

One of the most important security lessons is to check input in terms of exactly what you expect it to be, rather than by excluding a small set of bad patterns. You'll never catch all of those bad patterns, so you have to be as restrictive as possible about what you'll allow your script to redirect to.
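To make the "allow only what you expect" point concrete, here is a small Python sketch of an allowlist check on the redirect target. It is only an illustration under assumptions: the `ALLOWED_HOSTS` set and the `is_allowed` name are hypothetical, and a real script would load the permitted hosts from its own link database.

```python
from urllib.parse import urlsplit

# Hypothetical set of hosts the site really links to.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_allowed(target: str) -> bool:
    """Accept only http(s) URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(target)
        scheme = parts.scheme
        host = parts.hostname
        has_credentials = parts.username is not None or parts.password is not None
    except ValueError:
        return False  # malformed URL: reject rather than guess
    if scheme not in ("http", "https"):
        return False
    if has_credentials:
        return False  # rejects tricks like https://example.com@evil.test/
    return host in ALLOWED_HOSTS
```

Because the host must match exactly, a newly registered bad domain needs no new rule: it is rejected by default, which is the opposite of maintaining a blocklist in .htaccess.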

g1smd

7:08 pm on Jul 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A few years ago, bad sites were exploiting open 302 redirects to create Duplicate Content issues for other third-party sites. There are fewer issues if the redirect is a 301, but you need to nail these holes shut now. If you don't, the bad guys will invite all their friends out to play. Then you and your site will be in Trouble.