fake google bot?

   
5:08 pm on Dec 31, 2006 (gmt 0)

5+ Year Member



I hope this is the right place to post this.

I have been swamped with error URLs (7844). I noticed this in my Sitemaps account on Webmaster Central. These are all URLs which are not on my site; somehow Googlebot is following wrong links.

Looking at my logs, I see that they are all coming from Googlebot/2.1. Having checked around, I gather that this is a phoney!

After looking around, I see that I have to place an .htaccess file with a line like this to block this bot:

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]
RewriteRule /*$ [site-you-are-sending-the-bot-to.com...] [L,R]

I don't really have any idea where I should send the bot:

http://www.site-you-are-sending-the-bot-to.com
Can somebody point me in the right direction, please?
7:05 pm on Jan 1, 2007 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



can somebody point me in the right direction please.

Redirecting these to another website (even the URL they came from) is a BAD practice.

The most effective solution to offer on your end is denial of access (403).

You have not provided an IP range for these visits.
Nor have you provided a full UA.

The real Google could be chasing intentional errors to verify 404s; however, the quantity you provided (7844) is far too many for a solitary website (unless you're referring to a large time frame, as opposed to a day, a week, or a month).

If these visits "are" coming from a FAKE Google the most effective practice is denial.

Don

11:19 pm on Jan 1, 2007 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]

Blocking this UA will also block the 'real' GoogleBot.

Better to block by IP address:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule .* - [F]

Or, if you use a custom 403forbidden page, you'll need to allow them to request it; otherwise the request for the error page is itself denied, which creates a looping problem:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule !^403forbidden\.html$ - [F]
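For context, here is a rough sketch of how such a custom page might be wired up. It assumes the page is named 403forbidden.html (as in the rule above) and sits in the same directory as the .htaccess file; the path is an assumption, so adjust it to your own layout.

```apache
# Assumption: the custom error page lives at /403forbidden.html
# in the same directory as this .htaccess file.
ErrorDocument 403 /403forbidden.html
```

With this in place, the exclusion in the RewriteRule above is what lets a denied visitor still fetch the error page itself.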

7:31 am on Jan 2, 2007 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]

pp46,
the line that you have (UA begins with Googlebot) should be a safe denial of a non-Google (fake) bot.

I went through a few of my logs and all the genuine Google UAs begin with another word.

You will, however, NEED to remove the OR from your line IF this is the only and/or last RewriteCond that you are using, since OR joins a condition to the one that follows it.
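Putting the advice in this thread together (deny instead of redirect, and no trailing OR on the last condition), a minimal sketch might look like the following. It assumes the fake bot's UA really does begin with Googlebot/2.1 and that no genuine visitor's does, which you should confirm against your own logs first.

```apache
RewriteEngine On
# Assumption: only the fake bot sends a UA that BEGINS with "Googlebot/2.1".
# The dot is escaped so it matches a literal "." only.
# No OR flag here, because this is the last (and only) condition.
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2\.1 [NC]
# "-" means no substitution; F answers with 403 Forbidden.
RewriteRule .* - [F]
```

This will not catch a spoofer that sends a UA starting with some other word, so treat it as a first filter rather than a complete block.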

Don

8:24 am on Jan 2, 2007 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Don, I agree with your above post. I guess what I should have said was that the rewrite rule, as exemplified, would not work to block all spoofed UAs, and that blocking IP addresses will.

In my experience, the spoofer just pastes the entire UA string exactly as Google uses it, so yes, the first word is not "Googlebot" and this is another reason why that example won't block effectively.

I currently block a half dozen of these.

11:19 am on Jan 2, 2007 (gmt 0)

5+ Year Member



Thanks for your answers.
I had another thread running on this and have just posted more info there:
here [webmasterworld.com]
 
