Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
fake google bot?
pp46




msg:3204489
 5:08 pm on Dec 31, 2006 (gmt 0)

I hope this is the right place to post this?

I have been swamped with error URLs (7844). I noticed this in my Sitemaps account on Webmaster Central. These are all URLs which are not on my site; somehow Googlebot is following wrong links.

Looking at my logs, I see that they are all coming from Googlebot/2.1. Having checked around, I gather that this is a phoney!

After looking around I see that I have to place an .htaccess file with lines like these to block this bot:

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]
RewriteRule /*$ [site-you-are-sending-the-bot-to.com...] [L,R]

I don't really have any idea where I should send the bot to:
http://www.site-you-are-sending-the-bot-to.com
can somebody point me in the right direction please.

 

wilderness




msg:3205114
 7:05 pm on Jan 1, 2007 (gmt 0)

can somebody point me in the right direction please.

Redirecting these to another website (even the URL they came from) is a BAD practice.

The most effective solution to offer on your end is denial of access (403).

You have not provided an IP range for these visits, nor have you provided a full UA.

The real Google could be chasing intentional errors to verify 404s; however, the quantity you provided (7844) is far too many for a solitary website (unless you're referring to a large time frame as opposed to a day, a week, or a month).

If these visits "are" coming from a FAKE Google the most effective practice is denial.

Don


keyplyr




msg:3205269
 11:19 pm on Jan 1, 2007 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]

Blocking this UA will also block the 'real' GoogleBot.

Better to block by IP address:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule .* - [F]

Or if you use a custom 403forbidden page, you'll need to allow them to request it, otherwise it will create a looping problem:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule !^403forbidden\.html$ - [F]
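
To make the placeholders above concrete, here is a minimal sketch of the custom-403 variant. The two addresses are assumptions taken from the reserved documentation ranges (192.0.2.0/24 and 198.51.100.0/24), not real offenders, and the ErrorDocument line assumes the custom page lives at /403forbidden.html in the site root. Substitute the IPs from your own logs.

```apache
# Sketch only: the IPs are documentation placeholders, not real bots.
# Assumes the custom error page is /403forbidden.html in the site root.
ErrorDocument 403 /403forbidden.html

RewriteEngine On
# Match either offending address; the last condition carries no [OR].
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$ [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.25$
# Forbid everything except the error page itself, so the 403 response
# can still be served without looping.
RewriteRule !^403forbidden\.html$ - [F]
```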

wilderness




msg:3205486
 7:31 am on Jan 2, 2007 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]

pp46,
the line that you have, which matches a UA that begins with Googlebot, should be a safe denial of a non-Google (fake) bot.

I went through a few of my logs and all the genuine Google UA's begin with another word.

You will, however, NEED to remove the OR flag from your line IF this is the only and/or last RewriteCond before the RewriteRule.
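
Combining the advice in this thread (deny rather than redirect, and no trailing OR), pp46's rule might look like the sketch below. It assumes this is the only condition in the file and that the fake bot really sends a UA starting with Googlebot/2.1. The dot is escaped so that it matches a literal period.

```apache
RewriteEngine On
# Only UAs that *begin* with "Googlebot/2.1" match (case-insensitive).
# No [OR] here, because no further RewriteCond follows.
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2\.1 [NC]
# Answer with 403 Forbidden instead of redirecting anywhere.
RewriteRule .* - [F]
```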

Don

keyplyr




msg:3205506
 8:24 am on Jan 2, 2007 (gmt 0)

Don, I agree with your above post. I guess what I should have said was that the rewrite rule, as exemplified, would not block all spoofed UAs, whereas blocking IP addresses will.

In my experience, the spoofer just pastes the entire UA string exactly as Google uses it, so yes, the first word is not "Googlebot" and this is another reason why that example won't block effectively.

I currently block a half dozen of these.
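
For a spoofer that pastes the full Google UA string, one possible sketch combines both checks: the unanchored pattern matches Googlebot anywhere in the UA, and the IP condition limits the block to an address you have already identified as fake. The address is an assumed placeholder from the documentation range 192.0.2.0/24. Without the IP condition, this rule would also forbid the real Googlebot.

```apache
RewriteEngine On
# UA claims to be Googlebot anywhere in the string (no ^ anchor)...
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# ...AND the request comes from a known-bad address (placeholder IP).
# Consecutive RewriteCond lines are ANDed unless flagged [OR].
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
RewriteRule .* - [F]
```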

pp46




msg:3205579
 11:19 am on Jan 2, 2007 (gmt 0)

Thanks for your answers.
I had another thread running on this and have posted more info just now:
here [webmasterworld.com]
