
Search Engine Spider and User Agent Identification Forum

    
fake google bot?
pp46

5+ Year Member



 
Msg#: 3204487 posted 5:08 pm on Dec 31, 2006 (gmt 0)

I hope this is the right place to post this.

I have been swamped with error URLs (7844), which I noticed in my Sitemaps account on Webmaster Central. These are all URLs which are not on my site; somehow Googlebot is following wrong links.

Looking at my logs, I see that they are all coming from Googlebot/2.1. Having checked around, I gather that this is a phoney!

After looking around, I see that I have to place an .htaccess file with lines like these to block this bot:

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]
RewriteRule /*$ [site-you-are-sending-the-bot-to.com...] [L,R]

I don't really have an idea where I should send the bot to:
http://www.site-you-are-sending-the-bot-to.com
Can somebody point me in the right direction, please?

 

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3204487 posted 7:05 pm on Jan 1, 2007 (gmt 0)

can somebody point me in the right direction please.

Redirecting these to another website (even the URL they came from) is bad practice.

The most effective solution to offer on your end is denial of access (403).

You have not provided an IP range for these visits,
nor have you provided a full UA.

The real Google could be chasing intentional errors to verify 404s; however, the quantity you provided (7844) is far too many for a solitary website, unless you're referring to a large time frame as opposed to a day, a week, or a month.

If these visits are coming from a fake Googlebot, the most effective practice is denial.

Don


keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3204487 posted 11:19 pm on Jan 1, 2007 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]

Blocking this UA will also block the 'real' GoogleBot.

Better to block by IP address:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule .* - [F]

Or if you use a custom 403forbidden page, you'll need to allow them to request it, otherwise it will create a looping problem:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule !^403forbidden\.html$ - [F]
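
The looping problem arises because the error page itself gets denied, which triggers another 403. A minimal sketch of how the pieces might fit together, assuming the custom page is saved as 403forbidden.html in the site root. The ErrorDocument line and the address 192.0.2.15 (taken from the documentation-only range) are illustrative assumptions, not part of the posts above:

```apache
# Serve a custom page for every 403 response (assumed file name and location)
ErrorDocument 403 /403forbidden.html

RewriteEngine On
# Placeholder address; substitute the offending IP taken from your own logs
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.15$
# Deny everything except the error page itself, so the 403 page can still be served
RewriteRule !^403forbidden\.html$ - [F]
```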

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3204487 posted 7:31 am on Jan 2, 2007 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]

pp46,
the line that you have (UA begins with Googlebot) should be a safe denial of a non-Google (fake) bot.

I went through a few of my logs, and all the genuine Google UAs begin with another word.

You will, however, NEED to remove the OR from your line IF this is the only and/or last rewrite condition that you are using.
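
Put together with the earlier advice to deny rather than redirect, the corrected rule might look like the sketch below. It assumes mod_rewrite is enabled and that no genuine Googlebot UA in your own logs begins with "Googlebot"; check that before relying on it, since opinions in this thread differ on that point:

```apache
RewriteEngine On
# Match UAs that begin with "Googlebot/2.1" (case-insensitive).
# No OR flag here, because this is the only (and last) condition.
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2\.1 [NC]
# Return 403 Forbidden instead of redirecting to another site
RewriteRule .* - [F]
```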

Don

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3204487 posted 8:24 am on Jan 2, 2007 (gmt 0)

Don, I agree with your post above. I guess what I should have said was that the rewrite rule, as exemplified, would not work to block all spoofed UAs, whereas blocking IP addresses will.

In my experience, the spoofer just pastes the entire UA string exactly as Google uses it, so yes, the first word is not "Googlebot" and this is another reason why that example won't block effectively.

I currently block a half dozen of these.
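
To illustrate the point about full UA strings: a pattern anchored with ^ matches only when the UA begins with "Googlebot", while a pattern without the anchor matches the token anywhere in the string. A minimal sketch, assuming mod_rewrite is available. The address 192.0.2.15 is a documentation-range placeholder, not a real spoofer, and pairing the UA test with an address is one possible way to avoid denying the genuine crawler:

```apache
RewriteEngine On
# No leading ^, so this matches "Googlebot" anywhere in the UA string,
# including a full string copied verbatim from the real crawler
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Conditions are ANDed by default: only this placeholder address is denied
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.15$
RewriteRule .* - [F]
```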

pp46

5+ Year Member



 
Msg#: 3204487 posted 11:19 am on Jan 2, 2007 (gmt 0)

Thanks for your answers.
I had another thread running on this and have just posted more info
here [webmasterworld.com]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved