Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Google's Site Verification Bot
A late warning
Angonasec




msg:4682721
 1:10 pm on Jun 25, 2014 (gmt 0)

Probably old news to you log-addicts.

My back was turned for a few months executing duties, and G had the nerve to switch its site verification bot IP to 66.249.80.232 without telling me. This resulted in my sites losing their status as verified, because I have long blocked the irritating IP block used for Google Plus snippets:

# Google Plus snippets
deny from 66.249.80.0/20 66.249.84.226 66.249.81.145

This line was blocking verification.
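Why that one deny line caught the verification bot can be checked with Python's standard ipaddress module. This is only an illustrative sketch: the labels and variable names are made up here, and the addresses are the ones quoted in this post.

```python
import ipaddress

# The range blocked by the "deny from" line above.
blocked = ipaddress.ip_network("66.249.80.0/20")

# Addresses from the post: the verification bot, plus the two single
# IPs listed after the /20 on the same deny line.
addresses = {
    "verification bot": "66.249.80.232",
    "extra deny 1": "66.249.84.226",
    "extra deny 2": "66.249.81.145",
}

# First and last address covered by the /20.
first, last = blocked[0], blocked[-1]
print(f"{blocked} spans {first} to {last}")

# Test each address for membership in the blocked range.
inside = {label: ipaddress.ip_address(ip) in blocked
          for label, ip in addresses.items()}

for label, hit in inside.items():
    print(f"{label}: {'inside' if hit else 'outside'} the blocked range")
```

All three addresses fall inside the /20, so the two single IPs on the deny line are already covered by the range, and the new verification IP was blocked along with everything else in it.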

So I deigned to allow the bot's IP thus...

# Allow Google Site Verification bot
allow from 66.249.80.232

That got my main site verified, but not two others. To achieve verification on those, I had to remove my deny of 66.249.80.0/20, after which they were verified.

Loss of verification happened mid June 2014, but doesn't appear to have affected our ranking in G serp.

 

keyplyr




msg:4682817
 6:36 pm on Jun 25, 2014 (gmt 0)

Thanks for the heads-up. Made me check my own filters. I'm a bit more liberal in the ranges I let through for G so even with that recent change, I was OK... but ya never know :)

not2easy




msg:4682827
 6:57 pm on Jun 25, 2014 (gmt 0)

And I recently blocked it because of proxy referer spam traffic. That is listed as a Google Proxy server. Now I need to look closer.
IP Address 66.249.80.232
Host google-proxy-66-249-80-232.google.com
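
One way to "look closer" at an address like this is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, then resolve that hostname back and see whether the original IP comes out again. A minimal sketch using Python's socket module follows; the function names and the list of accepted domain suffixes are assumptions for illustration, not an official list.

```python
import socket

# Assumed suffixes for illustration; adjust to whatever you trust.
TRUSTED_SUFFIXES = (".google.com", ".googlebot.com")

def is_google_host(hostname: str) -> bool:
    """True if the hostname ends in one of the trusted suffixes."""
    return hostname.lower().endswith(TRUSTED_SUFFIXES)

def forward_confirmed(ip: str) -> bool:
    """Reverse-resolve the IP, then confirm the hostname maps back to it."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        # Lookup failed (no PTR record, no network, and so on).
        return False
    return is_google_host(hostname) and ip in forward_ips

# The hostname from the post passes the suffix check on its own:
print(is_google_host("google-proxy-66-249-80-232.google.com"))
```

The suffix check alone proves nothing, since anyone can set a PTR record to any name; it is the forward lookup returning the same IP that makes the hostname believable.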

lucy24




msg:4682839
 7:39 pm on Jun 25, 2014 (gmt 0)

Ah. I knew they must have changed something, because site verification has been showing up in my logs recently, although it's supposed to be ignored. (I'm talking here about my personal log-wrangling routines, not the original raw logs.)

Angonasec wrote: "I had to remove my deny of 66.249.80.0/20 then they were verified."

That's where conditional RewriteRules come in.

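# Unwanted IP range (here the 66.249.80.0/20 block denied earlier in the thread)
RewriteCond %{REMOTE_ADDR} ^66\.249\.(8\d|9[0-5])\.
# Request is not the verification file (assumed name: "google" + 16 alphanumerics + ".html")
RewriteCond %{REQUEST_URI} !^/google[0-9a-z]{16}\.html$
# Both conditions met: answer 403 Forbidden
RewriteRule ^ - [F]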

not2easy




msg:4682845
 7:56 pm on Jun 25, 2014 (gmt 0)

You're right, lucy24: a quick check shows that the UA
"Mozilla/5.0 (compatible; Google-Site-Verification/1.0)"
is coming from several IPs in that neighborhood: 66.249.90.185 and 66.249.90.74 recently.

lucy24




msg:4682852
 8:29 pm on Jun 25, 2014 (gmt 0)

:: detour to look up, which I should have done earlier ::

Ah. The "ignore" code has
^(72\.14|209\.8[45])\.\d+\.\d+
so until recently they simply didn't use 66.249.whatever at all. Is it bad when you can type someone's IP from memory? Now duly added.

I guess the time to worry is when random visitors start asking for the correct google verification file, because nobody else should even know its name. Considering the length of its name ("google" followed by 16 alphanumerics), it's pretty unlikely a robot would find it by blind luck.

:: detour to calculator ::

Well, it's got 24 zeros ;)
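
The calculator detour can be reproduced in a few lines of Python. This sketch assumes "alphanumerics" means lowercase letters plus digits, 36 symbols in all; a different alphabet would change the total.

```python
import string

# Assumption: 26 lowercase letters + 10 digits = 36 possible symbols.
alphabet = string.ascii_lowercase + string.digits
name_length = 16  # characters following "google" in the file name

# Number of distinct 16-character names over that alphabet.
combinations = len(alphabet) ** name_length

print(f"{combinations:.2e} possible names")
print(f"{len(str(combinations))} digits long")
```

Under that assumption the count is 36 to the 16th power, roughly 8 followed by 24 zeros, which is why a blind guess at the file name is not a realistic worry.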

slipkid




msg:4682981
 6:32 am on Jun 26, 2014 (gmt 0)

Google proxy came through today using 66.249.81.153. Grabbed an image and left.

Looks like work to do.

wilderness




msg:4683004
 11:17 am on Jun 26, 2014 (gmt 0)

There's a fairly recent thread where somebody noted that the primary bot only uses addresses up through the 66.249.79 Class C:

RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|7[0-9])\.

Angonasec




msg:4683009
 11:54 am on Jun 26, 2014 (gmt 0)

Blimey, and I thought +I'd+ been neglecting my logs, not having glanced at them for trois months.
Tchek!

slipkid




msg:4683061
 5:03 pm on Jun 26, 2014 (gmt 0)

So, what I have for a valid googlebot in my .htaccess file is,

RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\.

from JDMorgan at [webmasterworld.com...]

remains true, but inside that range the google proxy is operating.

Should I rewrite this code to limit it to the 79 Class C, as wilderness suggested?

What is the google stuff that I would not be allowing besides the proxy?

slipkid




msg:4683112
 8:49 pm on Jun 26, 2014 (gmt 0)

Deleted message posted to wrong forum.

lucy24




msg:4683117
 9:21 pm on Jun 26, 2014 (gmt 0)

RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\.

Does he explain why 66.249.85 is exempt? Yes, he probably does. But anything from JDMorgan will be several years old, so it's worth re-checking. If you didn't have that .85 loophole, the 70's and 80's could be reduced to

|[78]\d|
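
To see exactly what that simplification would change, both patterns can be run against every possible third octet. A small sketch with Python's re module; the helper name and the dummy final octet ".1" are just for illustration.

```python
import re

# Pattern quoted in the thread (attributed to JDMorgan).
original = re.compile(r"^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\.")
# Same pattern with the 70s and 80s collapsed into [78]\d.
simplified = re.compile(r"^66\.249\.(6[4-9]|[78]\d|9[0-5])\.")

def matched_octets(pattern):
    """Return the set of third octets (0-255) that the pattern accepts."""
    return {n for n in range(256) if pattern.match(f"66.249.{n}.1")}

# Octets accepted by the simplified pattern but not by the original.
only_in_simplified = matched_octets(simplified) - matched_octets(original)
print(sorted(only_in_simplified))
```

The only difference between the two is the 66.249.85 block: the original skips it, the collapsed form lets it through.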

:: detour to logs ::

I don't think there's any difference at this point. At least not for the common entities like favicon or Preview.

But for most purposes,
66.249.64.0/20
i.e. 66\.249\.(6[4-9]|7\d)
should probably be handled separately from
66.249.80.0/20
i.e. 66\.249\.(8\d|9[0-5])
where the first is crawl, the second is assorted Googloid entities including Preview, Translate, favicon and so on. Thoughtful of them to use exactly this pair of /20 ranges, since it lets you split neatly at 7x|8x ;)
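
The claim that each regex lines up with its /20 can be checked mechanically. This is a sketch only: the "crawl" and "other" labels, the function names, and the added ^ and trailing \. anchors are illustrative choices, not part of the patterns as posted.

```python
import ipaddress
import re

# Each entry pairs a /20 with the regex meant to describe it.
ranges = {
    "crawl": (ipaddress.ip_network("66.249.64.0/20"),
              re.compile(r"^66\.249\.(6[4-9]|7\d)\.")),
    "other": (ipaddress.ip_network("66.249.80.0/20"),
              re.compile(r"^66\.249\.(8\d|9[0-5])\.")),
}

def regex_agrees_with_cidr(network, pattern):
    """Check every 66.249.x.1: the regex must match exactly when the
    address lies inside the network."""
    for octet in range(256):
        ip = f"66.249.{octet}.1"
        in_network = ipaddress.ip_address(ip) in network
        if bool(pattern.match(ip)) != in_network:
            return False
    return True

def classify(ip):
    """Return the label of the /20 containing ip, or None if neither does."""
    address = ipaddress.ip_address(ip)
    for label, (network, _pattern) in ranges.items():
        if address in network:
            return label
    return None

for label, (network, pattern) in ranges.items():
    print(label, network, regex_agrees_with_cidr(network, pattern))

# Addresses reported earlier in the thread land in the second range.
print(classify("66.249.80.232"), classify("66.249.90.185"))
```

Because the boundary between the two /20 ranges falls exactly between third octets 79 and 80, each regex stays short and the two never overlap.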

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved