
Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Is this googlebot or not?
Seems to be in GBot's range but doesn't validate
webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 5:25 pm on Apr 2, 2014 (gmt 0)

While culling Googlebot records from my raw log table, I'm encountering some questionable results related to the 66.249.70.0/24 range. Here are a couple of examples.

66.249.70.238 - Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

66.249.70.18 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

There are other IPs involved as well (all in the above range).

Anyway, these don't validate using the standard method (failing on the forward lookup). Here are the results.

Reverse DNS for 66.249.70.238 = crawl-66-249-70-238.googlebot.com

Domain checks out.

Forward DNS throws a socket exception: No such host is known.

I'm asking because I thought this was in G's range, so it seems like it should validate. As I said before, I'm seeing this with everything across 66.249.70.0/24 that claims to be Googlebot.
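For readers following along, the "domain checks out" step above can be sketched as a plain string test on the PTR name. This is only a rough illustration: the function name is hypothetical, and the suffix list (googlebot.com and google.com) is an assumption that should be checked against Google's current documentation.

```php
<?php
// Hypothetical helper: true when a PTR host name sits under one of the
// domains assumed here to belong to Google's crawlers.
function is_google_crawler_host(string $host): bool
{
    // Normalise case and drop a trailing root dot, if any.
    $host = rtrim(strtolower($host), '.');
    foreach (['.googlebot.com', '.google.com'] as $suffix) {
        // Compare the end of the name so "googlebot.com.example.net" fails.
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

var_dump(is_google_crawler_host('crawl-66-249-70-238.googlebot.com')); // bool(true)
var_dump(is_google_crawler_host('crawl.googlebot.com.example.net'));   // bool(false)
```

Matching on the suffix with its leading dot, rather than searching for "googlebot" anywhere in the name, keeps look-alike host names from passing.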

 

incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4659558 posted 6:47 pm on Apr 2, 2014 (gmt 0)

Googlebot has been using a wider variety of user agents to get at content that's cloaked to mobile devices.

As long as it says Googlebot somewhere and the round trip DNS validation works, it's the real deal.

However, in this case it sounds like someone at Google made an error in the DNS for crawl-66-249-70-238.googlebot.com, and it might be worthwhile sending someone there an email to let them know.

dstiles

WebmasterWorld Senior Member Top Contributor of All Time 5+ Year Member



 
Msg#: 4659558 posted 7:11 pm on Apr 2, 2014 (gmt 0)

I had to modify my acceptance algorithm for stupid googlebot for this reason. I got so many "bad UA" responses in my log it was ridiculous!

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 7:34 pm on Apr 2, 2014 (gmt 0)

I had a feeling this was a DNS error on G's part. Makes me wonder how many unwarranted 404s and 403s they're eating these days because of this. Right now I'm just cleaning up log files, but I can imagine what a mess this would create in a blocking algo (if you actually care about having your stuff indexed).

Contact Google? Not even sure where to start on that one where this matter is concerned.

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4659558 posted 10:22 pm on Apr 2, 2014 (gmt 0)

But isn't it worth it for the visceral satisfaction of telling a Major Internet Company that they goofed?

Just hope they never start calling themselves "GoogleBot", as that's an attested spoofer.

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 10:59 pm on Apr 2, 2014 (gmt 0)

Telling them they goofed does seem more satisfying "viscerally" than telling them they goofed up my logs.

Seems like almost everything this afternoon is coming from that .238 address, which is filling up my Fake Googlebot log quite rapidly. Funny thing is, most of this is coming from IPs that end in 8, with hits from the following over the past few days.

66.249.70.18
66.249.70.28
66.249.70.138
66.249.70.148
66.249.70.158
66.249.70.168
66.249.70.228
66.249.70.238

With one oddball for good measure.
66.249.70.72
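A quick way to confirm that addresses like the ones listed above all sit inside 66.249.70.0/24 is a bitmask comparison. The sketch below is only an illustration: the function name is hypothetical, and it assumes 64-bit PHP and a well-formed "network/bits" string.

```php
<?php
// Hypothetical helper: true when an IPv4 address falls inside a CIDR block.
// Assumes 64-bit PHP and a well-formed "network/bits" argument.
function ipv4_in_cidr(string $ip, string $cidr): bool
{
    [$network, $bits] = explode('/', $cidr);
    $ipLong  = ip2long($ip);
    $netLong = ip2long($network);
    if ($ipLong === false || $netLong === false) {
        return false; // malformed address
    }
    $bits = (int) $bits;
    // Build the netmask: the top $bits bits set, the rest cleared.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// A few of the addresses from the list above.
$seen = ['66.249.70.18', '66.249.70.238', '66.249.70.72'];
foreach ($seen as $ip) {
    echo $ip, ' ', ipv4_in_cidr($ip, '66.249.70.0/24') ? 'in range' : 'outside', "\n";
}
```

Being in the range is not proof by itself that a hit is Googlebot; it only narrows down which log rows are worth running through the DNS check.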

jojy

5+ Year Member



 
Msg#: 4659558 posted 11:12 pm on Apr 2, 2014 (gmt 0)

I have written a script which checks reverse/forward DNS. When I do the forward DNS lookup, it returns the host name instead of the IP address. Here is my script:


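$ip = '66.249.70.122';
$host = gethostbyaddr($ip);

// gethostbyaddr() returns the IP unchanged when the reverse lookup fails
if ($host != $ip) {
    // gethostbyname() returns the host name unchanged when the forward lookup fails
    $real_ip = @gethostbyname($host);
    if ($real_ip == $ip) {
        echo "It's Google";
    } else {
        echo 'Forward dns lookup failed';
    }
}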

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 8:58 am on Apr 3, 2014 (gmt 0)

$real_ip = @gethostbyname($host.'.'); // -- ;) -- //
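The trailing dot in the line above marks the name as absolute, so the resolver does not try appending local search domains to it. Putting that together with the reverse lookup and the domain check, the whole round trip might look roughly like the sketch below. The function name, the status strings, and the injectable lookup parameters are all hypothetical, and the accepted domains (googlebot.com, google.com) are an assumption.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check for one IPv4 address.
// $reverse and $forward default to PHP's own lookups but can be replaced with
// stubs so the logic can be exercised without touching the network.
function verify_crawler_ip(string $ip, ?callable $reverse = null, ?callable $forward = null): string
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // no PTR record is found, and false on malformed input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return 'no-ptr';
    }
    $host = rtrim($host, '.');

    // Step 2: the PTR name must sit under an assumed Google crawler domain.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return 'wrong-domain';
    }

    // Step 3: forward lookup on the absolute name (trailing dot).
    // gethostbyname() returns its argument unchanged on failure.
    $fqdn = $host . '.';
    $resolved = $forward($fqdn);
    if ($resolved === $fqdn) {
        return 'forward-failed';
    }

    // Step 4: the round trip must land on the original address.
    return $resolved === $ip ? 'verified' : 'mismatch';
}

// Stubbed lookups that mimic the state described in this thread:
// the PTR record exists but the forward record does not.
$ptr = function ($ip) { return 'crawl-66-249-70-238.googlebot.com'; };
$noA = function ($name) { return $name; };
echo verify_crawler_ip('66.249.70.238', $ptr, $noA), "\n"; // forward-failed
```

Returning a distinct status for a failed forward lookup, instead of a plain true/false, makes it possible to log those hits separately rather than dropping them straight into a fake-bot list.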

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 2:49 pm on Apr 3, 2014 (gmt 0)

In thinking a bit more about the title of this thread (and the caption it's posted under on the home page of WebmasterWorld) some other ideas came to mind...

"Googlebot Now Takes Steps to Block Itself"
"Googlebot: the First Self-blocking Robot"
"Google Launches Google404.com -- Revolutionizes Internet Search"

Regarding the last one, just type in "404". 2.3 Billion results returned in .083 seconds. The real trick is going to be how to get a good ranking in this new engine. I'm stuffing my 404 page with keywords as we speak. Also, will be adding some structured data and a large, high quality image suitable for scraping.

incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4659558 posted 9:57 pm on Apr 3, 2014 (gmt 0)

Barry Schwartz ran with our story and got the attention of John Mueller from Google, who said "Oops, we'll get that set up before we continue using those IP ranges!"

See the great technical reply on Google+
https://plus.google.com/u/0/+BarrySchwartz/posts/8JZ5azQfCvk

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 2:40 pm on Apr 4, 2014 (gmt 0)

Still getting hit as of 10 minutes ago. They're just making the coffee in Mountain View I'm thinking.

Dim coffeeGrounds As Integer = numberOfPeopleAtGoogle * 20000

etc.

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 6:53 pm on Apr 4, 2014 (gmt 0)

I've only seen one hit since about 10am Eastern or so (a little after noon from a smart phone). Hard to say if that means it's really stopped because they've come in batches in the past. Forward lookup still failing on at least the one IP I actually checked today (66.249.70.148).

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4659558 posted 11:02 am on Apr 5, 2014 (gmt 0)

Barry Schwartz ran with our story and got the attention of John Mueller from Google, who said "Oops, we'll get that set up before we continue using those IP ranges!"

Yes they'll fix it, right after they get their list of potential cloakers...

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4659558 posted 2:36 pm on Apr 7, 2014 (gmt 0)

Well the hits have stopped coming for now from those IPs. Call it a fix if you like. Moving on... ;)

