
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Is this googlebot or not?
Seems this is in GBot's range but doesn't validate

 5:25 pm on Apr 2, 2014 (gmt 0)

While culling Googlebot records from my raw log table, I'm encountering some questionable results related to the /24 range. Here are a couple of examples.

- Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

There are other IPs involved as well (all in the above range).

Anyway, these don't validate using the standard method (failing on the forward lookup). Here are the results.

Reverse DNS for = crawl-66-249-70-238.googlebot.com

Domain checks out.

Forward DNS throws a socket exception: No such host is known.

The reason I'm asking about this is because I thought this was in G's range and it seems like this should validate. As I said before, I'm seeing this with everything across /24 that claims to be Googlebot.
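
For anyone following along, the standard method described above (reverse lookup, domain check, forward lookup) might be sketched in PHP as below. This is only a sketch under assumptions: the function name classify_googlebot_ip, the status strings, and the accepted domain list (googlebot.com and google.com) are illustrative and not taken from this thread. The two resolvers are passed in as callables so the logic can be tried with stubs; they default to PHP's gethostbyaddr() and gethostbynamel().

```php
<?php
// Sketch only: function name, status strings, and domain list are
// illustrative assumptions, not taken from the thread.

/**
 * Classify an IP that claims to be Googlebot.
 *
 * $reverse maps an IP to a host name (or returns the IP / false on failure).
 * $forward maps a host name to a list of IPs (or returns false on failure).
 */
function classify_googlebot_ip(string $ip, ?callable $reverse = null, ?callable $forward = null): string
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // the lookup fails, and false on malformed input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return 'no_reverse';
    }

    // Step 2: the host must sit under a Google crawler domain (assumed list).
    $host = strtolower(rtrim($host, '.'));
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return 'wrong_domain';
    }

    // Step 3: the forward lookup must lead back to the original IP.
    $ips = $forward($host . '.');
    if (!is_array($ips) || !in_array($ip, $ips, true)) {
        return 'forward_failed';
    }

    return 'verified';
}
```

Returning a separate 'forward_failed' status, rather than lumping it in with 'wrong_domain', lets a log cleaner or blocking routine treat a case like the one above as unverified instead of fake.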



 6:47 pm on Apr 2, 2014 (gmt 0)

Googlebot has been using a wider variation of user agents to get content cloaked to mobile devices.

As long as it says Googlebot somewhere and the round trip DNS validation works, it's the real deal.

However, in this case it sounds like someone at Google made an error in the DNS for crawl-66-249-70-238.googlebot.com, and it might be worthwhile sending someone there an email to let them know.


 7:11 pm on Apr 2, 2014 (gmt 0)

I had to modify my acceptance algorithm for stupid googlebot for this reason. I got so many "bad UA" responses in my log it was ridiculous!


 7:34 pm on Apr 2, 2014 (gmt 0)

I had a feeling this was a DNS error on G's part. Makes me wonder how many unwarranted 404's, 403's they're eating these days because of this. Right now I'm just cleaning up log files but I can imagine what a mess this would create in a blocking algo (if you actually care about having your stuff indexed).

Contact Google? Not even sure where to start on that one where this matter is concerned.


 10:22 pm on Apr 2, 2014 (gmt 0)

But isn't it worth it for the visceral satisfaction of telling a Major Internet Company that they goofed?

Just hope they never start calling themselves "GoogleBot", as that's an attested spoofer.


 10:59 pm on Apr 2, 2014 (gmt 0)

Telling them they goofed does seem more satisfying "viscerally" than telling them they goofed up my logs.

Seems like almost everything this afternoon is coming from that .238 address, which is filling up my Fake Googlebot log quite rapidly. Funny thing is, most of this is coming from IPs that end in 8, with hits from the following over the past few days.

With one odd ball for good measure.


 11:12 pm on Apr 2, 2014 (gmt 0)

I have written a script that checks reverse/forward DNS. When I do the forward DNS lookup, it returns the host name instead of the IP address. Here is my script:

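$ip = '';
$host = gethostbyaddr($ip);

// check if host exists (gethostbyaddr() returns the IP unchanged when the reverse lookup fails)
if ($host != $ip) {
    // forward lookup: resolve the host name back to an IP
    $real_ip = @gethostbyname($host);
    if ($real_ip == $ip) {
        echo "It's Google";
    } else {
        echo 'Forward dns lookup failed';
    }
}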


 8:58 am on Apr 3, 2014 (gmt 0)

$real_ip = @gethostbyname($host.'.'); // -- ;) -- //
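
Some background on the one-liner above, hedged as a reading of it rather than the poster's own explanation: a trailing dot marks the name as fully qualified, so the resolver should not try appending local search domains. Separately, PHP's gethostbyname() returns its argument unchanged when the lookup fails, which would produce the "host name instead of IP address" symptom. A minimal sketch that handles both follows; the helper name forward_lookup_ip is hypothetical, and the resolver is injectable so it can be tried without touching DNS.

```php
<?php
// Sketch only: forward_lookup_ip() is a hypothetical helper name.

/**
 * Resolve a host name to an IP address, or null when resolution fails.
 * $resolver defaults to gethostbyname() and can be replaced by a stub.
 */
function forward_lookup_ip(string $host, ?callable $resolver = null): ?string
{
    $resolver = $resolver ?? 'gethostbyname';

    // Ensure exactly one trailing dot so the name is treated as fully qualified.
    $fqdn = rtrim($host, '.') . '.';

    $result = $resolver($fqdn);

    // On failure gethostbyname() hands back the name it was given, so
    // anything that is not a valid IP address counts as "not resolved".
    if (!is_string($result) || filter_var($result, FILTER_VALIDATE_IP) === false) {
        return null;
    }

    return $result;
}
```

With this shape, the comparison against the original IP only happens when an address actually came back, so a failed lookup can be reported as such.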


 2:49 pm on Apr 3, 2014 (gmt 0)

In thinking a bit more about the title of this thread (and the caption it's posted under on the home page of WebmasterWorld) some other ideas came to mind...

"Googlebot Now Takes Steps to Block Itself"
"Googlebot: the First Self-blocking Robot"
"Google Launches Google404.com -- Revolutionizes Internet Search"

Regarding the last one, just type in "404". 2.3 Billion results returned in .083 seconds. The real trick is going to be how to get a good ranking in this new engine. I'm stuffing my 404 page with keywords as we speak. Also, will be adding some structured data and a large, high quality image suitable for scraping.


 9:57 pm on Apr 3, 2014 (gmt 0)

Barry Schwartz ran with our story and got the attention of John Mueller from Google, who said "Oops, we'll get that set up before we continue using those IP ranges!"

See the great technical reply on Google+


 2:40 pm on Apr 4, 2014 (gmt 0)

Still getting hit as of 10 minutes ago. They're just making the coffee in Mountain View I'm thinking.

Dim coffeeGrounds As Integer = numberOfPeopleAtGoogle * 20000



 6:53 pm on Apr 4, 2014 (gmt 0)

I've only seen one hit since about 10am Eastern or so (a little after noon from a smart phone). Hard to say if that means it's really stopped because they've come in batches in the past. Forward lookup still failing on at least the one IP I actually checked today (


 11:02 am on Apr 5, 2014 (gmt 0)

Barry Schwartz ran with our story and got the attention of John Mueller from Google, who said "Oops, we'll get that set up before we continue using those IP ranges!"

Yes they'll fix it, right after they get their list of potential cloakers...


 2:36 pm on Apr 7, 2014 (gmt 0)

Well the hits have stopped coming for now from those IPs. Call it a fix if you like. Moving on... ;)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved