Forum Moderators: Robert Charlton & goodroi

Googlebot using new IPs and no reverse DNS possible

         

Bigorno

12:41 pm on Dec 12, 2015 (gmt 0)

10+ Year Member



Hi everyone, I'm sorry for my English.

Yesterday I noticed that reverse DNS does not work when Googlebot uses an IP range I have never seen before.

Instead of returning the host name, gethostbyaddr($ip) just returns the IP. So it is impossible to run a reverse DNS check and make sure that the visitor is Googlebot.

This happens during crawling when Googlebot uses this IP range: 162.158.168.XX (I have never seen these IPs before. Have you?)

What do you think about that? Shouldn't Google help us identify Googlebot?

For those who want to know how I can tell this is Googlebot: just put <?php echo $_SERVER['REMOTE_ADDR']; ?> on a page, then wait for the new Google cache version or, faster, use Fetch as Google in Search Console and look at the screenshot.

Robert Charlton

8:30 am on Dec 14, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yesterday I noticed that reverse DNS does not work when Googlebot uses an IP range I have never seen before.
You mention reverse DNS, but Google uses forward/reverse DNS. I'm not experienced enough with that to provide a detailed diagnosis, but you might be able to work out what's going on from the references I'm providing...

The IP range you cite is from Cloudflare, which is a CDN (content delivery network) based in the San Francisco area. They are very highly regarded.

CDNs have issues with load balancing and search engine crawlers that occasionally cause them to change the IP of a site that is having crawl issues, in order to maintain speed on the other sites they host. Here's some background which might shed some light on the potential issues for you.

First, see this article on verifying Googlebot, from Google Webmaster Central. I'm providing several quotes that might be relevant...

How to verify Googlebot
[googlewebmastercentral.blogspot.com...]

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name....

....I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com

My emphasis added....
This answer has also been provided to our help-desk, so I'd consider it an official way to authenticate Googlebot. In order to fetch from the "official" Googlebot IP range, the bot has to respect robots.txt and our internal hostload conventions so that Google doesn't crawl you too hard.
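
The reverse-then-forward check described in the quotes above could be sketched in PHP roughly as follows. This is a minimal sketch, not Google's code: the function name is_verified_googlebot() and the injectable $reverse/$forward callables are hypothetical (added so the logic can be exercised without live DNS), and accepting google.com alongside googlebot.com is an assumption based on Google's help documentation rather than on the quoted post. It covers IPv4 only, because gethostbynamel() returns only IPv4 addresses.

```php
<?php
/**
 * Forward-confirmed reverse DNS check for an IP claiming to be Googlebot.
 * $reverse and $forward are hypothetical hooks; by default they are
 * PHP's own gethostbyaddr() and gethostbynamel().
 */
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when no host name is found, and false on malformed input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // Step 2: the host name must sit under googlebot.com
    // (google.com is accepted here as an assumption, see above).
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: the forward lookup of that name must include the original IP,
    // otherwise the reverse record could have been spoofed.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Possible usage on a live request:
// $isGooglebot = is_verified_googlebot($_SERVER['REMOTE_ADDR']);
```

Note that when the request arrives through a proxy or CDN, REMOTE_ADDR is the proxy's address, so step 1 fails for every visitor, including the real Googlebot.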


And also see this from CloudFlare, which I'm assuming refers in part to the "internal hostload conventions" that Google mentions have been established...

CloudFlare and SEO
25 Jun 2011 by Matthew Prince.
[blog.cloudflare.com...]

With the cooperation of these search teams we were able to get CloudFlare's IP ranges are listed in a special category within search crawlers. Not only does this keep sites behind them from being clustered to a least performant denominator, or incorrectly geo-tagged based on the DNS resolution IP, it also allows the search engines to crawl at their maximum velocity since CloudFlare can handle the load without overburdening the origin.
Assuming someone hosted on CloudFlare isn't trying to spoof you, I'm thinking that perhaps the CloudFlare/Googlebot combination may have issues with either the reverse DNS (or the forward/reverse DNS) you might be using.

I would appreciate your thoughts from the above on what you think might be happening. Don't worry about your English... you've been doing pretty well, and your English is probably better than my IT.

dipper

9:40 am on Dec 14, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Google does help us identify Googlebot, using reverse DNS. All legitimate Googlebot IPs should respond as described here:

[support.google.com...]

If they do not respond like this, then they are most likely spam bots sending Googlebot as their user agent.

bhukkel

9:51 am on Dec 14, 2015 (gmt 0)

10+ Year Member



If you are using a proxy like Cloudflare, your web server and PHP only see the IPs of the proxy. With Apache you can install mod_cloudflare so that Apache and PHP see the real IP of the visitor. After installation you can use $_SERVER['REMOTE_ADDR'] in PHP again.

You can find mod_cloudflare here [cloudflare.com...]

Bigorno

4:32 pm on Dec 14, 2015 (gmt 0)

10+ Year Member



Hello everyone

Thank you for your answers, and yes, I was wrong! Googlebot is not hiding behind another IP; the site I was talking about has Cloudflare installed. Thank you bhukkel for the advice (mod_cloudflare). I saw that we can also use $_SERVER["HTTP_CF_CONNECTING_IP"].

Thank you Robert for all this information.
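
For anyone reading the header directly instead of installing mod_cloudflare, here is a rough sketch of one way to do it. Any client can send a CF-Connecting-IP header, so the sketch only trusts it when the connection itself comes from a known proxy range. The helpers ipv4_in_cidr() and client_ip() are hypothetical names, the code assumes 64-bit PHP and IPv4, and the single range in the usage comment is illustrative only; the full, current list should come from Cloudflare's published IP ranges.

```php
<?php
/** Return true if an IPv4 address falls inside a CIDR block such as 162.158.0.0/15. */
function ipv4_in_cidr(string $ip, string $cidr): bool
{
    // A bare address without "/bits" is treated as a /32.
    [$subnet, $bits] = explode('/', $cidr, 2) + [1 => '32'];
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false) {
        return false;
    }
    $bits = (int) $bits;
    $mask = $bits === 0 ? 0 : (-1 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

/**
 * Pick the visitor IP: use HTTP_CF_CONNECTING_IP only when REMOTE_ADDR
 * belongs to a trusted proxy range, otherwise fall back to REMOTE_ADDR.
 */
function client_ip(array $server, array $trustedProxies): string
{
    $remote = $server['REMOTE_ADDR'] ?? '';
    $forwarded = $server['HTTP_CF_CONNECTING_IP'] ?? '';

    if ($forwarded !== '' && filter_var($forwarded, FILTER_VALIDATE_IP) !== false) {
        foreach ($trustedProxies as $cidr) {
            if (ipv4_in_cidr($remote, $cidr)) {
                return $forwarded;
            }
        }
    }
    return $remote;
}

// Possible usage (illustrative range, not the complete Cloudflare list):
// $ip = client_ip($_SERVER, ['162.158.0.0/15']);
```

The resulting IP, not the raw REMOTE_ADDR, is then the one to pass to a reverse DNS check.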