Forum Moderators: Robert Charlton & goodroi


Geo-distributed crawling and Fetch as Googlebot


onlinesource

5:00 am on Jan 9, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Is anybody familiar with geo-distributed crawling? I read that Googlebot IP addresses only come from the USA. At least that was the case a few years ago, when Matt Cutts said, "Google does not, right now, have any crawling that happens from non-US IP addresses." This article [support.google.com...] confirms that Googlebot defaults to crawling from the USA.

I currently operate several domains, including mysite.com, mysite.ca, mysite.co.uk and mysite.in. I am using a Geo IP redirect module to push international traffic to the appropriate stores: Canada traffic goes to .ca, India to .in, UK to .co.uk, and everything else, USA included, goes to .com. If somebody from Las Vegas, USA tries to go to my .ca site, they are 301 redirected to my .com site. It works as far as I can tell.
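To make the setup concrete, here is a minimal sketch of the redirect decision described above. This is a hypothetical illustration, not the poster's actual module: the country code would normally come from a GeoIP lookup on the visitor's IP address, and the function and dictionary names are assumptions.

```python
# Hypothetical sketch of the ccTLD redirect logic described in the post.
# In a real module, country_code would come from a GeoIP lookup on the
# visitor's IP; here it is simply passed in as a string.

COUNTRY_TO_DOMAIN = {
    "CA": "mysite.ca",
    "IN": "mysite.in",
    "GB": "mysite.co.uk",
}

DEFAULT_DOMAIN = "mysite.com"  # USA and everything else

def target_domain(country_code: str) -> str:
    """Return the store a visitor from this country is 301-redirected to."""
    return COUNTRY_TO_DOMAIN.get(country_code.upper(), DEFAULT_DOMAIN)
```

Under this logic a US visitor (or a US-based Googlebot) requesting the .ca site would always be sent to .com, which is exactly the fetch failure described below.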

My problem is, when I go into Google Webmaster Tools and ask Googlebot to fetch my .ca site, it says it can't because the URL is redirected. This leads me to believe that the Googlebot crawling the site is in the USA, which is why it's pushed away. But why? Why would Google not use a Googlebot in Canada, for instance?

Even if all Googlebots associated with the Fetch as Google feature are USA-based only, how would my .ca site ever get fetched? What am I missing? Sorry if this is a dumb question. I'm assuming that since Googlebots default to the USA, I can't force one to crawl a non-USA site that is being redirected. Do I just have to wait for a Googlebot in that country to make its way to my site?
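A side note on diagnosing this kind of issue: Google's documented way to confirm a request really is Googlebot is a reverse-DNS lookup of the requesting IP, which should resolve to a googlebot.com or google.com host (followed by a forward lookup of that host to confirm it returns the same IP). A sketch of the testable core of that check, with the actual DNS calls (which are network-dependent) left as comments:

```python
# Sketch of Google's documented Googlebot verification: reverse-resolve
# the requesting IP (e.g. socket.gethostbyaddr(ip)), check the hostname
# suffix below, then forward-resolve the hostname (socket.gethostbyname)
# and confirm it matches the original IP. Only the suffix check is
# shown here, since the DNS lookups require network access.

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_googlebot_hostname(hostname: str) -> bool:
    """True if a reverse-DNS hostname belongs to Google's crawlers."""
    return hostname.lower().endswith(GOOGLE_SUFFIXES)
```

This tells you a request is genuinely from Google; it doesn't tell you which country the IP geolocates to, which is the separate question the thread is about.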

Robert Charlton

8:47 am on Jan 9, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



onlinesource, forgive a hasty but possibly long reply. My initial take on what you're asking... as I'm understanding your question and the documentation you link to... is that "geo-distributed crawling" shouldn't apply to you at all.

Here's the help article you link to....

Locale-aware crawling by Googlebot
[support.google.com...]

Here's the first paragraph of text in the article (my emphasis added)...
This article describes how Google uses different crawl settings for sites that cannot have separate URLs for each locale.

In the situation you describe, you apparently have separate domains, each with its own ccTLD, for each geo-location, so the single-URL condition for "Locale-aware crawling" isn't the case here.

Google divides "Locale-aware" into two basic situations... "Geo-distributed crawling" and "Language-dependent crawling". These apply when Google spots certain "signals and hints", as the article describes them... specifically when it sees...

- "different content on the same URL - based on the user's perceived country (geolocation)"
- or when you are "serving different content on the same URL - based on the Accept-Language field set by the user's browser in the HTTP request header"
...etc

Regarding geo-location, the article IMO is slightly unclear because of the tenses used... ie...
Googlebot uses well-established IP addresses that appear to come from the United States. With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia.

But the Googlebot situation, as I understand it, has been changing, and it might be clearer if Google said something like...
Googlebot has up until now used well-established IP addresses that appear to come from the United States. With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia.
There may be some technicality why Google doesn't say that, but that's the way I understand the situation.

Regarding your setup of 301 redirecting to different ccTLD sites using some sort of Geo IP module... I myself would never use IP range to set important user preferences. Opinions on this vary, and I'll leave it for others to discuss. We have some recent discussions on the topic, but it's too late at night to hunt for them.

IP redirecting to a different URL is arguably a type of cloaking. I was initially somewhat surprised that the redirects hadn't gotten you into trouble, but as I think about it, chances are that your separate ccTLD sites are ranking on their own, and that Google is allowing the 301 redirects because Google is seeing what the user is seeing. The ccTLDs, though, should essentially allow the foreign sites to rank in their own territories... and the 301s shouldn't be necessary.

Your description sounds as though, without the redirects, you'd probably be very close to the setup that Google recommends...
IMPORTANT: We continue to support and recommend using separate locale URL configurations and annotating them with rel=alternate hreflang annotations.
I suggest simply dropping the Geo IP redirect module, and using flags or text links to let the users manually choose their language and geo preferences, if they should need to do that. Without the 301s, btw, I'm guessing that you could Fetch as Googlebot without a problem.
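For reference, the separate-URL configuration Google recommends looks roughly like the fragment below, using the domains from this thread. The page path and the language/region codes are assumptions for illustration; each page version should list every alternate, including itself.

```html
<!-- Hypothetical hreflang annotations, placed in the <head> of every
     version of the page (the same set on all four domains). -->
<link rel="alternate" hreflang="en-us" href="https://mysite.com/page" />
<link rel="alternate" hreflang="en-ca" href="https://mysite.ca/page" />
<link rel="alternate" hreflang="en-gb" href="https://mysite.co.uk/page" />
<link rel="alternate" hreflang="en-in" href="https://mysite.in/page" />
<link rel="alternate" hreflang="x-default" href="https://mysite.com/page" />
```

With annotations like these in place of the 301s, a US-based Googlebot can fetch each ccTLD directly, and Google can still show each country its local version.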

I should add, btw, that the above are rushed thoughts, and you should check out the linking situations and hreflang configurations of each of your ccTLD sites before making any changes.

not2easy

3:24 pm on Jan 9, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I agree with Robert about getting rid of the geo-IP redirect, as you may be losing business or traffic by sending visitors where they didn't want to go. I've visited sites that use this kind of redirect, and their assumptions seem arrogant to me as a user: I may not be in my home country while trying to view the site, or I may be trying to assist a friend who lives in the area of that business. If I can't see the site, I can't send them the link, so I go to another business and send their link instead. Let your visitors choose which version/site they want to visit. ;)