'First click free' is a bit vague if you ask me. Is that the first click from the visitor, or the first page served after a Google referral? I imagine Google would prefer the latter. Tease-cloaking, I call it ;)
And of course, everyone will have their favourite "but example.com is cheating!" story. The reality is, these sites may be taking a risk and either happen to be undetected, or are significant enough to bend the rules a little. That's life! :)
Serve largely the same content per URL
She goes on to clarify that you can insert some dynamic portions into the page via IP detection, but you should "contain them or limit them to just small areas" of the total page. So Google does not require the entire page to be exactly the same regardless of where the requesting IP address comes from.
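If you want a rough sense of whether IP-dependent snippets stay "small", one option is to compare two versions of the same URL with a similarity ratio. The sketch below uses Python's standard difflib module on two hypothetical versions of a page. Google has not published any threshold, so treat the ratio as a self-check only, not as a reflection of how Google measures anything.

```python
import difflib

def similarity(page_a: str, page_b: str) -> float:
    """Return a 0..1 ratio of how much two versions of a page have in common."""
    # autojunk=False stops difflib from ignoring frequently repeated characters,
    # which would otherwise distort the ratio for long HTML strings.
    return difflib.SequenceMatcher(None, page_a, page_b, autojunk=False).ratio()

# Hypothetical page: a large shared body plus one small IP-dependent snippet.
body = "<main>" + "Product details. " * 50 + "</main>"
us_version = body + "<aside>Ships from US</aside>"
au_version = body + "<aside>Ships from AU</aside>"

print(round(similarity(us_version, au_version), 3))  # close to 1.0
```

A ratio near 1.0 means the dynamic portion is a small fraction of the page; where to draw the line is your own judgment call.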
A program such as md5sum or diff can compute a hash to verify that two different files are identical.
Of course, the two aren't mutually exclusive, although I think there is an implication there.
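The byte-for-byte check in the quote can be sketched with Python's hashlib. Strictly speaking, md5sum computes a hash while diff compares files line by line; either way the only answer is "identical" or "not identical". The function names here are hypothetical, and the example assumes you have already captured the page body as served to a browser and as served to Googlebot.

```python
import hashlib

def md5_hex(content: bytes) -> str:
    """Return the MD5 hex digest of a page body (what md5sum would print)."""
    return hashlib.md5(content).hexdigest()

def same_content(page_for_user: bytes, page_for_bot: bytes) -> bool:
    """True only if both bodies are byte-for-byte identical."""
    return md5_hex(page_for_user) == md5_hex(page_for_bot)

user_html = b"<html><body>Hello</body></html>"
bot_html = b"<html><body>Hello</body></html>"
print(same_content(user_html, bot_html))  # True
```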
So can I decide who my users are? Some sites have their user base in the US only. Some "users" in the US who are located (by IP) in known colo/hosting ranges get exactly the same treatment as users who come (by IP) from West Africa and arrive via a search (referral) for a given site's contact info. The "first click free" suggestion does not apply here at all, sorry. Is there a term for white hat cloaking? Oh, I know: site security!
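IP-based "site security" of this kind usually comes down to checking the remote address against a list of CIDR blocks. Here is a minimal sketch with Python's ipaddress module, assuming a hand-maintained list; the two networks below are RFC 5737 documentation ranges standing in for real hosting ranges, and `is_blocked` is a hypothetical helper. Because the rule looks only at the address and never at the user agent, a crawler from an unlisted range is treated like any other visitor from that range.

```python
import ipaddress

# Placeholder "colo/hosting" ranges (RFC 5737 documentation blocks);
# a real list would come from your own records of abusive netblocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(remote_ip: str) -> bool:
    """Return True if remote_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_ip)
    # An IPv6 address is never "in" an IPv4 network, so this returns False for it.
    return any(addr in net for net in BLOCKED_NETWORKS)

print(is_blocked("192.0.2.10"))   # True
print(is_blocked("203.0.113.5"))  # False
```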
Cloaking: Serving different content to users than to Googlebot
Does that mean you're cloaking something different than the user sees?
Are sites that do this now at risk?
Also, I've seen those big domain parks show Googlebot only ads pointing to their own properties, for ranking purposes, as opposed to the ads they show humans to make money.
Trying to prove they're cloaking those ads just to Google is the trick.
[edited by: incrediBILL at 10:50 pm (utc) on June 3, 2008]
Can't you use something like the <noscript> tag instead of cloaking? I know it's an example, but I'm sure most situations can be dealt with without checking someone's IP address or browser user agent.
Can't you use something like the <noscript> tag instead of cloaking?
Then you double the size of your page load, and my navigation is fairly large.
The point is that it's Google trying to tell people how to run their websites. They don't own the web, we do; all our sites ARE the web, yet they're trying to tell US how to do business online.
That simply rubs me the wrong way.
We have one customer who has both a .com.au and .com domain name.
They asked us about the following:
•301ing IPs originating in Australia from the .com to the .com.au site, and
•301ing IPs originating outside of Oz from the .com.au to the .com site
If all of Google's DCs are outside Oz, none of the Googlebots would have an Australian IP address according to our IP geo-targeting database.
So, assuming Googlebot never has an Aussie IP, the .com.au site would never be crawled.
If we don't serve the 301 to user agents identifying as a 'bot', we could be misinterpreted as cloaking.
Has anyone else run into this scenario?
[edited by: tedster at 9:57 pm (utc) on June 10, 2008]
[edit reason] make link clickable [/edit]
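The customer's two redirect rules can be expressed as a small decision function. This sketch assumes a geo-IP lookup has already produced an ISO 3166-1 alpha-2 country code (or None when the lookup fails); the host names and the function name are placeholders. Tracing it with a US country code shows the problem described above: any request for the .com.au host from a non-Australian IP, including a US-based crawler, is sent to the .com site.

```python
from typing import Optional

# Placeholder hosts standing in for the customer's two domains.
COM_HOST = "www.example.com"
AU_HOST = "www.example.com.au"

def geo_redirect_target(host: str, country_code: Optional[str]) -> Optional[str]:
    """Return the host to 301 to, or None to serve the requested page.

    country_code is an ISO 3166-1 alpha-2 code from a geo-IP lookup,
    or None if the lookup failed (in which case we do not redirect).
    """
    if country_code is None:
        return None
    if host == COM_HOST and country_code == "AU":
        return AU_HOST    # Australian visitor on .com -> .com.au
    if host == AU_HOST and country_code != "AU":
        return COM_HOST   # non-Australian visitor on .com.au -> .com
    return None

# A US-based crawler asking for the .com.au home page is bounced to .com,
# so under these rules it never gets to fetch the .com.au content.
print(geo_redirect_target(AU_HOST, "US"))  # www.example.com
```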
they're trying to tell US how to do business online.
I do understand the feeling here, Bill, especially these days with Google such a dominant presence. But really, they're only telling us how to do business IF we want their free help. We are always free to build an online business that has nothing to do with Google. We just need a business model that can work.
Also, the i18n (internationalisation) has to cope with the fact that many bots (including G's when I last checked) don't supply an Accept-Language header. So in that case I try to default to a suitable language variant for where *my server* is, that would be good for geographically-local users.
Thus, when G's (US-based) bots visit my Sydney server they get a slightly Aussie-accented en-au language variant, an en-us variant when they visit my US server, en-gb on my UK servers, en-in on my Mumbai server, and zh on the Beijing server. All servers are capable of producing all variants and are normally driven by i18n content negotiation.
But falling back to a language variant local to the *server*, not the visitor/bot, should, IMHO, help geotargeted/local Web SERPs.
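The fallback described above might look roughly like this. It is a simplified sketch, not the poster's code: it parses q-values, tries exact tags first, then the primary subtag, and uses the server's local variant when no Accept-Language header is sent at all. Full language-range matching (RFC 4647) covers more cases. The variant list comes from the post; `choose_variant` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Variants every server can produce, as listed in the post.
AVAILABLE = ["en-au", "en-us", "en-gb", "en-in", "zh"]

def choose_variant(accept_language: Optional[str], server_default: str) -> str:
    """Pick a language variant for a request.

    Uses Accept-Language when present; otherwise (as with many bots)
    falls back to the variant local to this server, e.g. "en-au" in Sydney.
    """
    if not accept_language:
        return server_default

    # Parse "tag;q=0.8, tag2" into (quality, tag) pairs.
    candidates: List[Tuple[float, str]] = []
    for part in accept_language.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        quality = 1.0
        for param in pieces[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if tag and quality > 0:
            candidates.append((quality, tag))

    # Highest quality first; the sort is stable, so ties keep header order.
    candidates.sort(key=lambda c: c[0], reverse=True)
    for _, tag in candidates:
        if tag in AVAILABLE:
            return tag
        # Bare or unsupported regional tags ("en", "en-nz") match on the
        # primary subtag, preferring this server's own variant.
        primary = tag.split("-")[0]
        if server_default.split("-")[0] == primary:
            return server_default
        for variant in AVAILABLE:
            if variant.split("-")[0] == primary:
                return variant

    return server_default

print(choose_variant(None, "en-au"))                  # en-au (no header, e.g. a bot)
print(choose_variant("en-us;q=0.5, en-gb", "en-au"))  # en-gb
```

Preferring the server's own variant for a bare "en" keeps the behaviour consistent with the no-header case.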
So I don't cloak for any kind of deception: I trim page weight for survival, and I have a sensible fallback when i18n content negotiation is not possible.
The upshot is: if you find (say) a page from my AU mirror in (say) Google's AU local search, it will have a Strine accent and be quick to download and view.
But really, they're only telling us how to do business IF we want their free help.
Not really. The post was as full of holes as Swiss cheese and open to interpretation, with nothing concrete along the lines of DO THIS and DON'T DO THAT. Make it less vague and I'll have less of an issue with it.
Besides, if an internet consortium of search engines made these rules and Google adopted them, instead of dictating them, I'd have a much easier time accepting them as a consensus of the web, as opposed to a big-internet-bully dictatorship.
I think they need to encourage webmasters to build sites for their users, but they also want to alert us to the sorts of things that can get us accidentally penalized, because people really complain when that happens.
Mattieo's question is interesting. If Google does not have a datacentre in your country, then with IP-based redirects like those, your country's pages would never be indexed. Or maybe they don't have datacentres there but do use proxies in those countries (crawling via a normal user agent)?
A program such as md5sum or diff can compute a hash to verify that two different files are identical.
How ridiculous is that?
If you merely timestamp your pages ("Current date/time is (date) (time)"), every page yields a different MD5 hash every second.
MD5 and DIFF are *absolutely* worthless and would give tons of false positives if used to detect cloaking.
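The timestamp problem is easy to demonstrate. In this sketch the page template and the regular expression are hypothetical; the point is simply that two copies of a page fetched a second apart hash differently, and that a hash comparison only becomes meaningful after known volatile regions are masked out. Nothing here describes what Google actually does.

```python
import hashlib
import re

def page(now: str) -> str:
    # Hypothetical page whose only dynamic part is a timestamp line.
    return f"<p>Welcome!</p><p>Current date/time is {now}</p>"

# Hypothetical pattern matching the volatile timestamp text on this page.
TIMESTAMP = re.compile(r"Current date/time is [^<]*")

def raw_hash(html: str) -> str:
    return hashlib.md5(html.encode("utf-8")).hexdigest()

def normalized_hash(html: str) -> str:
    # Mask the timestamp before hashing so only stable content counts.
    stable = TIMESTAMP.sub("Current date/time is [removed]", html)
    return hashlib.md5(stable.encode("utf-8")).hexdigest()

a = page("2008-06-04 12:00:00")
b = page("2008-06-04 12:00:01")
print(raw_hash(a) == raw_hash(b))                # False: one second apart
print(normalized_hash(a) == normalized_hash(b))  # True
```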
In theory, there is no difference between theory and practice. But in practice, there is.
What are they thinking over there?
Well, that reminds me of a dialogue between Mandrell and Dr. Who ... but I digress.
[edited by: Romeo at 12:13 pm (utc) on June 4, 2008]
OK, time to test if Google really means it.
Got the heads-up on this just now. Google should know about them, since they're one of YouTube's challengers.
Try accessing their service from any of the blocked countries to see what I mean.
158. Vatican City <*hahaaa* ...WHY?>
165. South Africa <...end of list.>
If you try to access the www subdomain from an IP associated with the above locations, the site stops just short of an outright denial of service, showing only a single plain white page:
Example is no longer available in YOURCOUNTRY.
If you are not in YOURCOUNTRY or you think you have received this message in error, please report the issue below.
(Please enter your email address)
How's THAT for serving different content to users than to Googlebot? Given the recent news [news.yahoo.com] about them, one has to wonder whether this is some kind of business model (?), but anyway...
they're only telling us how to do business IF we want their free help. We are always free to build an online business that has nothing to do with Google. We just need a business model that can work.
I wonder whether a site that's inaccessible from half of the planet deserves their help. I'll be standing by, watching whether Google enforces their policy for good.
Honestly, after reading the forum posts about what's happening, I feel like joining the revolution.
[edited by: engine at 5:02 pm (utc) on June 4, 2008]
[edit reason] No specific sites, thanks [/edit]
It seems to me that search engines concerned about how content gets delivered are not being unreasonable. Clearly they want the page to look just the way users will see it, for their own credibility.
Search engine operators: I think what you want is a fine goal, so start acting like the variety of browsers and devices that users work with. Send language and country data in headers to test such behavior, accept cookies and the like, so that your crawler sees what users see.
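For what it's worth, a client that sends language headers and keeps cookies is straightforward to set up. This sketch uses Python's urllib.request and http.cookiejar; the URL, user-agent string and helper name are hypothetical, and it builds the request without sending it. It is not a description of how any search engine's crawler works.

```python
import http.cookiejar
import urllib.request

def browser_like_request(url: str, accept_language: str, user_agent: str):
    """Build a cookie-keeping opener plus a request with browser-style headers."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores Set-Cookie responses in jar and sends them
    # back on later requests made through the same opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    request = urllib.request.Request(url, headers={
        "User-Agent": user_agent,
        "Accept-Language": accept_language,
    })
    return opener, request, jar

opener, request, jar = browser_like_request(
    "http://www.example.com/",           # hypothetical URL
    "en-AU,en;q=0.8",                    # present as an Australian visitor
    "ExampleCrawler/1.0 (hypothetical)",
)
# opener.open(request) would perform the fetch; it is not called here.
# urllib stores header names via str.capitalize(), hence "Accept-language".
print(request.get_header("Accept-language"))  # en-AU,en;q=0.8
```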
Site development was once challenging enough just keeping up with support for multiple browsers (which even today interpret standards in different ways). It is getting far more complex now that so many different device display formats also have to be considered in good site design to maximise the user experience.
While I appreciate at some level the junk that search engines must contend with, it is also important for them to stay up to date on reasonable approaches to site design. It is wrong to conclude immediately that features added to improve the user (and usability) experience are "really" there to "game the engine".
That's a lot of what these recent communications are about. If we want the help that Google can give our site, then we also need to understand what they wrestle with, what their current focus and limitations are, and so on.
It's all a work in progress, both on the developers' side and on Google's. The more they can tell us about the state of their art, the better things can be.
What about services like gravity stream, which serve static HTML pages to Googlebot for large dynamic e-commerce sites?
Heck, Amazon does conditional IP delivery. They're fine... for now. ;)
Regional versions of Google (e.g. google.com.mx, google.co.in, etc.) still list content that users are not allowed to see. The blocking seems to be IP-based. Nonetheless, the number of pages listed in regional indexes has increased to some 6 million. That's 2 million+ more since the content became completely unavailable.
Clicking the Google results will bring up nothing that the SERPs promise.
Enforcement of Google policies in this case?
Interestingly enough, I am working with one client who has both a .com and a .com.au and they do some IP based redirecting between the two. And their indexing is a total mess.