Here are 3 examples of cloaking, all of which I have read about in various places, that some sites seem to get away with while other sites get banned.
1) You have content that you don’t want screen scraped or data mined. You may want to prevent this by redirecting the user (which could be a bot) to a challenge image or some other way of proving that they are not a bot.
2) You have some content that a user needs to register (for free) to see, so the user is redirected to a free registration page before seeing the content.
3) You have some content that you want someone to pay for (like New York Times) so you let Googlebot index the content but when a user clicks on a search result, they have to pay to see the content.
I know there are other reasons for cloaking, but the 3 above are in the “gray area” to me.
Everything I have read from the experts always has “in my opinion, this is (or is not) cloaking”. I can’t risk my company on anything but black and white facts.
Just look at the number of threads on WebmasterWorld where people are trying to figure out if they are cloaking or not!
It would be helpful if search engines provided the webmaster community with a clear set of rules.
Specifics you mention:
1- there is little you can do. We are killed daily [ojr.org] here by scraper bots.
a) write your domain name to every page generated.
b) write the ip and domain name to the page if you can. (this has saved our bacon hundreds of times).
c) write the ip and domain name to a html comment if the page is auto generated.
d) always include some sort of ad on the page.
e) require the 2nd or third click from the same ip to support cookies. This blocks 98% of the bots.
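Point (e) could be sketched roughly as below. This is only an illustration, not the code running on any real site: the class name CookieGate, the free_requests parameter and the in-memory counter are all assumptions made for the example.

```python
from collections import defaultdict


class CookieGate:
    """Hypothetical gate: let the first request(s) from an IP through,
    then require a cookie on every later request from that IP."""

    def __init__(self, free_requests=1):
        # Number of requests an IP may make before a cookie is required
        # (assumed value; the post says "2nd or third click").
        self.free_requests = free_requests
        self.seen = defaultdict(int)  # ip -> requests counted so far

    def allow(self, ip, has_cookie):
        self.seen[ip] += 1
        if self.seen[ip] <= self.free_requests:
            return True  # still within the free allowance
        return has_cookie  # afterwards, only cookie-capable clients pass


gate = CookieGate(free_requests=1)
first = gate.allow("203.0.113.5", has_cookie=False)   # first click
second = gate.allow("203.0.113.5", has_cookie=False)  # second click, no cookie
third = gate.allow("203.0.113.5", has_cookie=True)    # cookie now present
```

A real setup would need to expire the counters and set the cookie on the first response, which is left out here.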
> 2) You have some content that a user needs to register for free,
There is nothing you can do here; require them to log in.
> 3) You have some content that you want someone to pay for (like New York Times)
Google won't allow it, unless you have a back room agreement like the NY Times.
You can, however, certainly require high-abuse IPs to log in or support cookies. Just make sure googlebot.com isn't considered a high-abuse ISP....lol
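One way to read that last point, as a rough sketch: exempt hosts whose reverse DNS name ends in .googlebot.com before applying the abuse check. The function names, the request counter and the threshold of 500 are assumptions for illustration only.

```python
import socket

# Hostname suffixes exempt from the login/cookie requirement.
# Only googlebot.com is named in the post; extend at your own risk.
TRUSTED_SUFFIXES = (".googlebot.com",)


def reverse_hostname(ip):
    """Reverse DNS name for an IP, or "" if the lookup fails.
    Reverse DNS alone can be spoofed, so a forward lookup that
    confirms the name maps back to the same IP is advisable."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""


def is_trusted_hostname(hostname):
    host = hostname.rstrip(".").lower()
    return host.endswith(TRUSTED_SUFFIXES)


def needs_login(hostname, requests_last_hour, threshold=500):
    """Hypothetical rule: heavy hitters must log in, trusted crawlers never."""
    if is_trusted_hostname(hostname):
        return False
    return requests_last_hour > threshold
```

The leading dot in the suffix matters: it keeps a name like evilgooglebot.com from passing the check.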
> It would be helpful if search engines provide the
> webmaster community with a clear set of rules.
They do to a degree, but the space and items are fluid.
Essentially the engines are most concerned about intent and user experience. Are you cloaking for the purpose of attaining rankings? Are you cloaking for the purpose of tricking visitors (eg: content switching), or are you cloaking to protect the integrity of your system?
Ultimately, you have to do what is best for your visitors first and think about all the other services like the engines second. Be forewarned though - that attitude brings its own risks. The engines tend to not like webmasters that think for themselves first and do not put the engines first. My vote is to always do what is best for your visitors first.
Personally, I believe the safest method of New York Times style cloaking would be Geolocation based cloaking, where you simply give all IPs in Googlebot (and other major SE spider) ranges (geographically, I mean) access to premium content and make everybody else have a cookie that is only given upon registration/captcha/whatever.
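A minimal sketch of that idea, assuming you keep your own list of crawler networks. The range below is a documentation placeholder (192.0.2.0/24), not a real Googlebot range, and the function name is made up for the example.

```python
import ipaddress

# Placeholder network only. Real spider ranges would have to come
# from your own logs or from whatever the engines publish.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def can_view_premium(ip, has_registration_cookie):
    """Crawler ranges see premium content; everyone else needs the cookie
    handed out after registration/captcha/whatever."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in CRAWLER_NETWORKS):
        return True
    return has_registration_cookie


spider = can_view_premium("192.0.2.10", has_registration_cookie=False)
visitor = can_view_premium("198.51.100.7", has_registration_cookie=False)
member = can_view_premium("198.51.100.7", has_registration_cookie=True)
```

Whether the engines treat this as acceptable is a separate question from whether it works technically.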
The fact that webmasterworld.com has to ban all bots tells me that I probably shouldn’t expose my content (which is my entire business) and expect only good bots to come my way!
There are probably just as many legitimate reasons for cloaking, as there are shady reasons…
Perhaps if SEOs and Webmasters got together and asked for a clear set of rules for cloaking, we may get it.
I think something like a Meta Tag indicating what type of content the bot is indexing would solve the problem.
Meta Content = free/register/pay/subscription/etc
Then the search engines can give an option to their users to view only free or free + “non free” content.
A bot crawling this content would run across the Meta Tags and deal with it accordingly.
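To make the proposal concrete, here is a sketch of how a bot might read such a tag. No such tag exists as a standard. The markup `<meta name="access" content="pay">` is an assumed spelling of the "Meta Content = free/register/pay/subscription" idea, and the function names are hypothetical.

```python
from html.parser import HTMLParser

# Values taken from the proposal; the "etc" is left open.
ALLOWED = {"free", "register", "pay", "subscription"}


class AccessMetaParser(HTMLParser):
    """Collects the value of the hypothetical <meta name="access"> tag."""

    def __init__(self):
        super().__init__()
        self.access = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "access":
            value = (attr.get("content") or "").strip().lower()
            if value in ALLOWED:
                self.access = value


def access_type(html):
    """Return the declared access type, or None if the page declares none."""
    parser = AccessMetaParser()
    parser.feed(html)
    return parser.access


def show_in_results(access, include_non_free):
    """Searcher-side filter: free only, or free + "non free" content."""
    return access in (None, "free") or include_non_free


page = '<html><head><meta name="access" content="pay"></head></html>'
kind = access_type(page)
```

Here pages with no tag are treated as free, which is one possible default and not part of the original suggestion.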
Think of the number of (previously invisible) websites that have been added in the past 2 years with Google and Yahoo trying to outdo each other, each trying to claim they have more content.
My site and many others have content that is our intellectual property. I have to get some legal agreement from the user to not steal this content. This puts me behind a login “firewall” and keeps bots out…
Yes, indexing pay-only content is a waste of the user's time. More importantly, it's a waste of MY time.
i'd like to know something is available and then decide if it is worth getting.
sometimes free content is worth exactly what you pay for it!
maybe my search today is for the free 30 second snippet of that song i was trying to remember but the same search tomorrow is so i can buy and download the cd.
is it a waste of time to know that your neighborhood book store has a book for sale on a subject that you are interested in?
perhaps there is a magazine in a rack down the street that has a review of a consumer product you were thinking of buying.
or maybe you would like to know that something is available in the local library even if you must obtain a library card before you can borrow it.
i don't really see the essential difference here...
Do you mean within 24-72 hours of a robots.txt ban on googlebot this website (webmasterworld.com) was dropped from the Google system? If this website was dropped from the Google system how did it impact this website? Do most referrals come from google? If so why would the googlebot be banned?
> Do you mean within 24-72 hours of a robots.txt ban on googlebot this website (webmasterworld.com) was dropped from the Google system?
Yes. Poof! Gone!
> If this website was dropped from the Google system how did it impact this website?
I think it gave the admin team a chance to take a quick break and get everything back in order. ;)
From hundreds of thousands of pages to zero in less than 72 hours. We'll never know the real numbers but you can expect them to be staggering. ;)
This is precisely what I want to do: give the customer an idea of what they are getting for free, let them keep the free content, but give them the option to upgrade to the paid content. The only catch is that they have to agree to “terms and conditions” to not steal my free content. Cloaking is currently the only way of achieving this.
I am sure there is a lot of content like this out there in the invisible web and it is in the best interest of all parties (search engines, consumers, companies) to start indexing this content.
A simple way could be a meta tag that I can put on my content indicating what type of content it is - Meta Content = free/register/pay/subscription/etc.
This is what I am trying to avoid: [webmasterworld.com...]
But hey it could be the other way, me being so old and all.
Yup somewhere in this thread: [webmasterworld.com...]
Sort of like the no-follow tag, but for users.
If someone is really interested in that post, then there is a disclaimer which search engines can display to users.
It is a win-win situation.
The search engine has the indexed content.
The user knows the result he is facing (whether paid or not).