See this other current thread: [webmasterworld.com...]
And an older related fun: [webmasterworld.com...]
some similar changes WebmasterWorld made late last year: [webmasterworld.com...]
Specifics you mention:
1) There is little you can do. We are hit daily [ojr.org] here by scraper bots.
a) write your domain name into every page generated.
b) write the IP and domain name to the page if you can (this has saved our bacon hundreds of times).
c) write the IP and domain name into an HTML comment if the page is auto-generated.
d) always include some sort of ad on the page.
e) require the 2nd or 3rd click from the same IP to support cookies. This blocks 98% of the bots.
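Points (b), (c), and (e) above can be sketched in a few lines. This is a minimal illustration in Python; the function names and the three-click threshold are my own assumptions, not WebmasterWorld's actual code:

```python
def stamp_page(html: str, domain: str, client_ip: str) -> str:
    """Append a provenance comment so scraped copies reveal their source.

    If a scraper republishes the page verbatim, the comment tells you
    exactly which site and which client IP the copy came from.
    """
    return html + f"\n<!-- served by {domain} to {client_ip} -->"


def allow_request(hits_from_ip: int, has_cookie: bool) -> bool:
    """Allow the first couple of requests freely; after that, require a cookie.

    Most simple scraper bots never store cookies, so requiring one by the
    third request from the same IP quietly filters them out.
    """
    return hits_from_ip < 3 or has_cookie
```

The cookie gate is deliberately lenient on the first hits so that ordinary visitors (and well-behaved crawlers you whitelist separately) are never blocked on arrival.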
> 2) You have some content that a user needs to register for free,
There is nothing else you can do here; require them to log in.
> 3) You have some content that you want someone to pay for (like New York Times)
Google won't allow it unless you have a back-room agreement like the NY Times.
You can, however, certainly require high-abuse IPs to log in or support cookies. Just make sure googlebot.com isn't considered a high-abuse IP... lol
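One reliable way to make sure a visitor claiming to be Googlebot really is Googlebot is the reverse-then-forward DNS check that Google itself documents: the PTR record for the IP must end in googlebot.com or google.com, and that hostname must resolve back to the same IP. A sketch in Python (the injectable lookup arguments are my own device so the logic can be tested without live DNS):

```python
import socket


def is_real_googlebot(ip: str,
                      rdns=socket.gethostbyaddr,
                      fdns=socket.gethostbyname) -> bool:
    """Reverse-then-forward DNS verification of a claimed Googlebot IP.

    1. Reverse-resolve the IP; the hostname must be under googlebot.com
       or google.com (a User-Agent string alone is trivially spoofed).
    2. Forward-resolve that hostname; it must map back to the same IP.
    """
    try:
        host = rdns(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return fdns(host) == ip
    except OSError:
        return False
```

With this in place, an IP that hammers the site but fails the check can safely be treated as "high-abuse" without any risk of locking out the real crawler.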
> It would be helpful if search engines provide the
> webmaster community with a clear set of rules.
They do to a degree, but the space and items are fluid.
Essentially the engines are most concerned about intent and user experience. Are you cloaking for the purpose of attaining rankings? Are you cloaking for the purpose of tricking visitors (e.g. content switching), or are you cloaking to protect the integrity of your system?
Ultimately, you have to do what is best for your visitors first and think about all the other services, like the engines, second. Be forewarned though - that attitude brings its own risks. The engines tend not to like webmasters who think for themselves first and do not put the engines first. My vote is to always do what is best for your visitors first.
We'll never get definitive answers on the questions of cloaking from search engine representatives. The topic of cloaking is so completely taboo that they purposefully make obtuse and misleading statements to make sure there is no clear-cut policy. This leaves them plenty of wiggle room to just do what they please on a case-by-case basis.
Personally, I believe the safest method of New York Times-style cloaking would be geolocation-based cloaking, where you simply give all IPs in Googlebot (and other major SE spider) ranges access to premium content and make everybody else present a cookie that is only given upon registration/captcha/whatever.
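The range-check idea above can be sketched with the standard `ipaddress` module. The CIDR block shown is illustrative only; real crawler ranges change over time and should come from the engines' published lists, not be hard-coded:

```python
import ipaddress

# Illustrative CIDR block only -- a range historically used by Googlebot.
# In production, refresh this list from the engines' published ranges.
TRUSTED_SPIDER_RANGES = [ipaddress.ip_network(n) for n in ("66.249.64.0/19",)]


def gets_premium_without_cookie(ip: str) -> bool:
    """True if the client IP falls inside a trusted crawler range.

    Such clients see premium content directly; everyone else must
    present the registration cookie first.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_SPIDER_RANGES)
```

Note that a pure range check is weaker than the reverse/forward DNS verification, since published ranges go stale; combining both is the cautious approach.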
[ojr.org...] “WebmasterWorld.com has taken the radical step of banning all spiders”
The fact that webmasterworld.com has to ban all bots tells me that I probably shouldn't expose my content (which is my entire business) and expect only good bots to come my way!
There are probably just as many legitimate reasons for cloaking, as there are shady reasons…
Perhaps if SEOs and webmasters got together and asked for a clear set of rules for cloaking, we might get one.
In my case, I have intellectual property that I would need someone to login/register to view; some pay per view, some free.
I think something like a Meta Tag indicating what type of content the bot is indexing would solve the problem.
Meta Content = free/register/pay/subscription/etc
Then the search engines can give an option to their users to view only free or free + “non free” content.
A bot crawling this content would run across the Meta Tags and deal with it accordingly.
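As an illustration of how a crawler might consume such a tag, here is a small sketch using Python's standard-library HTML parser. The `content-access` attribute name is hypothetical - no such standard exists, it just mirrors the free/register/pay/subscription scheme proposed above:

```python
from html.parser import HTMLParser


class ContentAccessSniffer(HTMLParser):
    """Extracts a hypothetical <meta name="content-access" content="...">
    tag; pages without the tag default to "free"."""

    def __init__(self):
        super().__init__()
        self.access = "free"

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == "meta" and d.get("name") == "content-access":
            self.access = d.get("content", "free")


def classify(html: str) -> str:
    """Return the access class a crawler would record for this page."""
    sniffer = ContentAccessSniffer()
    sniffer.feed(html)
    return sniffer.access
```

A search engine could then index the page normally but badge it in the SERPs as free, registration-required, or paid, letting the user filter before clicking.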
At the present time most search engines don't feel it's in their interest to use their SERPs to give you free advertising. If you have pay-per-view content you want to promote, you can always use AdWords.
Then all of my content, and other sites with tons of content, will remain in the invisible web.
I would think that the more (good) content a search engine has, the better it is for them. Not to mention, all the advertising money they will be making off this content while people are searching…
> At the present time most search engines don't feel it's
> in their interest to use their SERPs to give you free advertising.
All serps are advertising. All listings are advertising. *all*
keep reading, we let them back in.
True, in a manner of speaking. I guess the point is that listings which enhance the user experience are beneficial to the SE. Listings which waste the user's time are harmful to the SE's brand.
The invisible web is not full of listings which waste the users’ time. Google, Yahoo and others are constantly trying to figure out how to index this content because it is useful to the users, not to mention adding to their ad revenue.
Think of the number of (previously invisible) websites that have been added in the past 2 years with Google and Yahoo trying to outdo each other, each trying to claim they have more content.
My site and many others have content that is our intellectual property. I have to get some legal agreement from the user to not steal this content. This puts me behind a login “firewall” and keeps bots out…
Yes, indexing pay-only content is a waste of the user's time. More importantly, it's a waste of MY time.
|Yes, indexing pay-only content is a waste of the user's time. More importantly, it's a waste of MY time. |
i'd like to know something is available and then decide if it is worth getting.
sometimes free content is worth exactly what you pay for it!
maybe my search today is for the free 30 second snippet of that song i was trying to remember but the same search tomorrow is so i can buy and download the cd.
is it a waste of time to know that your neighborhood book store has a book for sale on a subject that you are interested in?
perhaps there is a magazine in a rack down the street that has a review of a consumer product you were thinking of buying.
or maybe you would like to know that something is available in the local library even if you must obtain a library card before you can borrow it.
i don't really see the essential difference here...
|Banned - keep reading, we let them back in. |
lol! I remember that first 24-72 hours when you banned the bots. Google wiped us out and left us with our jaws hanging, huh? ;)
"Google wiped us out and left us with our jaws hanging, huh? ;)"
Do you mean within 24-72 hours of a robots.txt ban on googlebot this website (webmasterworld.com) was dropped from the Google system? If this website was dropped from the Google system how did it impact this website? Do most referrals come from google? If so why would the googlebot be banned?
|Do you mean within 24-72 hours of a robots.txt ban on googlebot this website (webmasterworld.com) was dropped from the Google system? |
Yes. Poof! Gone!
|If this website was dropped from the Google system how did it impact this website? |
I think it gave the admin team a chance to take a quick break and get everything back in order. ;)
From hundreds of thousands of pages to zero in less than 72 hours. We'll never know the real numbers but you can expect them to be staggering. ;)
phranque: "i'd like to know something is available and then decide if it is worth getting.
sometimes free content is worth exactly what you pay for it! "
This is precisely what I want to do: give the customer an idea of what they are getting for free, let them keep the free content, but give them the option of upgrading to the paid content. The only catch is that they have to agree to "terms and conditions" not to steal my free content. Cloaking is currently the only way of achieving this.
I am sure there is a lot of content like this out there in the invisible web and it is in the best interest of all parties (search engines, consumers, companies) to start indexing this content.
A simple way could be a meta tag that I can put on my content indicating what type of content it is - Meta Content = free/register/pay/subscription/etc.
This is what I am trying to avoid: [webmasterworld.com...]
[edited by: encyclo at 1:20 am (utc) on Mar. 6, 2007]
[edit reason] fixed link [/edit]
Now I am very, very long in the tooth, but I seem to remember that someone aimed the URL console at WebmasterWorld and pressed the "please, pretty please, remove this from the index" button.
But hey it could be the other way, me being so old and all.
Yup somewhere in this thread: [webmasterworld.com...]
[edited by: theBear at 2:25 am (utc) on Mar. 6, 2007]
I think CMSs should enable users to "label" a post as subscription content.
Sort of like the no-follow tag, but for users.
If someone is really interested in that post then, there is a disclaimer which search engines can display to users.
It is a win-win situation.
The search engine has the indexed content.
The user knows the result he is facing (whether paid or not).
kaizenlog -- good idea.
<meta name="subscription" content="true">