Msg#: 3955257 posted 10:00 am on Jul 19, 2009 (gmt 0)
You've seen cases where the home page of a site contains just a couple of buttons, like English/Spanish or "click here to enter" and so forth.
I'd just like to confirm whether such a method is considered cloaking when the HTML form (POST method) is shown to visitors alone, while known spiders can index pages normally. Once the visitor clicks the first button, he can browse the site normally. Spiders, on the other hand, skip that first step and browse the site without scrutiny.
I'm sure this would eliminate rogue bots and automated scripts. Although a bot can submit forms without a problem, it is unlikely to start parsing CSS styles, so a form can be crafted to ensure a human is present while staying simple at the same time. But of course, if it results in having the site removed from the spiders' indexes, it's of no use.
Has anyone ever tried this, and if so, what was your experience? TIA.
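The "bots won't debug CSS" idea is essentially a honeypot field: a minimal sketch, assuming a decoy input that a stylesheet hides from humans but a naive form-filling bot will populate. The field and class names here are illustrative, not from the original post.

```python
# Hypothetical honeypot sketch: the entry form carries a decoy field
# hidden from humans via CSS; a bot that auto-fills every input
# reveals itself. Names ("website", class "hp") are made up.

ENTRY_FORM = """
<form method="post" action="/enter">
  <input type="submit" name="enter" value="Click here to enter">
  <!-- stylesheet elsewhere defines: .hp { display: none; } -->
  <input class="hp" type="text" name="website" value="">
</form>
"""

def looks_human(post_data: dict) -> bool:
    """A human never sees the hidden 'website' field, so it stays
    empty; a bot filling every field betrays itself."""
    return post_data.get("website", "") == ""
```

The human only has to click one button, so the check stays simple on the visitor's side while still filtering out dumb scripts.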
Msg#: 3955257 posted 4:28 am on Jul 21, 2009 (gmt 0)
The moment someone posts a link (blog, forum, etc.) directly to an internal page, all the spiders will get in and because you have not put in place the usual spider protection your last state will be worse than your first.
Msg#: 3955257 posted 9:54 am on Jul 29, 2009 (gmt 0)
Vince, I am not sure I understand your point. A form shown to any new IP, excluding known spiders, is there to ensure human presence. The difference from a normal site is that known spiders get through as before, while everyone else has to click a button the first time.
If the only page you have this BUTTON on is the home page, it really doesn't stop the spiders, or the people, from bypassing your home page unless you're tracking access via sessions and kicking new sessions back to the home page.
If you do that however, you will tick off people coming to your site via SERPs to direct pages.
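The session-based enforcement described above could be sketched roughly like this (a minimal illustration, assuming a dict-like session store; the key name "entered" is an assumption):

```python
# Sketch of session gating: a new session requesting an internal page
# gets bounced back to the home page; once the session is marked as
# having entered, internal pages are served normally.

def route(path: str, session: dict) -> str:
    if not session.get("entered") and path != "/":
        return "redirect:/"            # kick new sessions to the home page
    if path == "/":
        session["entered"] = True      # in practice, set after the button click
    return "serve:" + path
```

This is exactly the behavior that annoys visitors arriving from SERPs: their first request is for an internal page, so they get redirected away from what they searched for.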
Bill, it won't apply to spiders. That's only for whoever wants to enter the site as a human (a basic user-agent check against a spider whitelist). But for humans it will apply to all pages the first time, not just the home page.
Yes, sessions are already used to post forms, create accounts, etc.; otherwise you can only view pages.
But the point is that spiders will go through without scrutiny, while humans will have to click the button the first time. I can set it up to use an IP or a cookie as a signature for some time, so this won't be repeated on subsequent visits.
What I don't know is whether this approach will cause the spiders to devalue the site or even remove it from their index.
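Putting the pieces together, the gate could look something like this minimal sketch. The whitelist and cookie name are illustrative assumptions; a real setup would also verify spiders by reverse DNS rather than trusting the user-agent string alone, since UA strings are trivially forged.

```python
# Hypothetical gating logic: whitelisted spiders and returning humans
# (identified by a cookie set after the button click) get the page;
# first-time humans get the entry form. Spider list is illustrative.

KNOWN_SPIDERS = ("googlebot", "bingbot", "slurp")

def gate(user_agent: str, cookies: dict) -> str:
    ua = user_agent.lower()
    if any(bot in ua for bot in KNOWN_SPIDERS):
        return "serve_page"              # spiders pass without scrutiny
    if cookies.get("human") == "1":      # signature from an earlier click
        return "serve_page"              # returning human, no repeat prompt
    return "show_entry_form"             # first visit: click the button
```

Note that this is precisely the pattern that raises the cloaking question: the response depends on whether the requester claims to be a spider.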
Msg#: 3955257 posted 8:09 am on Aug 3, 2009 (gmt 0)
Yes, that's what I am afraid of.
Another approach against scraping, perhaps, would be to allow humans to view a content page if and only if it has first been indexed by the popular spiders; otherwise, give them a 302. That should at least give the site admin the lead on the content.
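A server can't directly know what a search engine has indexed, so a sketch of this idea has to assume the admin maintains a set of URLs confirmed as indexed (populated by hand or by some external check; that bookkeeping is an assumption, not part of the post):

```python
# Hedged sketch of "302 until indexed": spiders always get the page so
# they can index it first; humans only get pages already in the
# admin-maintained INDEXED set, otherwise a 302 back to the home page.

INDEXED = {"/", "/about"}  # illustrative; maintained out of band

def response_for(path: str, user_agent: str) -> tuple:
    if "googlebot" in user_agent.lower():
        return (200, path)      # let the spider see and index it first
    if path in INDEXED:
        return (200, path)      # humans see already-indexed pages
    return (302, "/")           # not yet indexed: temporary redirect
```

The idea is that scraped copies would then always postdate the indexed original, giving the admin a provable claim to the content.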