
Message Too Old, No Replies

Entering site via a form

while spiders go through unchecked

     
10:00 am on Jul 19, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



You have seen cases where the home page of a site contains just a couple of buttons, like English/Spanish or "click here to enter", and so forth.

I would just like to confirm whether such a method is considered cloaking when it is applied via an HTML form (POST method) to visitors only, while known spiders can index pages normally. Once the visitor clicks the first button he can browse the site normally. Spiders, on the other hand, skip that first step and browse the site without scrutiny.

I'm sure this will eliminate rogue bots and automated scripts. Although a bot can submit forms without a problem, it is unlikely to start parsing CSS styles, so a form can be crafted to ensure a human is present while remaining simple. But of course, if it results in the site being removed by the spiders, it is of no use.
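A minimal sketch of the CSS-based check described above (field names are hypothetical): the form includes a decoy field hidden by the stylesheet, so a real visitor never sees or fills it, while a naive bot that submits every field it finds gives itself away.

```python
def looks_human(form_data):
    """Honeypot check: the 'website' field is hidden via CSS
    (e.g. style="display:none"), so a real visitor leaves it empty.
    A bot that fills in every field it finds trips the trap.
    Field names here are assumptions, not from the thread."""
    if form_data.get("website", "") != "":
        return False  # hidden decoy field filled: almost certainly a bot
    # The visible "enter" button must actually have been submitted.
    return form_data.get("enter") == "1"


# A human submits only the visible button:
assert looks_human({"enter": "1"}) is True
# A naive bot fills the hidden decoy too:
assert looks_human({"enter": "1", "website": "http://spam"}) is False
```

This stays simple for humans (one click) while filtering scripts that do not evaluate CSS.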

Has anyone ever tried this, and if so, what was their experience? TIA.

4:28 am on Jul 21, 2009 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The moment someone posts a link (blog, forum, etc.) directly to an internal page, all the spiders will get in and because you have not put in place the usual spider protection your last state will be worse than your first.
9:54 am on Jul 29, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Vince, I am not sure I understand your point. The form is shown to any new IP, excluding known spiders, to ensure a human is present. The difference from a normal site is that known spiders will get through as before, while everyone else will need to click a button the first time.
5:49 pm on Jul 30, 2009 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If the only page you have this BUTTON on is the home page, it really doesn't stop the spiders, or the people, from bypassing your home page unless you're tracking access via sessions and kicking new sessions back to the home page.

If you do that however, you will tick off people coming to your site via SERPs to direct pages.

8:39 pm on Jul 30, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Bill, it won't apply to spiders. That's only for whoever wants to enter the site as a human (a basic user-agent check, or a spider whitelist). For humans, though, it will apply to all pages the first time, not just the home page.

Yes, sessions are already used to post forms, create accounts, etc.; otherwise you can only view pages.

But the point is that spiders will get through without scrutiny, while humans will have to click the button the first time. I can set it up to use an IP or a cookie as a signature for some time, so this isn't repeated on subsequent visits.

What I don't know is whether this approach will cause the spiders to devalue the site, or even remove it from their index.
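The gate being described could be sketched roughly like this (whitelist and cookie names are assumptions): known spiders pass on their user-agent, returning humans pass on the cookie set after the first click, and everyone else gets the entry form.

```python
# Assumed whitelist of known spiders; a real setup would also
# verify IPs, as user-agent strings are trivially forged.
KNOWN_SPIDERS = ("googlebot", "bingbot", "slurp")

def gate(user_agent, cookies):
    """Decide which response to serve for a request."""
    ua = user_agent.lower()
    if any(bot in ua for bot in KNOWN_SPIDERS):
        return "serve_page"       # spiders browse without scrutiny
    if cookies.get("entered") == "1":
        return "serve_page"       # human already clicked through
    return "show_entry_form"      # first visit: present the button

assert gate("Mozilla/5.0 (compatible; Googlebot/2.1)", {}) == "serve_page"
assert gate("Mozilla/5.0 Firefox", {}) == "show_entry_form"
assert gate("Mozilla/5.0 Firefox", {"entered": "1"}) == "serve_page"
```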

10:02 pm on Jul 30, 2009 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



How will spiders know if it's working on humans only? They won't unless someone complains.

However, as you're describing it, you'll be violating Google's first-page-free rule, meaning a visitor should see the first indexed content page before being hit with a login or anything else.

After viewing one free page you can block them for login, but not before.
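Bill's "one free page" rule could be layered onto the gate like this (a sketch; referrer list and cookie names are assumptions): a visitor arriving from a search results page gets one page before the entry form applies.

```python
# Assumed SERP referrer fragments; real detection would be stricter.
SEARCH_REFERRERS = ("google.", "bing.", "yahoo.")

def first_click_free(referrer, cookies):
    """Allow one free page view for visitors landing from a SERP,
    then fall back to the usual entry-form gate."""
    from_serp = any(s in referrer for s in SEARCH_REFERRERS)
    already_used = cookies.get("fcf_used") == "1"
    if from_serp and not already_used:
        return "serve_page"   # the one free click; response sets fcf_used=1
    if cookies.get("entered") == "1":
        return "serve_page"   # visitor has already clicked through
    return "show_entry_form"

assert first_click_free("https://www.google.com/search?q=x", {}) == "serve_page"
assert first_click_free("", {}) == "show_entry_form"
```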

8:09 am on Aug 3, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Yes, that's what I am afraid of.

Another approach against scraping, perhaps, would be to allow humans to view a content page only if it has first been indexed by the popular spiders, and otherwise give them a 302. That should at least give the site admin priority on the content.
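That last idea could be sketched as follows (everything here is assumed, including how indexing is tracked; in practice you would record which URLs known spiders have fetched):

```python
# Assumed: populated whenever a whitelisted spider fetches a URL.
INDEXED_URLS = set()

def serve_to_human(url):
    """Serve content to a human only once a popular spider has
    fetched it; otherwise redirect (302) elsewhere, e.g. home."""
    if url in INDEXED_URLS:
        return 200
    return 302

INDEXED_URLS.add("/article-1")
assert serve_to_human("/article-1") == 200  # spider saw it first
assert serve_to_human("/article-2") == 302  # not yet indexed: redirect
```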

 
