
Cloaking Forum

    
Entering site via a form
while spiders go through unchecked
enigma1




msg:3955259
 10:00 am on Jul 19, 2009 (gmt 0)

You have probably seen sites whose home page contains just a couple of buttons, like English/Spanish or "click here to enter" and so forth.

I would just like to confirm whether such a method, when implemented as an HTML form using the POST method and shown to visitors only, is considered cloaking (known spiders would still index pages normally). Once the visitor clicks the first button he can browse the site normally. Spiders, on the other hand, skip that first step and browse the site without scrutiny.

I'm sure this will eliminate rogue bots and automated scripts. A bot can submit forms without a problem, but it is unlikely to start debugging CSS styles, so the form can be crafted to ensure a human is present while staying simple. But of course, if it results in the site being removed from the search engines' indexes, it is of no use.

Has anyone ever tried it, and if so, what was your experience? TIA.
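
For illustration, one common reading of "a form crafted so a bot would have to debug the CSS" is a honeypot field: an extra input hidden from humans by a stylesheet rule, which a script that fills in every field will populate. The sketch below assumes that reading. The field names `website` and `enter` are hypothetical and do not come from the thread.

```python
from typing import Mapping

# Hypothetical field names; the thread does not specify any.
HONEYPOT_FIELD = "website"  # hidden from humans with CSS, e.g. display:none
ENTER_FIELD = "enter"       # the visible "click here to enter" button


def looks_human(form: Mapping[str, str]) -> bool:
    """Return True if the POSTed form data looks like it came from a person."""
    # The visible button must have been submitted.
    if ENTER_FIELD not in form:
        return False
    # A script that fills every input puts something in the hidden field;
    # a person never sees the field and leaves it empty.
    return form.get(HONEYPOT_FIELD, "") == ""
```

A bot written specifically for the site can still leave the hidden field empty, so this only filters generic scripts.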

 

vincevincevince




msg:3956268
 4:28 am on Jul 21, 2009 (gmt 0)

The moment someone posts a link (blog, forum, etc.) directly to an internal page, all the spiders will get in, and because you have not put the usual spider protection in place, your last state will be worse than your first.

enigma1




msg:3961625
 9:54 am on Jul 29, 2009 (gmt 0)

Vince, I am not sure I understand your point. The form, shown to any new IP except known spiders, is there to ensure a human is present. The difference from a normal site is that known spiders get through as before, while everyone else has to click a button the first time.

incrediBILL




msg:3962642
 5:49 pm on Jul 30, 2009 (gmt 0)

If the only page you have this BUTTON on is the home page, it really doesn't stop the spiders, or the people, from bypassing your home page unless you're tracking access via sessions and kicking new sessions back to the home page.

If you do that however, you will tick off people coming to your site via SERPs to direct pages.

enigma1




msg:3962719
 8:39 pm on Jul 30, 2009 (gmt 0)

Bill, it won't apply to spiders. It's only for whoever wants to enter the site as a human (decided by a basic check of the HTTP User-Agent or a spider whitelist). But for humans it will apply to all pages the first time, not just the home page.

Yes, sessions are already used to post forms, create accounts, etc.; otherwise you can only view pages.

But the thing is, spiders will go through without scrutiny while humans will have to click the button the first time. I can set it up to use an IP or a cookie as a signature for some time, so this won't be repeated on subsequent visits.

What I don't know is whether this approach will cause spiders to devalue the site or even remove it from their index.
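
The gate described in this post (whitelisted spiders pass, humans click once, then a signature is remembered for a while) could be sketched roughly as follows. Everything named here is an assumption for illustration: the spider tokens, the one-day lifetime, the `SECRET` value, and the cookie layout are not taken from the thread.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"change-me"      # assumption: a server-side secret
PASS_TTL = 24 * 3600       # assumption: a pass stays valid for one day
SPIDER_TOKENS = ("googlebot", "msnbot", "slurp")  # illustrative whitelist


def is_known_spider(user_agent: str) -> bool:
    """Basic User-Agent substring check against the whitelist."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_TOKENS)


def _sign(ip: str, expires: str) -> str:
    message = f"{ip}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_pass(ip: str, now: float) -> str:
    """Cookie value issued after the visitor clicks the button."""
    expires = str(int(now) + PASS_TTL)
    return f"{expires}.{_sign(ip, expires)}"


def pass_is_valid(cookie: str, ip: str, now: float) -> bool:
    """Check expiry and that the signature matches this IP."""
    expires, _, sig = cookie.partition(".")
    if not expires.isdigit() or int(expires) < now:
        return False
    return hmac.compare_digest(sig.encode(), _sign(ip, expires).encode())


def decide(user_agent: str, ip: str, cookie: Optional[str], now: float) -> str:
    """Return "serve" or "show_form" for one request."""
    if is_known_spider(user_agent):
        return "serve"
    if cookie is not None and pass_is_valid(cookie, ip, now):
        return "serve"
    return "show_form"
```

Note that a User-Agent string can be set to anything by the client, so a check like `is_known_spider` on its own will also let through any script that claims to be a spider.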

incrediBILL




msg:3962773
 10:02 pm on Jul 30, 2009 (gmt 0)

How will spiders know if it's working on humans only? They won't unless someone complains.

However, as you're describing it you'll be violating Google's 1st page free rule, meaning a visitor should be able to see the 1st indexed content page before being hit with a login or anything else.

After viewing one free page you can block them for login, but not before.
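
As a rough sketch of the rule as Bill states it here (one free page, then the gate), a per-session counter is enough. The `session` dictionary and its keys `passed_gate` and `pages_seen` are hypothetical names, and this only illustrates the behaviour described in the post, not any official specification.

```python
def handle_view(session: dict) -> str:
    """Return "serve" or "show_form" for one page view in this session."""
    # Visitors who already clicked through are never gated again.
    if session.get("passed_gate"):
        return "serve"
    seen = session.get("pages_seen", 0)
    session["pages_seen"] = seen + 1
    # The first page of a session is free; later pages require the form.
    return "serve" if seen == 0 else "show_form"
```

For example, a visitor arriving from a search result sees that page immediately and meets the form only on the second page view.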

enigma1




msg:3964473
 8:09 am on Aug 3, 2009 (gmt 0)

Yes, that's what I am afraid of.

Another approach against scraping, perhaps, would be to let humans view a content page if and only if it has first been indexed by the popular spiders, and otherwise give them a 302. That should at least give the site admin the lead on the content.
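
A minimal sketch of that idea, under a loud assumption: the server cannot directly know whether a page is in a search index, so "fetched at least once by a whitelisted spider" is used here as a stand-in. The function name, the `indexed` set, and the redirect target `/` are all hypothetical.

```python
from typing import Optional, Set, Tuple


def respond(
    path: str,
    is_spider: bool,
    indexed: Set[str],
    fallback: str = "/",
) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect location or None) for one request."""
    if is_spider:
        # Assumption: a spider fetch is treated as "indexed".
        indexed.add(path)
        return 200, None
    if path in indexed:
        return 200, None
    # Humans asking for a page no spider has fetched yet get redirected.
    return 302, fallback
```

In practice there is a delay between a crawl and the page appearing in results, so a real setup would probably need a waiting period or some other confirmation before opening the page to visitors.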

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved