| 4:28 am on Jul 21, 2009 (gmt 0)|
The moment someone posts a link (blog, forum, etc.) directly to an internal page, all the spiders will get in, and because you have not put the usual spider protection in place, your last state will be worse than your first.
| 9:54 am on Jul 29, 2009 (gmt 0)|
Vince, I am not sure I understand your point. A form shown to any new IP, excluding known spiders, is meant to ensure human presence. The difference from a normal site is that known spiders will get through as before, while everyone else will need to click a button the first time.
| 5:49 pm on Jul 30, 2009 (gmt 0)|
If the only page you have this BUTTON on is the home page, it really doesn't stop the spiders, or the people, from bypassing your home page unless you're tracking access via sessions and kicking new sessions back to the home page.
If you do that however, you will tick off people coming to your site via SERPs to direct pages.
| 8:39 pm on Jul 30, 2009 (gmt 0)|
Bill, it won't apply to spiders. That's only for whoever enters the site as a human (a basic User-Agent check against a spider whitelist). But for humans it will apply to all pages the first time, not just the home page.
Yes, sessions are already used to post forms, create accounts, etc.; otherwise you only view pages.
The point is that spiders will go through without scrutiny, while humans will have to click the button the first time. I can set it up to use an IP or a cookie as a signature for some time, so this isn't repeated on subsequent visits.
What I don't know is whether this approach will cause spiders to devalue the site or even remove it from their index.
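The gate described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code; the User-Agent tokens, cookie name, and function names are all assumptions:

```python
# Minimal sketch of the proposed gate (hypothetical names throughout).
# Known spiders pass on a User-Agent whitelist; humans are challenged
# once, then a cookie set when they click the button lets them through.

SPIDER_UA_TOKENS = ("googlebot", "bingbot", "slurp")  # assumed whitelist
ENTRY_COOKIE = "human_ok"                             # assumed cookie name

def gate(user_agent, cookies):
    """Return 'allow' for spiders and verified humans, 'challenge' otherwise."""
    ua = (user_agent or "").lower()
    if any(tok in ua for tok in SPIDER_UA_TOKENS):
        return "allow"       # spiders crawl without scrutiny
    if cookies.get(ENTRY_COOKIE) == "1":
        return "allow"       # human already clicked the button
    return "challenge"       # show the button page; set cookie on click

print(gate("Mozilla/5.0 (compatible; Googlebot/2.1)", {}))   # allow
print(gate("Mozilla/5.0 Firefox", {}))                        # challenge
print(gate("Mozilla/5.0 Firefox", {"human_ok": "1"}))         # allow
```

Note that a plain UA check is trivially spoofed; serious setups pair it with reverse-DNS verification of the spider's IP.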
| 10:02 pm on Jul 30, 2009 (gmt 0)|
How will spiders know it applies to humans only? They won't, unless someone complains.
However, as you're describing it, you'll be violating Google's First Click Free rule, meaning a visitor should see the first indexed content page before being hit with a login or anything else.
After viewing one free page you can block them with a login, but not before.
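The "first click free" rule sketched above can be expressed as a small decision function. This is a hedged illustration of the policy's logic, not Google's specification; the referrer list and page counter are assumptions:

```python
# Sketch of a "first click free" gate: a visitor landing from a search
# engine gets the first content page without a challenge; subsequent
# pages in the same session may require the button/login.

SEARCH_REFERRERS = ("google.", "bing.", "yahoo.")  # assumed engine hosts

def decide(referrer, pages_viewed):
    """pages_viewed: content pages this session has already seen."""
    from_serp = any(host in (referrer or "") for host in SEARCH_REFERRERS)
    if from_serp and pages_viewed == 0:
        return "serve"        # first click from the SERP is free
    return "challenge"        # anything after that may be gated

print(decide("https://www.google.com/search?q=widgets", 0))  # serve
print(decide("https://www.google.com/search?q=widgets", 1))  # challenge
```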
| 8:09 am on Aug 3, 2009 (gmt 0)|
Yes, that's what I am afraid of.
Another approach against scraping, perhaps, would be to allow humans to view a content page if and only if it has first been indexed by the popular spiders; otherwise give them a 302. That should at least give the site admin the lead on the content.
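That last idea could be approximated as below. One loud assumption: a site cannot directly see a search engine's index, so this sketch treats a page as "indexed" once a whitelisted spider has fetched it, which is only a proxy for actual indexing:

```python
# Sketch: humans get a 302 (e.g. to the home page) for any page that a
# known spider has not yet crawled, so the origin site keeps the lead
# on its own content. "Crawled by a spider" stands in for "indexed".

crawled = set()  # paths already fetched by a whitelisted spider

def handle(path, is_spider):
    """Return an HTTP status for this request (simplified)."""
    if is_spider:
        crawled.add(path)
        return 200      # let the spider fetch and index it first
    if path in crawled:
        return 200      # humans may view pages the spiders have seen
    return 302          # redirect humans until the page is crawled

print(handle("/article-1", True))    # 200, spider fetches it
print(handle("/article-1", False))   # 200, now open to humans
print(handle("/article-2", False))   # 302, not yet crawled
```

In practice the `crawled` set would live in a database, and the same caveat applies: crawling is not indexing, and serving different behavior to spiders and humans risks being treated as cloaking.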