Forum Moderators: phranque

Protecting guestbook pages from email harvesters

         

Scooter24

1:53 pm on Mar 5, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



Is there a way to make specific pages browsable by humans but not by crawlers?

What if the 'View guestbook' link on my pages led not directly to the guestbook but to an intermediate page with the text "... to protect the guestbook from email harvesting programs we implemented a safety mechanism ... press the button to get to the guestbook page..."? That page would contain a form with a single button, and the user would have to press it to reach the guestbook. The button itself would launch a PHP script which delivers the guestbook page.

Would this work? Or perhaps something else?

Dreamquick

2:17 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your idea sounds okay. You just need to make sure that the final page can't be accessed without going via the "button screen", and that the PHP script the "button screen" submits to actually checks that the button was pressed somehow - otherwise a bot could just get the page name and go there.
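One way to make the "was the button pressed" check concrete is to have the button page embed a signed, short-lived token in a hidden form field, and have the guestbook script refuse anything that is not a POST carrying a valid token. This is only a rough sketch in Python rather than PHP; the names `SECRET_KEY`, `MAX_AGE_SECONDS`, `issue_token` and `button_was_pressed` are made up for illustration, and `form` stands in for whatever parsed POST data your server gives you.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the client
MAX_AGE_SECONDS = 600      # assumed token lifetime (10 minutes)


def _sign(issued):
    """HMAC-SHA256 signature over the issue timestamp."""
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Return a signed timestamp to embed as a hidden field in the button form."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def button_was_pressed(method, form, now=None):
    """Accept only a POST that carries a fresh token issued by the button page."""
    if method != "POST":
        return False
    issued, _, signature = form.get("token", "").partition(":")
    try:
        issued_at = int(issued)
    except ValueError:
        return False
    # Compare as bytes so arbitrary user input cannot raise a TypeError.
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued_at <= MAX_AGE_SECONDS
```

Note that this only stops bots that request the guestbook URL directly. A bot that loads the button page and submits the form like a browser would still get through.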

I was pondering something similar myself - what it normally comes down to is a compromise between the amount of security you want and the amount of inconvenience you cause to the end user. My ponderings related to a way to protect a personal version of a "contact me" page. I already have a reasonably stable spambot blocker, but it does produce a few false positives, so I decided to explore a few other avenues.

If you want to use a purely UA-based (user agent) approach, you will stop the bots which don't or can't change their UA, but anything apart from that will get through. This is totally transparent to the user but is open to abuse.

The next step up is along the same lines but also involves deciding when a user is actually a bot pretending to be a user. Again this is transparent to the user, but only if your rules identify visitors correctly, as false positives can be an issue.

An alternative approach involves making the user answer a question before they get access: you ask them to enter a dynamically generated code and submit the form. If their answer is correct, you show them the guestbook (and possibly mark them as a valid user in case they want to browse it more than once in one visit).

This will stop all but the most determined of guestbook scrapers; the downside is that it does inconvenience the user a little!
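The challenge approach could look roughly like the sketch below (Python, not PHP). It assumes `session` is a per-visitor, server-side store that behaves like a dict; the function names and the five-character code length are invented for the example. How the code is shown to the visitor is left out: if it sits in the HTML as plain text a script can read it as easily as a person can, so it would need to be rendered as an image or similar.

```python
import hmac
import secrets
import string

CODE_ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(session, length=5):
    """Generate a random code, remember it server-side, and return it for display."""
    code = "".join(secrets.choice(CODE_ALPHABET) for _ in range(length))
    session["expected_code"] = code
    session["verified"] = False
    return code


def check_answer(session, answer):
    """Compare the submitted answer to the stored code; each code is single-use."""
    expected = session.pop("expected_code", None)
    if expected is None:
        return False
    submitted = answer.strip().upper()
    ok = hmac.compare_digest(submitted.encode(), expected.encode())
    session["verified"] = ok
    return ok


def may_view_guestbook(session):
    """True once this visitor has answered a challenge correctly in this session."""
    return session.get("verified", False)
```

The `verified` flag is what lets a visitor browse the guestbook more than once in one visit without being asked again.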

- Tony

carfac

4:04 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Scooter:

If you have root access to your server (and probably .htaccess would work, too - I do not know that method well), you could use a version of the mod_rewrite rules posted elsewhere on this site.

Anyway, with mod_rewrite, you could pick a term that is unique to the URLs of the pages you want to protect, and make a super-restricted list of UAs to block for any requests matching that string.

No, not 100% effective, but it would add a second layer of defense for your site.
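To show the decision being made, here is the same path-plus-UA logic written out in Python. This is not mod_rewrite syntax and not the rules posted elsewhere on the site, just a sketch of the idea; `PROTECTED_TERM` and the three harvester names in `BLOCKED_UA` are illustrative assumptions, and a real list would be longer.

```python
import re

PROTECTED_TERM = "guestbook"  # assumed substring unique to the protected URLs
# Illustrative harvester user-agent fragments; not a complete or current list.
BLOCKED_UA = re.compile(r"emailsiphon|emailwolf|extractorpro", re.IGNORECASE)


def should_block(path, user_agent):
    """Block a request only if it targets a protected page AND the UA looks bad."""
    if PROTECTED_TERM not in path.lower():
        return False  # unprotected pages are never affected by this rule
    if not user_agent:
        return True   # a missing UA on a protected page is treated as suspicious
    return BLOCKED_UA.search(user_agent) is not None
```

Because the rule only fires on protected URLs, the UA list can be far stricter than one you would dare apply to the whole site.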

Another idea is to make an invisible link to "guestb.html" and "email.html" and have those point to a banning spider trap (again, you can find these in other threads on this board).
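The core of a banning trap is small: any client that requests one of the bait URLs gets its address recorded, and recorded addresses are refused from then on. The sketch below is in Python and keeps the ban list in memory; the `SpiderTrap` class is invented for illustration and is not the script from the other threads. A real setup would need to store the list somewhere persistent.

```python
TRAP_PATHS = {"/guestb.html", "/email.html"}  # bait URLs reached only via the invisible links


class SpiderTrap:
    """Ban any client that requests a bait URL, then refuse all its later requests."""

    def __init__(self):
        self.banned = set()

    def allow(self, client_ip, path):
        if client_ip in self.banned:
            return False
        if path in TRAP_PATHS:
            # Humans never see the link, so a hit here is taken to be a bot.
            self.banned.add(client_ip)
            return False
        return True
```

It is probably wise to list the bait URLs as disallowed in robots.txt as well, so that well-behaved crawlers stay out of the trap and only the ones ignoring the rules get banned.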

If you need help implementing either (or both) of these solutions, there are plenty of people here willing to help! Good luck Scooter!

dave