Forum Moderators: phranque

mod rewrite based on HTTP REFERER

spam blocking inquiry

jpat34721

12:17 am on Jul 14, 2006 (gmt 0)

10+ Year Member



I have a very active website for cancer patients. To block spambots on my forums, I would like to redirect requests for any page that resides in a certain directory (the directory which contains the actual message files from my patients) if the request did not come from the forum index page (i.e. has a foreign HTTP_REFERER). The redirection would be to a linkless page that explains how to access the message board. I want to redirect rather than deny so that legitimate patients who found a link on a search engine are guided to the information they need.

Can anyone help me implement such a policy using mod_rewrite (or some other means)?

Thanks,
Jeff Patterson

Little_G

12:58 am on Jul 14, 2006 (gmt 0)

10+ Year Member



Hi,

This should work:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !example\.com
RewriteRule .* /accesspage.html [L]

Andrew

[edited by: Little_G at 1:01 am (utc) on July 14, 2006]

jdMorgan

1:20 am on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This will destroy the ability of search engine spiders to crawl your site, because they provide no Referer header...

Jim

Little_G

9:51 am on Jul 14, 2006 (gmt 0)

10+ Year Member



Hi,

True, try this instead:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !yoursite\.com [NC]
RewriteRule .* /accesspage.html [L]

Andrew

jdMorgan

1:59 pm on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately, that defeats the stated purpose of the project (See the first post).

The basic problem, described here many times [google.com], is that referrer-based access control is unreliable because the Referer header itself is unreliable: many corporate and ISP caching proxies strip the Referer header, and many browsers and "Internet Security" software packages can be set by the user (or come set by default) to block it.

It is also folly to force all search-derived traffic to a landing page -- this is a recipe for losing visitors quickly, especially when those visitors are 'distressed' and likely not part of a tech-savvy Web demographic. It's important to step back from the problem at hand and look at the big picture: the effects on search-engine-derived traffic and on site usability are both important. If you block rogue 'bots but kill your site, what's the point?

A cookie-and-script-based approach (described in some of the previous threads), with exclusions to allow search engine robots to spider the site, is a better way forward.

Jim
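
[Editor's note: a minimal sketch of the cookie-and-script approach described above. The cookie name forum_ok, the /messages/ directory, the /accesspage.html target, and the spider list are all assumptions for illustration, not details from this thread.]

```apache
# Sketch only -- cookie name, directory, and user-agent list are assumed.
RewriteEngine on
# Exclusion: let known search engine spiders crawl the message pages
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Slurp|msnbot) [NC]
# Visitors without the cookie set by the forum index page get redirected
RewriteCond %{HTTP_COOKIE} !forum_ok=1
# Never rewrite the explanation page itself, to avoid a loop
RewriteCond %{REQUEST_URI} !^/accesspage\.html$
RewriteRule ^messages/ /accesspage.html [L]
```

The forum index page would set the cookie, for example with a one-line script such as document.cookie = "forum_ok=1; path=/";. The user-agent exclusion is what lets spiders through where a referrer check cannot.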

jpat34721

3:48 pm on Jul 14, 2006 (gmt 0)

10+ Year Member



Thanks for all the helpful comments, but I think I should clarify. I don't want the spiders to spider the directories with the patients' message files. That's how I got into this mess in the first place: my robots.txt didn't exclude this directory (it does now), and the message files were coming up on the search engines, which the spambots use both to scrape the files for email addresses and to attempt to spam the boards. I've eliminated the latter problem; now I'm trying to eliminate the former.

So I don't want to redirect _all_ search-engine-originated visits -- just the ones that are trying to access the message file directory. I want to force all access to these files through my CGI script, where I can detect and block scrapers.
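
[Editor's note: one way to force every hit on the message files through a gatekeeper script is to rewrite direct requests into the CGI. This is a sketch under assumptions: the /messages/ directory and the /cgi-bin/viewmsg.cgi script name are illustrative placeholders, not details from this thread.]

```apache
# Sketch only -- directory and script names are illustrative placeholders.
RewriteEngine on
# Rewrite any direct request for a message file to the gatekeeper CGI,
# passing the requested file name as a query parameter; the script can
# then log, throttle, or block scrapers before serving the content.
RewriteRule ^messages/(.+)$ /cgi-bin/viewmsg.cgi?file=$1 [L]
```

The matching robots.txt exclusion mentioned above would be a Disallow: /messages/ line under User-agent: *, so well-behaved spiders stop indexing the raw files while the CGI-served pages remain crawlable.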