Forum Moderators: phranque

Blocking referer spam


cla313

12:51 pm on May 20, 2008 (gmt 0)

10+ Year Member




System: The following 11 messages were cut out of thread at: http://www.webmasterworld.com/forum11/2095.htm by encyclo - 9:48 pm on May 21, 2008 (utc -4)


Hi.
I am getting referer spam (my log files are not public, but I want to get rid of all that crap anyway). Is there a way I could condense the lines below? (Those text strings appear in various positions in the so-called referring URL.)

RewriteEngine On
...
#go away weblinkvalidator
RewriteCond %{HTTP_REFERER} ^http://www\.weblinkvalidator\.com
RewriteRule ^.*$ - [F]
#block referrer spam
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]viagr[-.] [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]sex[-.] [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]adult[-.] [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]spycam[-.]
RewriteRule .* - [F]
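A condensed version of the block above, as a sketch: the four keywords go into one alternation, so it no longer matters in which order or position they appear in the referring URL. The `(www\.)?` is dropped because the `.*` that follows already matches it, and the two rules are merged with [OR] since both just return 403. The [NC] flag (case-insensitive) is an assumption that was not in the original rules.

```apache
RewriteEngine On
# Link checker: prefix match on the referer, dots escaped so they match literally
RewriteCond %{HTTP_REFERER} ^http://www\.weblinkvalidator\.com [NC,OR]
# Any referer with one of the spam words bounded by "-" or "." on both sides
RewriteCond %{HTTP_REFERER} ^http://.*[-.](viagr|sex|adult|spycam)[-.] [NC]
# Either condition true: answer 403 Forbidden
RewriteRule .* - [F]
```

New keywords can be added to the alternation with another `|` instead of another RewriteCond line.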

Thanks!

wilderness

1:49 pm on May 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You're really wasting your time!

Denying access to the referer spam will not remove the access attempts from your visitor logs.
The referers will still appear; the requests will simply be 403'd.

If your logs are not open to the public, these spam attempts (as annoying as they may be) do not present any real issue.

Don

edited by wilderness.

There is an OLD thread somewhere at WebmasterWorld (I do not recall which forum) that provides an example of how to automatically remove such lines from your logs; unfortunately, I do not recall whether it's an htaccess or an httpd.conf solution.
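One known way to keep such hits out of the log (not necessarily the solution from that old thread) is conditional logging in httpd.conf: flag the request with SetEnvIf, then tell CustomLog to skip flagged requests. CustomLog is only allowed in the server or virtual-host config, not in .htaccess. The variable name `spamref` and the log path are placeholders, and the sketch assumes the stock `combined` LogFormat is defined.

```apache
# httpd.conf (server or <VirtualHost> context)
# Flag requests whose Referer header contains one of the spam words, any case
SetEnvIfNoCase Referer "[-.](viagr|sex|adult|spycam)[-.]" spamref
# Write the access log entry only when the request is NOT flagged
CustomLog logs/access_log combined env=!spamref
```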

jdMorgan

2:37 am on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a speed-up of the anti-hotlinking section, with no change whatsoever to function:

RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourOtherdomain\.net/ [NC]
RewriteRule \.(jpg|gif|pdf)$ - [NC,F]

Jim

wilderness

3:34 am on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's a speed-up of the anti-hotlinking section, with no change whatsoever to function:

A year or two ago, I also added exceptions for some web-based email services.

Two that come to mind are MSNTV (WebTV) and Yahoo (both US & CA).
I'm sure I could easily add half a dozen more.
As could anybody else whose website pages and links go out to a large email distribution.
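A sketch of such exceptions, built on jdMorgan's block: each web-mail service gets one more negated RewriteCond ahead of the RewriteRule. The Yahoo host pattern is an assumption (check the referers that actually show up in your own logs), and `webmail.example.com` is a hypothetical stand-in for any other provider; no MSN TV pattern is given because its referer host isn't stated here.

```apache
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourOtherdomain\.net/ [NC]
# Assumed Yahoo web-mail pattern: any subdomain ending in mail.yahoo.com (covers us. and ca.)
RewriteCond %{HTTP_REFERER} !^http://([^/]+\.)?mail\.yahoo\.com/ [NC]
# Hypothetical placeholder for another web-mail provider
RewriteCond %{HTTP_REFERER} !^http://([^/]+\.)?webmail\.example\.com/ [NC]
RewriteRule \.(jpg|gif|pdf)$ - [NC,F]
```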

Don

blend27

2:58 pm on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



-- Denies both blank UA and UA that contains "-". --

What about UAs that do not contain ( and/or )? The only exceptions I can see are the DMOZ spider and the Archiver (if you will). Or are there others?
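The quoted rule (deny a blank UA, or a UA that is just "-") fits in a single condition. This is a sketch, not the exact line from the earlier post, which was cut from this thread; "contains" is read here as "consists of", since denying every UA with a hyphen in it would block many browsers.

```apache
# An absent User-Agent header arrives here as an empty string;
# ^-?$ matches either "" or a lone "-"
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```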

wilderness

3:42 pm on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



-- Denies both blank UA and UA that contains "-". --

What about UAs that do not contain ( and/or )? The only exceptions I can see are the DMOZ spider and the Archiver (if you will). Or are there others?

blend,
Perhaps I'm a little dense today; however, it's not clear to me what you're asking.

This basic anti-hotlinking section has been in effect on my websites since early 2000.
I don't make exceptions for DMOZ, the archiver or any other major website (at least for the majority of directories within my website structures).

I do have other directories that allow visitors from ranges that are denied from the remaining portions of my websites (RIPE, APNIC and others); however, those exception-visitors are denied access to images and even CSS stored in directories outside the exception directory.
The reasoning for these exceptions is part of an agreement or protocol (for lack of a more appropriate word) that I felt necessary to accompany the use of the materials. However, even these exceptions carry penalties (harvesting by otherwise undesired UAs and bots) and benefits (awareness of the activity of otherwise undesired UAs and bots).

My sites are simply very unique in their content (perhaps paid access would be more effective, however that market is rapidly disappearing; a new alternative may be "closed access" beyond the internet).
And the choices of denial or access were necessary for protection, control and use alike.

Initially, I did not care for using multiple htaccess files (too much work) for multiple directories; however, it has been quite effective from my own standpoint.

Don

blend27

9:06 pm on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don,

If the UA does not have a ) or ( character in it, parentheses, is what I mean.

That covers Java 1.05, for example, or most default user agents of programming libraries and command-line tools, OR all that junk that comes in as "KHSDyhkjsncuishefd klsd slihfdre".

The reason I mentioned DMOZ is that they check whether a site is still alive; their bot usually comes from the Netscape range, and it does not have a ) character in it. For the love of HTML, I don't remember its name.
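A sketch of the parentheses test with the DMOZ exception: `Robozilla/` is the ODP user-agent jdMorgan gives further down the thread. Treat it as a starting point only; besides junk UAs it will also refuse some legitimate tools and bots whose UAs lack parentheses.

```apache
# Exception: the DMOZ/ODP link checker, whose UA has no parentheses
RewriteCond %{HTTP_USER_AGENT} !Robozilla/
# True when the UA contains neither "(" nor ")"; a blank UA also matches
RewriteCond %{HTTP_USER_AGENT} ![()]
RewriteRule .* - [F]
```

Both conditions must hold (they are ANDed by default), so Robozilla passes while any other parenthesis-free UA gets a 403.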

wilderness

11:17 pm on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



blend,
I do not recall having a deny based on the lack of parentheses; however, I have some white-listing lines that were graciously supplied to me by a friend.
Many of the applications of these lines (and of many other wildcard expressions for rewrites) I simply do not comprehend.

I've had a curse my entire life: an ability to grab the bull by the horns without understanding (or even without desire) the simplest of procedures.

The majority of my htaccess rewrites are simple by nature.

I do recall having a rewrite based on a UA that ends with double parentheses.

Does DMOZ come to my sites? I don't recall it in recent times.
Seems to me there is a DMOZ-nutch that gets denied under the nutch umbrella.
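Denying "under the nutch umbrella" would look something like this sketch: a case-insensitive substring match, so any UA that mentions Nutch (DMOZ's Nutch crawler included) is refused.

```apache
# Deny any user-agent containing "nutch", in any letter case
RewriteCond %{HTTP_USER_AGENT} nutch [NC]
RewriteRule .* - [F]
```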

Don

jdMorgan

11:37 pm on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DMOZ/ODP's user-agent is "Robozilla/" with the version numbers after the slash.

Jim

wilderness

12:20 am on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Many thanks Jim.

Robozilla is not in either my robots.txt or my denied UAs.
Perhaps I have their IP range denied.

There hasn't been a volunteer editor in DMOZ for my widgets for 6-7 years. I applied a couple of times and simply stopped after illogical answers.

Don

blend27

12:41 am on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Robozilla as well as Nutch from DMOZ come from 207.200.64.0/18.
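For reference, 207.200.64.0/18 spans 207.200.64.0 through 207.200.127.255. Two alternative sketches (use one or the other, not both), assuming Apache 2.2-era syntax: deny the range outright with mod_authz_host, which accepts CIDR notation, or exempt it inside a mod_rewrite block, where REMOTE_ADDR is plain text and the third octet 64 to 127 has to be spelled out as a regex. The UA condition in the second option is only an example.

```apache
# Option 1 (Apache 2.2 mod_authz_host): deny the whole /18
Order Allow,Deny
Allow from all
Deny from 207.200.64.0/18

# Option 2 (mod_rewrite): skip the rule for 207.200.64.0 - 207.200.127.255
# third octet: 64-69 | 70-99 | 100-119 | 120-127
RewriteCond %{REMOTE_ADDR} !^207\.200\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.
RewriteCond %{HTTP_USER_AGENT} ![()]
RewriteRule .* - [F]
```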