Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Best way to block referral spammers?
aristotle




msg:4664548
 2:38 pm on Apr 20, 2014 (gmt 0)

On most of my sites, my current .htaccess code for blocking referral spam is as follows:
# BLOCK REFERERS
RewriteCond %{HTTP_REFERER} formatn [NC,OR]
RewriteCond %{HTTP_REFERER} kochanelli\.com [NC,OR]
RewriteCond %{HTTP_REFERER} chimiver [NC,OR]
RewriteCond %{HTTP_REFERER} poker [NC,OR]
RewriteCond %{HTTP_REFERER} semalt [NC,OR]
RewriteCond %{HTTP_REFERER} thepostemail [NC,OR]
RewriteCond %{HTTP_REFERER} prostitutki [NC,OR]
RewriteRule .* - [F]

I could make a much longer list, but don't like to take the time to bother with minor offenders, so usually only add something if it becomes annoying to me.

Anyway, I'm wondering if there's a more efficient way to do this. For example, my current code for blocking country domains is:
# BLOCK COUNTRY DOMAINS
RewriteCond %{HTTP_REFERER} \.(ru|su|ua|cn|md|pl|ro)(/|$) [NC,OR]
RewriteCond %{HTTP_REFERER} \.(by|bg|hr|cz|hu|jp)(/|$) [NC]
RewriteRule .* - [F]
ErrorDocument 403 "Access Denied"

So I'm thinking that I might be able to use the vertical bar separators for both cases. Or if there's a more efficient way to do it, I would like to learn about it.

Also, do I really need the error document line in the block country domains section?

 

not2easy




msg:4664557
 3:25 pm on Apr 20, 2014 (gmt 0)

Most referer spam is not actual links or visitors to your site. Check your access logs to find the lines that show a referer and see whether it is a human-type visit or not. Nine times out of ten you will see a few referers (I usually see them three in a row) that all request the same URL, without actually loading the supporting files for the page or image requested, and nothing more. Look at the IP address and check the whois. If it is a server farm, block it, or there will be new faked referers every day, or every few hours.
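
As a rough sketch of the IP-based block described above, assuming mod_rewrite is already switched on (RewriteEngine On) as it must be for the referer rules earlier in this thread. The range 192.0.2.x is a reserved documentation range used purely as a placeholder, not a real offender; it stands in for whatever range the whois lookup turns up.

```apache
# BLOCK SERVER-FARM RANGES (hypothetical example)
# 192.0.2.x is a reserved documentation range, used only as a stand-in
# for a range identified through whois; substitute the real one.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule .* - [F]
```

Because the test is on the requesting address rather than the referer string, it keeps working when the same machines switch to a new fake referer.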

The semalt entry has its own little discussion here: [webmasterworld.com...]

lucy24




msg:4664569
 6:03 pm on Apr 20, 2014 (gmt 0)

Yes, you can use pipes with your list of names.
RewriteCond %{HTTP_REFERER} formatn [NC,OR]
RewriteCond %{HTTP_REFERER} kochanelli\.com [NC,OR]
RewriteCond %{HTTP_REFERER} chimiver [NC]

=
RewriteCond %{HTTP_REFERER} (formatn|kochanelli\.com|chimiver) [NC]

Think of this as belt-and-suspenders. Referer-based blocks are a useful backup, but in the long term you'll find that the vast majority of these requests come from IP ranges that would be blocked in their own right.

If you are rearranging lines, always double-check to make sure the last Condition in any ruleset doesn't end in [OR], as this will crash your server. Yes, OK, it will result in a 503 error for all requests. Picky, picky.

You might also think about constraining this type of rule to requests for pages:

RewriteCond blahblah
RewriteRule (^|\.html|/)$ - [F]


It's rare for an unwanted robot to request anything other than a page. This way, the server doesn't have to stop and evaluate a long list of conditions on every non-page-- meaning human or legitimate search engine-- request.
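
To spell out what that pattern does, here is an annotated sketch. It assumes the rules live in a .htaccess file, where the leading slash is stripped from the path before matching, and that pages on the site either end in .html or are directory indexes. The referer name is just one taken from the list earlier in the thread.

```apache
# The rule pattern has three alternatives, all anchored at the end of the path:
#   ^       (followed directly by $) matches an empty path, i.e. the directory
#           the .htaccess file sits in (the site root, for a top-level file)
#   \.html  matches any request ending in .html
#   /       matches any request ending in a slash (a directory index)
# mod_rewrite tests the rule pattern before the conditions, so requests for
# images, stylesheets, scripts and the like fail here and the conditions
# are never evaluated for them.
RewriteCond %{HTTP_REFERER} semalt [NC]
RewriteRule (^|\.html|/)$ - [F]
```

A site whose pages use a different extension would need that extension added to the alternation.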

aristotle




msg:4664577
 7:33 pm on Apr 20, 2014 (gmt 0)

Thanks for the replies. Following Lucy's suggested improvements, I came up with the following new code:
# BLOCK REFERERS
RewriteCond %{HTTP_REFERER} (formatn|kochanelli\.com|chimiver|poker) [NC,OR]
RewriteCond %{HTTP_REFERER} (semalt|prostitutki|thepostemail) [NC]
RewriteRule (^|\.html|/)$ - [F]

Also, actually I didn't have an "OR" in the last line of the real code, but changed it in the code I posted because the original last line contained one of those "bad words" that WebmasterWorld objects to. This caused the new last line to have an erroneous "OR" which I didn't notice at the time.

As for the botnets that harbor the software for these fake referrals, I'll have to look into that later.

lucy24




msg:4664590
 9:05 pm on Apr 20, 2014 (gmt 0)

Also: Check your logs periodically. Referer spam tends to come and go. No point in wasting server resources on something that hasn't been around since 2011. You may find it easier to maintain if you set up a separate ruleset for transitory offenders. Come to think of it, that's just what I do with email. "extra junk" is a separate ruleset from the ordinary junk-mail rules. When they make up a new way to misspell 'v####a', the old one goes away.
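
A minimal sketch of how such a split might look in .htaccess, reusing names from earlier in the thread. Which names count as long-term and which as transitory is purely an assumption made for illustration; the logs decide that in practice.

```apache
# LONG-TERM REFERER BLOCKS (rarely edited)
RewriteCond %{HTTP_REFERER} (poker|prostitutki) [NC]
RewriteRule (^|\.html|/)$ - [F]

# TRANSITORY REFERER BLOCKS
# Review against the logs now and then; drop names that have gone quiet.
RewriteCond %{HTTP_REFERER} (semalt|formatn|chimiver) [NC]
RewriteRule (^|\.html|/)$ - [F]
```

Each ruleset ends in a condition without [OR], so either one can be edited or deleted without touching the other.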

aristotle




msg:4664729
 6:30 pm on Apr 21, 2014 (gmt 0)

I suspect that referral spam is mostly used for new or fly-by-night sites. After these sites disappear, referral spam for them probably fades away too.
I also wonder if old botnets might gradually disappear.

I watch my logs all the time, and sometimes see some odd things. On one of my sites I have a large pdf file (about 600 KB) containing some reference information. Most visitors never go near it, but one afternoon about a year ago, someone (a real human) downloaded this pdf file more than 50 times. Why would someone download the same large file over and over again - I have no idea.

lucy24




msg:4664754
 9:22 pm on Apr 21, 2014 (gmt 0)

Was it a series of 200s or were they 206 ("partial")? This question came up once before. Can't remember if it was here or my ebooks forum. But the explanation turned out to be that some browsers download pdfs in small pieces so they can start displaying them right away. So your logs get clogged up with mountains of 206 requests for a single file.
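
For anyone wanting to tell the two cases apart in their own logs, one possible approach is to log the Range request header next to the status code. This is only a sketch: it belongs in the main server or virtual-host configuration (CustomLog is not permitted in .htaccess, so it is not an option on most shared hosting), and the format nickname and log path are made-up placeholders.

```apache
# Server or virtual-host config only; not valid in .htaccess.
# Standard "combined" fields, plus the request's Range header at the end.
# "combined_range" is a hypothetical nickname.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Range}i\"" combined_range
# The log path is a placeholder.
CustomLog "logs/access_range.log" combined_range
```

With this in place, a full download shows status 200 with "-" in the final field, while a piecewise download shows 206 along with the byte range that was asked for.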

aristotle




msg:4664766
 11:05 pm on Apr 21, 2014 (gmt 0)

I'm pretty sure they were all full downloads. I would have noticed 206 partials, because I've seen Facebook do it a lot. I just remember getting annoyed, wondering what the heck this person was doing.

Another odd unexplained thing I see sometimes (in statcounter) is somebody refreshing the same page every few seconds. This happens surprisingly often, sometimes even on pages that haven't been touched in years.

penders




msg:4664769
 12:07 am on Apr 22, 2014 (gmt 0)

Why would someone download the same large file over and over again...


A very hands on DoS attack?!

This question came up once before. Can't remember if it was here or my ebooks forum.


I think I remember that - which would mean it was here? Wasn't "mobile" browsers suggested as a possible culprit? Or was that another thread?

lucy24




msg:4664784
 1:31 am on Apr 22, 2014 (gmt 0)

After posting, I remembered some more, which led me to this:

[webmasterworld.com...]

I'd forgotten the "mobile" aspect, but indeed that was one suggestion. And the punchline is that, in the end, the answer to the conundrum was found by the same person who originally asked.

aristotle




msg:4664985
 12:16 pm on Apr 22, 2014 (gmt 0)

Why would someone download the same large file over and over again...
A very hands on DoS attack?!

I suppose that's possible, but if so, it was a feeble, lackadaisical effort. If I remember correctly, the downloads were spread out over a full afternoon and maybe even into the evening, with various spurts of activity and time gaps. In any case, I doubt that one person alone could carry out a successful DoS attack by hand.

I read the other thread about partial downloads, and have occasionally seen that on my site too. But my pdf has various tables and charts of statistical data copied from government websites, and I don't know what would happen if a partial download ended in the middle of a chart.

lucy24




msg:4665013
 2:22 pm on Apr 22, 2014 (gmt 0)

I don't know what would happen if a partial download ended in the middle of a chart.

Nothing. Or, at least, nothing more than would similarly happen if a page break lands in the middle of a chart when you're making a pdf from html.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved