I've been lurking on WW for a while, and having come across a number of threads that mention logfile spamming, I thought I'd sign up and share my solution to this growing annoyance. Apologies if this solution has already been discussed and I haven't looked hard enough!
First off, I configured Apache to log referrers into a separate file. This is a slight tweak from the stock Apache installation which logs everything into one file.
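A minimal sketch of such a setup, using the standard mod_log_config directives. The nickname "referer" and the path logs/referer_log are placeholders for illustration, not necessarily the configuration described in this post.

```apache
# Define a log format holding only the referrer and the requested path.
# "%{Referer}i" is the Referer request header; "%U" is the URL path requested.
LogFormat "%{Referer}i -> %U" referer

# Write that format to its own file (path is an assumption),
# alongside the normal access log.
CustomLog logs/referer_log referer
```

Each line in the resulting file then looks like "http://some.site/page.html -> /index.html", which is easy for a script to split.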
I then wrote a Perl script to look for any referrers that haven't been seen before (whether legit or spam).
Any new referrers are then retrieved using "wget", and the returned page "grep"'ed for a link to my site.
If a link to my site is found, the referrer is added to my "legit" referrers log. If not (or if wget fails), it goes into my "spammers" referrer log - which I use as a database of companies with whom I shall never do business! :)
Note that I keep a record of all URLs visited like this so that I only ever wget a particular referrer once - ever.
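The steps above can be sketched roughly as follows. This is an illustration in Python rather than the actual Perl script, and several things are assumptions: the host name www.example.com, a log line shape of "<referrer> -> <path>", urllib standing in for wget, a regular expression standing in for grep, and an in-memory set standing in for the record of visited URLs (which would need saving to disk between runs).

```python
import http.client
import re
import urllib.request

MY_HOST = "www.example.com"  # assumption: stand-in for the site being protected


def parse_referrers(log_lines):
    """Yield the referrer from lines shaped like '<referrer> -> <path>'."""
    for line in log_lines:
        referrer = line.split(" -> ", 1)[0].strip()
        if referrer and referrer != "-":  # "-" means no referrer was sent
            yield referrer


def fetch_page(url, timeout=10):
    """Stand-in for wget: return the page text, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read at most ~200 KB so a huge page cannot tie up the script.
            return resp.read(200_000).decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return None


def links_to_site(html, host=MY_HOST):
    """Stand-in for grep: does the page contain an href pointing at host?"""
    pattern = r'href\s*=\s*["\']?https?://' + re.escape(host)
    return re.search(pattern, html, re.IGNORECASE) is not None


def classify_new(log_lines, seen, fetch=fetch_page, host=MY_HOST):
    """Sort never-before-seen referrers into (legit, spam) lists."""
    legit, spam = [], []
    for referrer in parse_referrers(log_lines):
        if referrer in seen:
            continue
        seen.add(referrer)  # recorded first, so each URL is fetched only once
        html = fetch(referrer)
        if html is not None and links_to_site(html, host):
            legit.append(referrer)
        else:
            spam.append(referrer)  # no link found, or the fetch failed
    return legit, spam
```

Passing the fetch function in as a parameter keeps the classification logic testable without touching the network.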
As a side issue, I have recently started to notice a number of referrers that do not smell of spam, and do not link to my site. This includes about:blank - which leads me to believe that there is a browser/OS combination out there which occasionally (under some error condition) sends out whatever URL the browser was at previously as the referrer value for a new page.... which is a bit worrying from a privacy point of view!
(PS - whilst it's too late to edit the post title - I accept that this is not a "solution" as such - more my way of dealing with the problem in the same way that email spam filtering works...)
[edited by: dmorison at 4:32 pm (utc) on April 3, 2003]
...also has the potential to let someone malicious use your server to attack someone else's by feeding you the address of, say, their formmail script.
- Tony
Of course it can be defeated by the determined - but for starters the script call does not come from the same IP as the site. It certainly cuts out the rubbish that reaches my eyes.
As regards using the script to launch an attack - yes, I've considered that and it does worry me slightly, and I have put some mechanism in to protect against this. The site is regularly linked in bulletin boards which are not retrieved as a result of this protection; but I'm not too worried about seeing BB links to my site.
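The post doesn't spell out what that mechanism is. One possible safeguard that would also explain bulletin board links being skipped is to refuse any referrer that looks like a script call. The sketch below is a guess along those lines, not the actual protection: the function name safe_to_fetch and the suffix list are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions treated as "this is a script, not a page".
SCRIPT_SUFFIXES = (".cgi", ".pl", ".php", ".asp")


def safe_to_fetch(url):
    """Return True only for plain http(s) page URLs that carry no parameters."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # rejects about:blank, file:, and malformed values
    if parts.query:
        return False  # a query string could feed input to someone's script
    path = parts.path.lower()
    if "/cgi-bin/" in path:
        return False  # classic location for formmail-style scripts
    return not path.endswith(SCRIPT_SUFFIXES)
```

A referrer that fails this check would simply never be retrieved, so a forged referrer pointing at a formmail script cannot turn the checker into a relay. The cost is that dynamic pages, such as most bulletin board threads, are never verified either.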
I've got a post in this same forum about getting over 7k hits on my formmail page in less than 2 days.
Can you explain further about how to stop this or at least defend myself a little better?
Thanks for any and all help in this matter.
Rich