Second, someone on here once mentioned a long while back that when those sites show up in his logs, he returns the favor by crawling their own site a few times. How exactly do you crawl their site, particularly so that your site address shows up in their logs?
kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?
Exactly, it's getting to the point where I can't see many of my legit referrers, because my stats program only lists a certain number of sites (plenty of them, and even then my list is bulging with spam).
I'm still trying to figure out how they actually end up in my logs. Is it a program they run that randomly picks out sites, or are they manually entering my site's name when they do it?
This seems to be a feeler application looking for sites to spam, especially the user-agents that like to POST spam to forums, blogs, etc.
I'm experimenting with software that detects the spam referrer, feeds the spam page back to the user-agent using the same user-agent/referrer spam URL, and automatically bans the referring domain (not just the URL) in .htaccess. Seems to be working very effectively.
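For anyone who wants to try the auto-ban part, here is a minimal Python sketch of one way it could work. This is not the software mentioned above, just an illustration: the function names (`referrer_domain`, `htaccess_ban_rules`) and the `.test` domains are made up, and it assumes Apache with mod_rewrite enabled. It only builds the rule text; it does not feed anything back to the bot or write to .htaccess for you.

```python
from urllib.parse import urlsplit


def referrer_domain(referrer):
    """Return the lowercased host of a referrer URL, minus a leading 'www.'."""
    host = urlsplit(referrer).hostname or ""
    return host[4:] if host.startswith("www.") else host


def htaccess_ban_rules(domains):
    """Build mod_rewrite lines that answer 403 when the Referer is a banned domain.

    Assumption: the domains are plain hostnames (letters, digits, hyphens,
    dots), so escaping the dots is enough to make them regex-safe.
    """
    domains = sorted({d for d in domains if d})
    if not domains:
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        escaped = domain.replace(".", "\\.")
        # Every condition except the last is OR-ed with the next one.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        lines.append(
            f"RewriteCond %{{HTTP_REFERER}} ^https?://([^/]+\\.)?{escaped}(/|$) {flags}"
        )
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    spam = ["http://www.spam-example.test/buy?x=1", "http://pills.test/"]
    print(htaccess_ban_rules(referrer_domain(r) for r in spam))
```

Banning by domain rather than by full URL means the spammer can't dodge the rule just by changing the page or query string on the same host.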
You can return the "favor" doing the same thing manually using an app like Sam Spade. If you decide to do this, I wouldn't recommend using your own site url. No need to draw attention to yourself.
Yeah, I know- I enjoy this too much. Bots can be fun. :)
Cerberian is a monitoring tool that checks what users are visiting on the net and determines whether they are supposed to be looking at it.
Anyone got any of those small kids you want to keep off XXX sites?
Check your logs carefully for this user-agent, and you will notice it requests the same URIs as a normal visitor does, most of the time at almost the same moment as the `original` request. Similar to Google's Mediapartners crawler.
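If you'd rather not eyeball the logs, a rough Python sketch like the one below can pick out those mirrored requests. It assumes you have already parsed your log into `(epoch_seconds, user_agent, uri)` tuples; the function name `shadow_requests`, the 5-second window, and the sample user-agent strings are all my own placeholders. It also assumes the monitoring agent's hit comes after the visitor's, not before.

```python
def shadow_requests(requests, agent_substring, window_seconds=5):
    """Find hits by a given agent that mirror another visitor's recent request.

    requests: iterable of (epoch_seconds, user_agent, uri) tuples.
    Returns a list of (epoch_seconds, uri) for each mirrored hit.
    """
    needle = agent_substring.lower()
    last_seen = {}  # uri -> time of the latest request by any *other* agent
    shadows = []
    for ts, user_agent, uri in sorted(requests):
        if needle in user_agent.lower():
            previous = last_seen.get(uri)
            # Count it only if someone else asked for the same URI just before.
            if previous is not None and ts - previous <= window_seconds:
                shadows.append((ts, uri))
        else:
            last_seen[uri] = ts
    return shadows


if __name__ == "__main__":
    sample = [
        (100, "Mozilla/4.0", "/a"),
        (101, "Cerberian-like-agent/1.0", "/a"),  # hypothetical UA string
        (200, "Cerberian-like-agent/1.0", "/b"),  # nobody requested /b first
    ]
    print(shadow_requests(sample, "cerberian"))
```

A high share of mirrored hits suggests the agent is following real visitors around rather than crawling on its own.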
As for the original question:
Step 1 is making sure none of your stats are public.
Step 2 is filtering the referrers. You can do this when the request comes in or at a later stage.
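For the "later stage" option, here is a small Python sketch that strips spam-referrer lines out of a log before your stats program sees it. It assumes Apache's combined log format (referrer and user-agent as the last two quoted fields) and a ban list you maintain yourself; `is_spam_referrer`, `filter_log`, and the `.test` domain are names I made up for the example.

```python
import re
from urllib.parse import urlsplit

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"\s*$')


def is_spam_referrer(line, banned_domains):
    """True if the line's referrer host is a banned domain or a subdomain of one."""
    match = LOG_TAIL.search(line)
    if not match:
        return False  # not combined format; leave the line alone
    host = urlsplit(match.group("referer")).hostname or ""
    return any(host == d or host.endswith("." + d) for d in banned_domains)


def filter_log(lines, banned_domains):
    """Return only the lines whose referrer is not on the ban list."""
    return [ln for ln in lines if not is_spam_referrer(ln, banned_domains)]


if __name__ == "__main__":
    log = [
        '203.0.113.5 - - [10/Oct/2005:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 '
        '"http://www.spam-example.test/" "Mozilla/4.0"',
        '203.0.113.9 - - [10/Oct/2005:13:56:01 -0700] "GET / HTTP/1.1" 200 2326 '
        '"-" "Mozilla/4.0"',
    ]
    for kept in filter_log(log, {"spam-example.test"}):
        print(kept)
```

Filtering after the fact keeps the raw log intact, so you can still go back and see what the spammers were doing.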
I set up domains designed to attract spam bot activity and farm any bots that take the bait. I'm talking about no-name sites that don't receive hits from anybody but spam bots. Curiously enough, this agent is among the first to hit, and it is shortly followed by spam bots. Ban the spam bots and the hit frequency of this user-agent increases.