
Forum Moderators: phranque


Preventing your site from being crawled

or returning the favor

     

DXL

10:37 pm on Dec 25, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 9, 2002
posts:722
votes: 0


My referrals page is completely filled with links from poker and viagra sites. First off, is there a way I can keep these guys from showing up in my referral logs?

Second, someone on here once mentioned a long while back that when those sites show up in his logs, he returns the favor by crawling their own site a few times. How exactly do you crawl their site, particularly so that your site address shows up in their logs?

4:33 pm on Dec 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 2, 2002
posts:1167
votes: 0


There simply must be a multitude of far better, not to mention vastly more productive, ways to spend your time. :)
5:02 pm on Dec 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts:2259
votes: 0


kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?
5:09 pm on Dec 26, 2005 (gmt 0)

Senior Member from MT 

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 1, 2003
posts:1843
votes: 0


Set a simple environment variable in httpd.conf and then log only the requests that don't have it set (this is for Apache). This also helps me clean my logs of image and CSS/JS requests.

DXL

8:39 pm on Dec 26, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 9, 2002
posts:722
votes: 0


kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?

Exactly, it's getting to the point where I can't see many of my legit referrers, because my stats program only lists a certain number of sites (plenty of them, and even then my list is bulging with spam).

I'm still trying to figure out how they actually end up in my logs. Is it a program they run that randomly picks out sites? Or are they manually entering my site's name when they do it?

9:11 pm on Dec 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


Check your logs for user-agent:
Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)

This seems to be a feeler application looking for sites to spam. Especially the user-agents that like to POST spam to forums, blogs, etc.

I'm experimenting with software that detects the spam referrer, feeds the spam page to the user-agent using the same user-agent/referrer spam URL, and automatically bans the referring domain (not the URL) by adding it to .htaccess. It seems to be working very effectively.

You can return the "favor" by doing the same thing manually with an app like Sam Spade. If you decide to do this, I wouldn't recommend using your own site URL. No need to draw attention to yourself.

Yeah, I know- I enjoy this too much. Bots can be fun. :)
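
For anyone who wants to try the auto-ban step described above (banning the referring domain rather than the full URL), here is a minimal Python sketch. It is not the software mentioned in this post. The function names, the `spam_ref` variable name, and the example domain are all made up for illustration. The generated lines use the older `Order`/`Allow`/`Deny` syntax together with `SetEnvIfNoCase`; newer Apache versions handle access control differently, so treat the output as a starting point only.

```python
from urllib.parse import urlsplit


def referring_domain(referer):
    """Return the lowercased host part of a Referer value, or None."""
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        # Malformed value (for example a broken IPv6 literal).
        return None
    return host.lower() if host else None


def htaccess_rules(domains, env_name="spam_ref"):
    """Build .htaccess text that denies requests referred by the given domains.

    env_name is an arbitrary (hypothetical) environment variable name.
    """
    lines = []
    for domain in sorted(set(domains)):
        # Hostnames only need their dots escaped for the regex.
        pattern = domain.replace(".", r"\.")
        # Match the domain itself and any subdomain of it.
        lines.append(
            f'SetEnvIfNoCase Referer "^https?://([^/]+\\.)?{pattern}(/|$)" {env_name}'
        )
    lines += ["Order Allow,Deny", "Allow from all", f"Deny from env={env_name}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical spam referrer taken from a log line.
    domain = referring_domain("http://www.spam.example/poker.html")
    print(htaccess_rules([domain]))
```

Banning by domain means one rule covers every page the spammer rotates through, which is why the sketch throws away the path and query string before building the pattern.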

2:28 pm on Dec 27, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 10, 2004
posts:172
votes: 0


Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)

Cerberian is a monitoring tool that checks what users are visiting on the net and determines whether they are supposed to be looking at it.
Anyone got small kids you want to keep off XXX sites?
Check your logs carefully for this user-agent, and you will notice it requests the same URIs as a normal visitor does, most of the time at almost the same moment as the "original" request. It is similar to Google's Mediapartners crawler.

As for the original question:
Step 1 is to make sure none of your stats are public.
Step 2 is filtering the referers. You can do this when the request comes in or at a later stage.
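
As a rough illustration of the "later stage" option, here is a small Python sketch that strips spam-referred lines from a log before it is fed to a stats program. It assumes the Apache "combined" log format. The blocklist domains and function names are hypothetical, so substitute whatever shows up in your own logs.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist; replace with the domains seen in your own logs.
SPAM_DOMAINS = {"poker-spam.example", "pills-spam.example"}

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_domain(referer):
    """Return the lowercased host of a Referer value, or '' if there is none."""
    if referer in ("", "-"):
        return ""
    try:
        host = urlsplit(referer).hostname or ""
    except ValueError:
        return ""
    return host.lower()


def is_spam(line, spam_domains=SPAM_DOMAINS):
    """True if the line's referrer is a blocked domain or a subdomain of one."""
    match = COMBINED.match(line)
    if not match:
        return False  # leave lines we cannot parse untouched
    host = referer_domain(match.group("referer"))
    return any(host == d or host.endswith("." + d) for d in spam_domains)


def filter_log(lines, spam_domains=SPAM_DOMAINS):
    """Return only the log lines that were not referred by a blocked domain."""
    return [line for line in lines if not is_spam(line, spam_domains)]
```

Filtering after the fact leaves the raw log intact, so a domain that was blocked by mistake can be un-blocked and the stats regenerated.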

9:35 pm on Dec 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


I figured somebody would state it's a monitoring agent. However, this agent comes through some of the same proxy IPs that the spam bots come through, and user-agents are too easy to fake. So you can't judge a book by its cover.

I set up domains designed to attract spam bot activity and farm any bots that take the bait. I'm talking about no-name sites that don't receive hits from anybody but spam bots. Curiously enough, this agent is among the first to hit, and it is shortly followed by spam bots. Ban the spam bots and the hit frequency of this user-agent increases.