Preventing your site from being crawled

or returning the favor

   

DXL

10:37 pm on Dec 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My referrals page is completely filled with links from poker and viagra sites. First off, is there a way I can keep these guys from showing up in my referral logs?

Second, someone on here once mentioned, a long while back, that when those sites show up in his logs, he returns the favor by crawling their site a few times. How exactly do you crawl their site, particularly so that your site's address shows up in their logs?

4:33 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There simply must be a multitude of far better, not to mention vastly more productive, ways to spend your time. :)

5:02 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?

5:09 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Set a simple environment variable in httpd.conf for the requests you don't want, and then log only the requests that don't have it set (this is for Apache). This also helps me keep images and CSS/JS files out of my logs.
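In Apache the usual tools for this are `SetEnvIf` (to tag a request) and `CustomLog ... env=!name` (to skip tagged requests when logging). If the requests are already in the log, a similar cleanup can be done afterwards. The Python sketch below assumes the Apache "combined" log format; the extension list and the names `keep_line` and `clean_log` are illustrative, not taken from the post.

```python
import re

# Assumed Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative list of static-file extensions to leave out of the cleaned log.
STATIC_EXT = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")


def keep_line(line):
    """Return True for a parseable request that is not for a static file."""
    m = LOG_RE.match(line)
    if m is None:
        return False  # lines that don't parse are dropped too
    path = m.group("path").split("?", 1)[0].lower()  # ignore any query string
    return not path.endswith(STATIC_EXT)


def clean_log(lines):
    """Return only the lines worth keeping, in their original order."""
    return [line for line in lines if keep_line(line)]
```

Because this runs after the fact, the raw log stays untouched and can still be consulted if something looks off.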

DXL

8:39 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?

Exactly. It's getting to the point where I can't see many of my legit referrers, because my stats program only lists a certain number of sites (plenty of them, but even so the list is bulging with spam).

I'm still trying to figure out how they actually end up in my logs. Is it a program they run that picks sites at random, or are they entering my site's name manually?

9:11 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check your logs for user-agent:
Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)

This seems to be a feeler application looking for sites to spam, particularly for the user-agents that like to POST spam to forums, blogs, etc.
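A quick way to check a log for the user-agent quoted above, and to see which client IPs it arrives from, is a per-IP count. This is a minimal Python sketch, assuming the Apache "combined" log format; `hits_by_ip` is a hypothetical helper name, and the substring match is deliberately loose so that minor version changes in the user-agent still match.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Substring of the user-agent quoted in the post.
SUSPECT_AGENT = "Cerberian Drtrs"


def hits_by_ip(lines, agent_substring=SUSPECT_AGENT):
    """Count requests per client IP whose user-agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and agent_substring in m.group("agent"):
            counts[m.group("ip")] += 1
    return counts
```

`hits_by_ip(lines).most_common(10)` then lists the busiest addresses, which can be compared against the IPs your spam referrers come from.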

I'm experimenting with software that detects the spam referrer, feeds the spam page back to the user-agent (using the same user-agent and the spam URL as the referrer), and automatically writes a ban for the referring domain (not the full URL) to .htaccess. It seems to be working very effectively.

You can return the "favor" by doing the same thing manually with an app like Sam Spade. If you decide to do this, I wouldn't recommend using your own site's URL; there's no need to draw attention to yourself.

Yeah, I know, I enjoy this too much. Bots can be fun. :)
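The auto-ban step described in this post can be approximated by reducing each spam referrer to its domain and generating mod_rewrite rules that refuse requests carrying that referrer. The poster's software isn't named, so this is only a Python sketch of the banning half. It assumes mod_rewrite is available and allowed in .htaccess, and `referring_domain` and `htaccess_block` are hypothetical names. Deciding which referrers count as spam, and actually writing the text into .htaccess, are left out.

```python
from urllib.parse import urlsplit


def referring_domain(referer):
    """Reduce a referrer URL to its host name, minus a leading 'www.'."""
    host = urlsplit(referer).hostname  # lower-cased by urlsplit; None for "-"
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def htaccess_block(domains):
    """Build mod_rewrite rules that answer 403 Forbidden to any request
    whose Referer points at one of the given domains (or a subdomain)."""
    domains = sorted(set(domains))
    if not domains:
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        # Host names only contain letters, digits, '-' and '.', so escaping
        # the dots is enough to make the domain a literal regex fragment.
        pattern = "^https?://([^/]*\\.)?" + domain.replace(".", "\\.") + "([:/]|$)"
        # Chain the conditions with OR; the last one must not carry OR.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines) + "\n"
```

Anchoring the pattern on `^https?://` plus an optional subdomain, instead of matching a bare substring, keeps a rule for one domain from also catching unrelated hosts whose names merely contain it.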

2:28 pm on Dec 27, 2005 (gmt 0)

10+ Year Member



Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)

Cerberian is a monitoring tool that checks which sites users are visiting and determines whether they are allowed to view them.
Got any small kids you want to keep off XXX sites?
Check your logs carefully for this user-agent: you will notice that it requests the same URIs as a normal visitor does, most of the time at almost the same moment as the "original" request. It's similar to Google's Mediapartners crawler.
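The "same URI at almost the same time" pattern can be checked directly: for each request from the suspect user-agent, look for an ordinary request for the same path shortly before it. A Python sketch, assuming the Apache "combined" log format, a log in roughly chronological order, and an arbitrary 10-second window; `parse_line` and `shadowed_requests` are hypothetical names.

```python
import re
from datetime import datetime, timedelta

# Assumed Apache "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return (timestamp, path, user-agent) for a log line, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return when, m.group("path"), m.group("agent")


def shadowed_requests(lines, agent_substring="Cerberian",
                      window=timedelta(seconds=10)):
    """Return (path, delay) for every request from the matching user-agent
    that follows an ordinary request for the same path within `window`."""
    last_normal = {}  # path -> time of the latest ordinary request for it
    results = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        when, path, agent = rec
        if agent_substring in agent:
            seen = last_normal.get(path)
            if seen is not None and timedelta(0) <= when - seen <= window:
                results.append((path, when - seen))
        else:
            last_normal[path] = when
    return results
```

A high share of suspect-agent requests showing up in the result would support the "follows a real visitor" reading; requests that never pair up look more like independent crawling.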

As for the original question:
Step 1: make sure your stats are NOT public (public stats pages, with their lists of referrer links, are what referrer spammers are after).
Step 2: filter the referrers. You can do this when the request comes in or at a later stage.
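For filtering at a later stage, a referrer tally can skip blocklisted domains (including their subdomains) and your own host names before ranking what is left, so legitimate referrers aren't pushed off a fixed-length list. A Python sketch, assuming the Apache "combined" log format; the blocklist contents and the name `top_referrers` are illustrative.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed Apache "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_referrers(lines, blocked, own_hosts=(), n=20):
    """Rank external referring hosts, skipping blocklisted domains
    (and their subdomains) as well as your own host names."""
    blocked = {d.lower() for d in blocked}
    own = {h.lower() for h in own_hosts}
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue
        host = urlsplit(m.group("referer")).hostname  # None for "-"
        if not host or host in own:
            continue
        # Skip the host if it, or any parent domain, is on the blocklist.
        labels = host.split(".")
        if any(".".join(labels[i:]) in blocked for i in range(len(labels))):
            continue
        counts[host] += 1
    return counts.most_common(n)
```

Checking every parent domain means one blocklist entry such as `poker.example` also covers `www.poker.example` and any other subdomain the spammer rotates through.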

9:35 pm on Dec 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I figured somebody would say it's a monitoring agent. However, this agent comes through some of the same proxy IPs that the spam bots come through, and user-agents are too easy to fake. So you can't judge a book by its cover.

I set up domains designed to attract spam bot activity and to farm any bots that take the bait. I'm talking about no-name sites that don't receive hits from anybody but spam bots. Curiously enough, this agent is among the first to hit, and it is shortly followed by spam bots. Ban the spam bots, and the hit frequency of this user-agent increases.

 
