Home / Forums Index / WebmasterWorld / Webmaster General
Forum Library, Charter, Moderators: phranque

Webmaster General Forum

    
Preventing your site from being crawled
or returning the favor
DXL




msg:349838
 10:37 pm on Dec 25, 2005 (gmt 0)

My referrals page is completely filled with links from poker and viagra sites. First off, is there a way I can keep these guys from showing up in my referral logs?

Second, someone on here once mentioned a long while back that when those sites show up in his logs, he returns the favor by crawling their own site a few times. How exactly do you crawl their site, particularly so that your site address shows up in their logs?

 

kevinpate




msg:349839
 4:33 pm on Dec 26, 2005 (gmt 0)

There simply must be a multitude of far better, not to mention vastly more productive, ways to spend your time. :)

oddsod




msg:349840
 5:02 pm on Dec 26, 2005 (gmt 0)

kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?

killroy




msg:349841
 5:09 pm on Dec 26, 2005 (gmt 0)

Set a simple environment variable in httpd.conf and then log only the requests that don't have it set (this is for Apache). This also helps me clean image and CSS/JS requests out of my logs.
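
On Apache this is typically done with mod_setenvif plus a conditional CustomLog. A minimal sketch, assuming a `combined` log format is already defined in your config; the variable name `dontlog`, the log path, and the domain `spam-domain.example` are placeholders, not anything taken from this thread:

```apache
# httpd.conf (sketch, adjust names and paths to your own setup)

# Flag requests whose Referer header mentions a known spam domain.
# "spam-domain.example" is a placeholder.
SetEnvIfNoCase Referer "spam-domain\.example" dontlog

# Flag requests for images, stylesheets and scripts as well.
SetEnvIfNoCase Request_URI "\.(gif|jpe?g|png|css|js)$" dontlog

# Write a log entry only when the "dontlog" variable is NOT set.
CustomLog logs/access_log combined env=!dontlog
```

The flagged requests are still served normally; they just never reach the access log, so the stats program never sees them.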

DXL




msg:349842
 8:39 pm on Dec 26, 2005 (gmt 0)

kevinpate, even if they're distorting your figures and making it difficult for you to analyse your stats?

Exactly, it's getting to the point where I can't see many of my legit referrers, because my stats program only lists a certain number of sites (plenty of them, and even then my list is bulging with spam).

I'm still trying to figure out how they actually end up in my logs. Is it a program they run that randomly picks out sites, or are they manually entering my site's name when they do it?

Key_Master




msg:349843
 9:11 pm on Dec 26, 2005 (gmt 0)

Check your logs for user-agent:
Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)

This seems to be a feeler application looking for sites to spam. Especially the user-agents that like to POST spam to forums, blogs, etc.

I'm experimenting with software that detects the spam referrer, feeds the spam page to the user-agent using the same user-agent/referrer spam URL, and automatically adds a ban on the referring domain (not the URL) to .htaccess. It seems to be working very effectively.

You can return the "favor" by doing the same thing manually using an app like Sam Spade. If you decide to do this, I wouldn't recommend using your own site URL. No need to draw attention to yourself.

Yeah, I know, I enjoy this too much. Bots can be fun. :)

DoppyNL




msg:349844
 2:28 pm on Dec 27, 2005 (gmt 0)

Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)

Cerberian is a monitoring tool that checks what users are visiting on the net and determines whether they are supposed to be looking at it.
Anyone got any of those small kids you want to keep off XXX sites?
Check your logs carefully for this user-agent, and you will notice that it requests the same URIs as a normal visitor does, most of the time at almost the same moment as the `original` request. It is similar to Google's Mediapartners crawler.

As for the original question:
Step 1 is to make sure all your stats are NOT public.
Step 2 is filtering the referers; you can do this when the request comes in or at a later stage.
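
For the "when the request comes in" option, a rough sketch of a domain-level referer ban in .htaccess, written for Apache 2.4. The variable name `spam_ref` and the domain `spam-domain.example` are placeholders, and it assumes your host allows these directives in .htaccess (AllowOverride FileInfo and AuthConfig):

```apache
# .htaccess (sketch, Apache 2.4 syntax)

# Flag any request whose Referer points at the placeholder domain
# or one of its subdomains.
SetEnvIfNoCase Referer "^https?://([^/]+\.)?spam-domain\.example([/:]|$)" spam_ref

# Allow everyone except requests carrying the flag.
<RequireAll>
    Require all granted
    Require not env spam_ref
</RequireAll>

# On Apache 2.2 and earlier the equivalent access rules would be:
#   Order Allow,Deny
#   Allow from all
#   Deny from env=spam_ref
```

Flagged requests get a 403 instead of the page. Keep in mind the Referer header is as easy to fake as a user-agent, so this only catches spam that keeps using the same domains.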

Key_Master




msg:349845
 9:35 pm on Dec 27, 2005 (gmt 0)

I figured somebody would state it's a monitoring agent. However, this agent comes through some of the same proxy IPs that the spam bots come through, and user-agents are too easy to fake. So you can't judge a book by its cover.

I set up domains designed to attract spam bot activity and farm any bots that take the bait. I'm talking about no-name sites that don't receive hits from anybody but spam bots. Curiously enough, this agent is among the first to hit, and it is shortly followed by spam bots. Ban the spam bots and the hit frequency of this user-agent increases.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved