
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Munax bots cloaking themselves and causing high server load
Asia_Expat




msg:3616495
 7:04 pm on Apr 1, 2008 (gmt 0)

I recently noticed a spike in my server load. At the same time, I noticed a certain range of IP addresses crawling my forum over and over; they seemed particularly interested in the 'register' page.

I did some digging around about this IP range and traced them to a Swedish company called Munax. They offer the following statement about their spidering activities, in which they openly claim to cloak their spiders to appear as a regular human being...
[munax.com...]

They also claim to respect the robots.txt file, but I can assure you that claim is false. My forum is carefully managed in my robots file and these IPs are totally ignoring it. Further, they are crawling so aggressively and with such repetition that it's bringing my server to its knees at times, and also making it look like there are many more visitors than there actually are.

The IP range, as far as I can tell, is...
82.99.30.0 - 82.99.30.127

What can I do about this? If I place the following in the .htaccess file in my forum directory, will it work, and could it potentially do any harm?...

<Files *>
order allow,deny
allow from all
deny from 82.99.30.0/127
</Files>

 

Staffa




msg:3616849
 6:53 am on Apr 2, 2008 (gmt 0)

I saw them yesterday too. Grabbed index file and no robots.txt, got promptly banned ;o)

incrediBILL




msg:3616856
 7:07 am on Apr 2, 2008 (gmt 0)

I've been watching them for quite some time and this statement from their site is a real winner:

Our crawler does not have a "name", yet. Instead it announces itself to be a standard web browser, a "Mozilla 4.0" kind-of-browser compatible with the browser Microsoft Internet Explorer 6.0, running on the Windows NT 5.1 operating system.

They could actually include their path in the MSIE user agent; others do that, nothing new there.

Sorry, if you can't identify yourself properly you can't play in my sandbox.

End of story.

Asia_Expat




msg:3616901
 8:46 am on Apr 2, 2008 (gmt 0)

Is my code above correct for banning them? I just add that to my current .htaccess file, yes?

thetrasher




msg:3617019
 11:29 am on Apr 2, 2008 (gmt 0)

Instead of:
deny from 82.99.30.0/127
use:
deny from 82.99.30.0/25
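
For anyone wanting to double-check the prefix, here is a minimal Python sketch (not from the original posts) that converts the range quoted at the top of the thread into CIDR notation using the standard `ipaddress` module. It also shows that 127 is outside the 0-32 range of IPv4 prefix lengths; how Apache itself reacts to such a value is not tested here.

```python
import ipaddress

# Range quoted earlier in the thread.
first = ipaddress.IPv4Address("82.99.30.0")
last = ipaddress.IPv4Address("82.99.30.127")

# summarize_address_range yields the CIDR blocks that exactly cover first..last.
blocks = list(ipaddress.summarize_address_range(first, last))
print(blocks)  # [IPv4Network('82.99.30.0/25')]

# IPv4 prefix lengths run from 0 to 32, so "/127" cannot be parsed as a network.
try:
    ipaddress.ip_network("82.99.30.0/127")
    valid_127 = True
except ValueError:
    valid_127 = False
print(valid_127)  # False
```

A /25 covers 128 addresses, which is exactly 82.99.30.0 through 82.99.30.127.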
Hobbs




msg:3617311
 4:32 pm on Apr 2, 2008 (gmt 0)

Using every IP between 82.99.30.2 and 82.99.30.64
570 hits in 2 days
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
bye bye 82.99.30.0/25

[added: They also do not follow robots.txt and fell into my bot trap]

[edited by: Hobbs at 4:39 pm (utc) on April 2, 2008]

wilderness




msg:3617461
 7:13 pm on Apr 2, 2008 (gmt 0)

for those that use rewrites.

RewriteCond %{REMOTE_ADDR} ^82\.99\.30\.(9[6-9]|1[01][0-9]|12[0-7])$ [OR]
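
A quick way to see which addresses a pattern like this covers is to run it over every possible last octet. The Python sketch below assumes Python's `re` treats this simple pattern the same way mod_rewrite does. The second pattern, `full`, is a hypothetical variant that is not from the thread, written to cover the whole 0-127 range discussed above.

```python
import re

# Pattern from the RewriteCond above, with '|' as the alternation character.
pattern = re.compile(r"^82\.99\.30\.(9[6-9]|1[01][0-9]|12[0-7])$")
matched = [n for n in range(256) if pattern.match(f"82.99.30.{n}")]
print(matched[0], matched[-1], len(matched))  # 96 127 32

# Hypothetical variant (an assumption, not from the thread) covering .0 to .127.
full = re.compile(r"^82\.99\.30\.([0-9]|[1-9][0-9]|1[01][0-9]|12[0-7])$")
full_matched = [n for n in range(256) if full.match(f"82.99.30.{n}")]
print(full_matched[0], full_matched[-1], len(full_matched))  # 0 127 128
```

As written, the condition matches 82.99.30.96 through 82.99.30.127, so anyone who wants the full /25 would need something like the wider alternation.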

Asia_Expat




msg:3618905
 9:57 am on Apr 4, 2008 (gmt 0)

I successfully blocked them and my forum shows a halving of online users at any given time :-D LOL

Achernar




msg:3619535
 9:45 pm on Apr 4, 2008 (gmt 0)

The IP range, as far as I can tell, is...
82.99.30.0 - 82.99.30.127

What can I do about this?

Funny. They also crawled a couple of sites I manage, and were always caught in the bot-traps.

This IP range is now definitively banned at the firewall level for all the servers I manage:
82.99.30.0/24

Megaclinium




msg:3621227
 8:40 pm on Apr 7, 2008 (gmt 0)

eek! I just UN-banned them after a few months.
I had re-extracted the weblogs for just their IPs, and they show it DID grab robots.txt, which I hadn't thought it did when I summed stats from a combined months file.

I do see them as referrer occasionally in traffic. I didn't really check whether it obeyed robots.txt.
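
For readers who want to repeat this kind of check, here is a rough Python sketch that filters access-log lines down to one network and reports whether robots.txt was requested. It assumes Common/Combined Log Format with the client IP as the first field; the sample lines and the `/forum/register.php` path are made up for illustration and are not real log data.

```python
import ipaddress

NETWORK = ipaddress.ip_network("82.99.30.0/25")

def summarize(lines):
    """Count hits from NETWORK and note whether robots.txt was requested."""
    hits = 0
    fetched_robots = False
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        try:
            addr = ipaddress.ip_address(parts[0])
        except ValueError:
            continue  # first field is not an IP address (e.g. a hostname)
        if addr not in NETWORK:
            continue
        hits += 1
        # In Common Log Format the request line is the first quoted field.
        request = line.split('"')[1] if line.count('"') >= 2 else ""
        fields = request.split()
        if len(fields) >= 2 and fields[1].split("?")[0] == "/robots.txt":
            fetched_robots = True
    return hits, fetched_robots

# Made-up sample lines, for illustration only.
sample = [
    '82.99.30.5 - - [01/Apr/2008:19:04:00 +0000] "GET /robots.txt HTTP/1.1" 200 120',
    '82.99.30.5 - - [01/Apr/2008:19:04:02 +0000] "GET /forum/register.php HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/Apr/2008:19:04:03 +0000] "GET / HTTP/1.1" 200 900',
]
print(summarize(sample))  # (2, True)
```

To run it on a real log, pass an open file object instead of `sample`. Note that this only shows the file was fetched, not that its rules were obeyed.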

Achernar




msg:3621367
 12:07 am on Apr 8, 2008 (gmt 0)

They don't obey robots.txt, else they wouldn't get caught in bot-traps.

[edited by: Achernar at 12:11 am (utc) on April 8, 2008]

Asia_Expat




msg:3697558
 6:06 pm on Jul 13, 2008 (gmt 0)

Just a heads up... these same bots are heavily active on my server again, so I've added them to the firewall once more (my firewall drops listed IPs after 30 days)

82.99.30.0/25

idiotgirl




msg:3697778
 6:51 am on Jul 14, 2008 (gmt 0)

They've been pounding me for weeks and filling up my error logs because I'd blocked their IP range via .htaccess. I firewalled them just yesterday at 82.99.30.0/25 and now my error logs are (almost) minty fresh. This range is relentless until you kick them to the curb.

keyplyr




msg:3697786
 7:00 am on Jul 14, 2008 (gmt 0)

They're a real pest all right. How they see this as acceptable behavior, blatantly posting it on their webmaster info page, is beyond me.

Asia_Expat




msg:3698882
 2:36 pm on Jul 15, 2008 (gmt 0)

Is there anywhere it would be appropriate to report this company, their activities and this range of IPs?... because it's nothing short of abuse.

incrediBILL




msg:3699145
 7:49 pm on Jul 15, 2008 (gmt 0)

Technically they've done nothing too horrible, but if you feel abused just block them and it's over.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved