Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Munax bots cloaking themselves and causing high server load
Asia_Expat

5+ Year Member



 
Msg#: 3616493 posted 7:04 pm on Apr 1, 2008 (gmt 0)

I recently noticed a spike in my server load. At the same time, I noticed a certain range of IP addresses crawling my forum over and over; they seemed particularly interested in the 'register' page.

I did some digging around about this IP range and traced them to a Swedish company called Munax. They offer the following statement about their spidering activities, in which they openly claim to cloak their spiders to appear as a regular human being...
[munax.com...]

They also claim to respect the robots.txt file, but I can assure you that claim is false. My forum is carefully managed in my robots file and these IPs are totally ignoring it. Further, they are crawling so aggressively and with such repetition that it's bringing my server to its knees at times, and also making it look like there are many more visitors than there actually are.

The IP range, as far as I can tell, is...
82.99.30.0 - 82.99.30.127

What can I do about this? If I place the following in my htaccess file in the forum directory, will it work and will I potentially be doing any harm?...

<Files *>
order allow,deny
allow from all
deny from 82.99.30.0/127
</Files>

 

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3616493 posted 6:53 am on Apr 2, 2008 (gmt 0)

I saw them yesterday too. Grabbed index file and no robots.txt, got promptly banned ;o)

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 3616493 posted 7:07 am on Apr 2, 2008 (gmt 0)

I've been watching them for quite some time and this statement from their site is a real winner:

Our crawler does not have a "name", yet. Instead it announces itself to be a standard web browser, a "Mozilla 4.0" kind-of-browser compatible with the browser Microsoft Internet Explorer 6.0, running on the Windows NT 5.1 operating system.

They could actually include their path in the MSIE user agent, others do that, nothing new there.

Sorry, if you can't identify yourself properly you can't play in my sandbox.

End of story.

Asia_Expat

5+ Year Member



 
Msg#: 3616493 posted 8:46 am on Apr 2, 2008 (gmt 0)

Is my above code correct to ban them... I just add that to my current htaccess file, yes?

thetrasher

5+ Year Member



 
Msg#: 3616493 posted 11:29 am on Apr 2, 2008 (gmt 0)

Wrong: deny from 82.99.30.0/127
Right: deny from 82.99.30.0/25

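
For anyone unsure why the suffix is /25 rather than /127: the number after the slash is a prefix length (0 to 32 for IPv4), not the last address in the range. A minimal sketch using Python's standard `ipaddress` module to derive the CIDR block from the first and last address quoted in this thread (the variable names are just for illustration):

```python
import ipaddress

# First and last address of the range reported at the top of the thread
first = ipaddress.ip_address("82.99.30.0")
last = ipaddress.ip_address("82.99.30.127")

# summarize_address_range yields the smallest set of CIDR blocks
# that exactly covers first..last
blocks = list(ipaddress.summarize_address_range(first, last))
print(blocks)                    # [IPv4Network('82.99.30.0/25')]
print(blocks[0].num_addresses)   # 128

# A prefix length of 127 is out of range for IPv4, so this is rejected
try:
    ipaddress.ip_network("82.99.30.0/127")
    slash_127_valid = True
except ValueError:
    slash_127_valid = False
print(slash_127_valid)           # False
```

This only checks the arithmetic; it says nothing about how a particular Apache version reacts to the invalid /127 form.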
Hobbs

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3616493 posted 4:32 pm on Apr 2, 2008 (gmt 0)

Using every IP from 82.99.30.2 to 82.99.30.64
570 hits in 2 days
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
bye bye 82.99.30.0/25

[added: They also do not follow robots.txt and fell into my bot trap]

[edited by: Hobbs at 4:39 pm (utc) on April 2, 2008]

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3616493 posted 7:13 pm on Apr 2, 2008 (gmt 0)

for those that use rewrites.

RewriteCond %{REMOTE_ADDR} ^82\.99\.30\.(9[6-9]|1[01][0-9]|12[0-7])$ [OR]
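
Worth noting when reusing the condition above: with the alternation written using ordinary pipe characters, the pattern covers only part of the /25. A quick sketch in Python (whose `re` syntax handles this particular pattern the same way) lists which last octets it matches. The second pattern, `FULL_RANGE`, is a hypothetical alternative that is not from the thread, shown only as one way to cover 0 to 127:

```python
import re

# The pattern from the RewriteCond above
posted = re.compile(r"^82\.99\.30\.(9[6-9]|1[01][0-9]|12[0-7])$")

# Hypothetical alternative covering the whole 82.99.30.0 - 82.99.30.127 range
FULL_RANGE = re.compile(r"^82\.99\.30\.([0-9]|[1-9][0-9]|1[01][0-9]|12[0-7])$")

def matched_octets(pattern):
    """Return the last octets (0-255) for which the pattern matches."""
    return [n for n in range(256) if pattern.match(f"82.99.30.{n}")]

posted_hits = matched_octets(posted)
full_hits = matched_octets(FULL_RANGE)

print(posted_hits[0], posted_hits[-1], len(posted_hits))  # 96 127 32
print(full_hits[0], full_hits[-1], len(full_hits))        # 0 127 128
```

So the posted condition blocks 82.99.30.96 through 82.99.30.127 only. That may have been intentional if those were the addresses seen in that poster's logs; the thread does not say.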

Asia_Expat

5+ Year Member



 
Msg#: 3616493 posted 9:57 am on Apr 4, 2008 (gmt 0)

I successfully blocked them and my forum shows a halving of online users at any given time :-D LOL

Achernar

5+ Year Member



 
Msg#: 3616493 posted 9:45 pm on Apr 4, 2008 (gmt 0)

The IP range, as far as I can tell, is...
82.99.30.0 - 82.99.30.127

What can I do about this?

Funny. They also crawled a couple of sites I manage, and were always caught in the bot-traps.

This IP range is now definitively banned at the firewall level for all the servers I manage:
82.99.30.0/24

Megaclinium

5+ Year Member



 
Msg#: 3616493 posted 8:40 pm on Apr 7, 2008 (gmt 0)

eek! I just UN-banned them after a few months.
I had re-extracted weblogs for just their IPs, and they show it DID grab the robots.txt, which I hadn't thought it did when I summed stats from a combined months file.

I do see them as referrer occasionally in traffic. I didn't really check whether it obeyed robots.txt.
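
A rough sketch of that kind of log check, for anyone who wants to repeat it. It assumes Apache common/combined log format; the `summarize` helper and the three sample lines are made up for illustration and are not real log data from this thread:

```python
import ipaddress
import re

MUNAX_NET = ipaddress.ip_network("82.99.30.0/25")

# Leading fields of a common/combined log line:
# client, identd, user, [timestamp], "METHOD path ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)')

def summarize(lines, network=MUNAX_NET):
    """Return (total requests, robots.txt requests) made from `network`."""
    total = 0
    robots = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        try:
            ip = ipaddress.ip_address(m.group(1))
        except ValueError:
            continue  # client field is a hostname, not an IP
        if ip in network:
            total += 1
            if m.group(2).split("?")[0] == "/robots.txt":
                robots += 1
    return total, robots

# Illustrative lines only
sample = [
    '82.99.30.5 - - [01/Apr/2008:19:04:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '82.99.30.5 - - [01/Apr/2008:19:04:02 +0000] "GET /forum/register HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '203.0.113.9 - - [01/Apr/2008:19:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
]

result = summarize(sample)
print(result)  # (2, 1)
```

Fetching robots.txt is not the same as obeying it. To check that, you would also have to compare the requested paths against your Disallow rules.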

Achernar

5+ Year Member



 
Msg#: 3616493 posted 12:07 am on Apr 8, 2008 (gmt 0)

They don't obey robots.txt, else they wouldn't get caught in bot-traps.

[edited by: Achernar at 12:11 am (utc) on April 8, 2008]

Asia_Expat

5+ Year Member



 
Msg#: 3616493 posted 6:06 pm on Jul 13, 2008 (gmt 0)

Just a heads up... these same bots are heavily active on my server again, so I've added them to the firewall once more (my firewall drops listed IPs after 30 days)

82.99.30.0/25

idiotgirl

10+ Year Member



 
Msg#: 3616493 posted 6:51 am on Jul 14, 2008 (gmt 0)

They've been pounding me for weeks and filling up my error logs because I'd blocked their IP range via .htaccess. I firewalled them just yesterday at 82.99.30.0/25 and now my error logs are (almost) minty fresh. This range is relentless until you kick them to the curb.

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3616493 posted 7:00 am on Jul 14, 2008 (gmt 0)

They're a real pest alright. Just how they see this as acceptable behavior, blatantly posting it on their webmaster info page, is beyond me.

Asia_Expat

5+ Year Member



 
Msg#: 3616493 posted 2:36 pm on Jul 15, 2008 (gmt 0)

Is there anywhere it would be appropriate to report this company, their activities and this range of IPs?... because it's nothing short of abuse.

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 3616493 posted 7:49 pm on Jul 15, 2008 (gmt 0)

Technically they've done nothing too horrible, but if you feel abused just block them and it's over.
