I did some digging into this IP range and traced it to a Swedish company called Munax. They publish the following statement about their spidering, in which they openly admit that their crawler disguises itself as a regular browser used by a human visitor...
They also claim to respect the robots.txt file, but I can assure you that claim is false. My forum is carefully managed in my robots.txt file and these IPs are ignoring it completely. Further, they crawl so aggressively and repetitively that they bring my server to its knees at times, and they also make it look as though there are many more visitors than there actually are.
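One way to back up the "ignoring robots.txt" claim with evidence is to replay your access log against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser and ipaddress modules. The robots.txt contents, the address range (192.0.2.0 to 192.0.2.255, a reserved documentation range) and the (ip, path) log format are all hypothetical placeholders; substitute your own. Because the crawler has no name of its own, only the "User-agent: *" group could ever apply to it, so the check uses its generic MSIE 6.0 string.

```python
import ipaddress
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; replace with the contents of your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""

# Hypothetical suspect range (documentation addresses, NOT the real range).
RANGE_START = ipaddress.ip_address("192.0.2.0")
RANGE_END = ipaddress.ip_address("192.0.2.255")

# The unnamed crawler reports a generic IE 6 user agent, so only '*' rules apply.
CRAWLER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def in_range(ip: str) -> bool:
    """True if ip lies between RANGE_START and RANGE_END, inclusive."""
    return RANGE_START <= ipaddress.ip_address(ip) <= RANGE_END


def robots_violations(entries, robots_txt=ROBOTS_TXT):
    """Return (ip, path) pairs from the suspect range that fetched a path
    robots.txt disallows. entries is an iterable of (ip, path) tuples,
    e.g. parsed out of an access log (the parsing itself is not shown)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(ip, path) for ip, path in entries
            if in_range(ip) and not parser.can_fetch(CRAWLER_UA, path)]


def request_counts(entries):
    """Count requests per address in the suspect range, a rough gauge
    of how hard the range is hitting the server."""
    return Counter(ip for ip, _ in entries if in_range(ip))
```

For example, with entries = [("192.0.2.10", "/forum/thread1.htm"), ("192.0.2.10", "/index.html"), ("198.51.100.7", "/forum/x")], robots_violations(entries) flags only the first request: the second path is allowed, and the third address is outside the range.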
The IP range, as far as I can tell, is...
184.108.40.206 - 220.127.116.11
What can I do about this? If I place the following in the .htaccess file in my forum directory, will it work, and could I potentially be doing any harm?...
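Order Allow,Deny
Allow from all
# /127 is not a valid IPv4 prefix length (0-32): replace it with the prefix(es) covering the range above
Deny from 18.104.22.168/127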
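Two things stand out in those Apache 2.2-style lines: without an "Order Allow,Deny" line, the default Deny,Allow order lets "Allow from all" override the Deny, and an IPv4 CIDR prefix can only run from /0 to /32. A start-to-end range also rarely lines up with a single CIDR block. The Python sketch below uses the standard ipaddress.summarize_address_range to turn a range into the smallest set of "Deny from" lines. The addresses in the example are hypothetical documentation addresses, not the range quoted above.

```python
import ipaddress


def deny_lines(first: str, last: str) -> list[str]:
    """Return 'Deny from' lines whose CIDR blocks exactly cover first..last.

    summarize_address_range yields the minimal list of networks spanning
    the range; it raises ValueError if last < first or versions differ.
    """
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [f"Deny from {net}"
            for net in ipaddress.summarize_address_range(start, end)]


def htaccess_block(first: str, last: str) -> str:
    """Build an Apache 2.2 mod_authz_host block. Order Allow,Deny makes a
    matching Deny win over 'Allow from all'."""
    lines = ["Order Allow,Deny", "Allow from all", *deny_lines(first, last)]
    return "\n".join(lines)
```

For instance, htaccess_block("192.0.2.0", "192.0.3.127") produces four lines: Order Allow,Deny, Allow from all, Deny from 192.0.2.0/24 and Deny from 192.0.3.0/25. On Apache 2.4 the Order/Allow/Deny directives only work through mod_access_compat; the native equivalent uses Require directives instead.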
Our crawler does not have a "name", yet. Instead it announces itself to be a standard web browser, a "Mozilla 4.0" kind-of-browser compatible with the browser Microsoft Internet Explorer 6.0, running on the Windows NT 5.1 operating system.
They could easily include an identifier, such as a URL for a page about their crawler, in the MSIE user agent string; others do that, so there's nothing new there.
Sorry, if you can't identify yourself properly, you can't play in my sandbox.
End of story.
I do occasionally see them as a referrer in my traffic. I didn't really check whether their crawler obeyed robots.txt.