I have just been going through my server logs and noticed these UA's:
63.155.196.249 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; AT&T CSM6.0)
64.156.198.78 - Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)
213.121.69.199 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 95; sniffout_or_w9x)
64.0.99.201 - Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; BROADPAGE; NetCaptor 6.5.0)
62.252.64.6 - IE 4 Win XP
62.251.22.163 - Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826
64.246.44.19 - lwp-trivial/1.35
64.246.44.19 - PHP/4.2.1
203.88.129.166 - DA 4.0
202.188.200.186 - contype
12.252.45.24 - Mozilla/9.9
213.122.107.212 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Circle0701)
I don't recognise any of them, and each has either made too many requests in a short time, read the robots.txt file and totally ignored it, attempted to break into password-protected areas, or done nothing wrong (I am just curious!). I have traced the IPs, but most belong to commercial companies (AT&T, etc.). Some others I have already researched using this forum (this list was twice as long).
One I have traced and have banned (in case some of you haven't heard of it yet)
61.6.159.128 - Mozilla/4.0 (compatible; MSIE 6.0; Win32 <a href="http://www.zylox.com/ua.asp">Internet Research Software</a> )
I also seem to be getting a lot of hits from FrontPage. Is there any real way to block it using htaccess?
I can't block using the IP address because they are coming from several different addresses.
Thanks
ratman
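On the FrontPage question: when the address keeps changing, one common approach is to match on the User-Agent header instead of the IP. What follows is only a sketch. It assumes the hits identify themselves with a User-Agent starting with "MSFrontPage" (that string is an assumption, it is not in the list above, so check your own logs), that mod_setenvif is available, and that your host allows these directives in .htaccess. The variable name block_frontpage is made up for the example.

```apache
# Flag any request whose User-Agent begins with "MSFrontPage" (assumed string).
SetEnvIfNoCase User-Agent "^MSFrontPage" block_frontpage

# Apache 1.3/2.0-style access control: allow everyone except flagged requests.
Order Allow,Deny
Allow from all
Deny from env=block_frontpage
```

Flagged requests get a 403 no matter which address they come from. Keep in mind that a User-Agent is trivially spoofed, so this only stops clients that announce themselves honestly.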
202.133.166.113 - Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; NetCaptor 7.0.1)
It made roughly one hit a second for exactly an hour. Looking at the web site NetCaptor looks like just an enhanced browser but it's obviously been automated to try and do something, probably download the site.
This is not the first time I've had this one attack me so I'm off to block it.
ratman
How is it harmful? The URL goes to some Korean language page.
It makes requests like a regular user, as far as I can tell. Why do you ban it?
Short answer - I don't remember.
Longer answer - If it's on my ban list, then it misbehaved at some time in the past.
Regular users use Internet Explorer, Netscape Navigator, Opera, Konqueror, and a few others. My sites have been abused by many bogus User-agents, so I ban a lot of them. I provide free information only, and have a limited budget for server bandwidth. If the User-agent does not indicate a search engine robot or a human using a "regular" browser, then I don't feel at all bad about turning it away. My sites, my rules. YMMV. :)
Jim
64.90.37.74 - - [02/Oct/2002:11:42:54 +0200] "GET / HTTP/1.0" 200 1672 www.anotherdomainiadministrate.de "-" "PHP/4.2.2" "-"
rDNS was omitted here, we can all do it ourselves...
Dave, I'd rather suggest "^PHP" or "PHP" instead of the whole UA (at least that is what I did, rather than waiting for variants).
--jan
FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...]
The bad ones seem to be the ones that start with PHP like PHP 4.0.6 and PHP 4.1.2 and the others you've mentioned.
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; [stupid comment])
can be generated by setting a registry key; this can be done by any software your visitor has installed.
Details at:
[winguides.com...]
In any case I was really interested in the thread here about using htaccess to do a browser and referer check. I attempted to use the information in this thread:
# Requests with blank referer and bogus UA (contains Mozilla/x.xx only)
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^403i?\.html$ - [F,L]
But it did not seem to work. I would not get an internal server error, but I wasn't sure that it was blocking incorrect agents. I'm not sure if the problem is that I had to modify a couple of things since it was not the last rule. Mine looks like this:
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^http://[^/.]\.MyURLisHere.com.* - [F]
RewriteCond %{REMOTE_ADDR} ^0.0.0.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor
RewriteRule !^http://[^/.]\.MyURLisHere.com.* - [F,L]
It blocks the user agents and the IP's I don't want on my wife's site, but I'm not sure the referer part is working. For example, I received this in my log:
Host: 66.196.97.126 Url: /es/index2.html Http Code : 304
Date: Nov 18 05:29:09 Http Version: HTTP/1.0 Size in Bytes: -
Referer: - Agent: Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]
This is a bot that I do want visiting my wife's site, but actually the code in the htaccess file should have given it a 403. Right? Have I made a big error in the code? I get no server error and it does block the bad agents and bad IP's (I didn't note all of the agents or IP's as they are a typical agent - bot block list for htaccess.)
Pardon my lack of skill in this area. I have learned a lot by reading the postings. I just never knew how much junk is out there until my wife changed hosting firms!
Thanks again for any assistance or advice. If I can be sure that the browser - referer check is working I could take about 90 percent of the IP's out of my list.
Welcome to WebmasterWorld!
The rule you have blocks User-agents which contain only Mozilla/4.0 (for example) and nothing else. It is working the way you want it to. You don't want to block Inktomi or legitimate Mozilla-based browsers, just the malicious robots that spoof Mozilla User-agents.
In other words, you want to block "Mozilla/4.0" or "Mozilla/3.01", but not "Mozilla/3.01 (compatible; ..."
RewriteCond %{REMOTE_ADDR} ^0.0.0.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor
RewriteRule !^http://[^/.]\.MyURLisHere.com.* - [F,L]
Several problems here. First, you must escape all periods in IP addresses and all other "reserved characters" in your patterns by prefixing a "\" like so:
RewriteCond %{REMOTE_ADDR} ^0\.0\.0\.0 [OR]
Second, RewriteRule matches against the local URL-path, not a full URL with the domain name, so the pattern should exclude only your custom error page:
RewriteRule !^error_page.html$ - [F]
I have used:
RewriteRule !^403i?\.html$ - [F,L]
RewriteRule !^(403i?\.html|robots\.txt)$ - [F,L]
If you don't have a custom error document, then you can just use:
RewriteRule .* - [F]
BTW, I've found that the [L] flag is redundant when combined with the [F] or [G] flags.
I strongly recommend that you read the Apache mod_rewrite documentation [httpd.apache.org], and understand it well. Mod_rewrite is very powerful, and you can get into trouble with unintended consequences for your site by making a very small mistake in mod_rewrite. :o
Also, here is a link to a short guide to the use of regular expressions [etext.lib.virginia.edu] such as those used in rewrite rules.
HTH,
Jim
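A note on why the custom error page is excluded from the pattern: if the rule forbids every URL, the 403 page itself is forbidden too, and Apache has to fall back to its built-in error text. A minimal sketch, assuming a custom error document at /403.html (a hypothetical file name) and reusing the placeholder "BadBots" agent from the rules quoted above:

```apache
# Hypothetical custom error page; adjust the path to your own file.
ErrorDocument 403 /403.html

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BadBots
# Forbid everything except the error page itself, so it can still be served.
# In .htaccess the pattern is matched against the path without a leading slash.
RewriteRule !^403\.html$ - [F]
```

If there is no custom error page, the exclusion serves no purpose and a plain ".*" pattern does the same job.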
Thanks for the help!
I read the information in your links and they were very helpful but I'm still just learning this!
Lately I've been consumed with blocking something called Netcraft Survey from my wife's site. It uses three (or more) IP's with totally different registrations in the USA and in Europe (UK). They say they "monitor" sites but they also provide a lot of information about the server operating systems. I've noticed a lot of hacking and other abuses on my wife's site right after I see logs with Netcraft on them. I think people go there to see what a site is using then try to use various things on your site.
The bots/agents come in under the following IP's registered to Netcraft, Level3, or a company called Energis Network Engineering:
195.92.95.16
195.92.95.18
213.254.184
64.156.198.85
65.56.235.111
Sometimes too:
216.205.150.91
Interliant
And the entry, on its face, may look like:
Host: 64.156.198.85 Url: / Http Code : 403
Date: Nov 18 05:54:17 Http Version: HTTP/1.1 Size in Bytes: -
Referer: - Agent: Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)
When I look at the complete log, there is Netcraft - Netcraft Survey - Netcraft "get server uptime" - etc. There were hundreds of these hits each month from Netcraft. When I would send e-mails to their company - no reply, in fact my wife received "deleted without reading" return receipts.
The agent above received a 403 since the IP was blocked, but I removed several of the IP's from my blocked list to "test" if the rewrite structure will send a visit like the above to 403 without blocking the IP number.
I think I understand what needs to be done. I would like to send them to [F] rather than seeing one of her custom 403 pages as I think they send the visitor back to the index.
I guess (and I am a "bonehead" at this) it needs to be something like the following, since the "." falls under "Quoting Special Characters":
RewriteCond %{REMOTE_ADDR} ^0\.0\.0\.0$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor$
RewriteRule .* - [F]
I really appreciate the links and your help. I, like one of the first posters, get hundreds to thousands of hits from agents like that above and they are not there to give my wife business!
Thanks again and have a great weekend!
Host: 208.51.0.74 Url: /en/education.html Http Code : 200
Date: Nov 18 10:55:51 Http Version: HTTP/1.0 Size in Bytes: 15254
Referer: - Agent: Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)
Back to the drawing board!
In case I wasn't clear, you do not want to block real Mozilla-based user agents such as Thunderstone. If you do, no-one will be able to visit your site.
Thunderstone is a search engine-type service, and doesn't need to be blocked. A user-agent containing only "Mozilla/(number).(number)" is usually a spoof and does need to be blocked. The Thunderstone request you posted is perfectly OK.
If you want to, post your entire rewrite section, and we can take a look at it and fix it. You refer to rewrite rules which are not in your post, so I'm lost...
Jim
I think I understand what it does now. So if the agent would be something like:
Agent: Mozilla/3.0
Then the above would go to 403.
I appreciate the information on that log entry. I had never seen that bot before. Sorry about that. Like I say, I'm new at this. Perhaps this is a better example that just came through:
Host: 24.71.13.90 Url: / Http Code : 200
Date: Nov 18 14:20:20 Http Version: HTTP/1.0 Size in Bytes: 0
Referer: - Agent: Mozilla/3.0 (compatible)
It has no referer (just "-") and took 0 bytes. So when I see an entry like this I wonder what it is for. On most of the browsers I'll see something like (compatible; something; something; something). Not on ones like the above...
But everything must be working fine. No internal server errors, and it blocks the bad bots, agents and designated IP's. It works due to the information I received from this board. Many thanks for the information on this site and the assistance you bring to people like me!
User agent "Mozilla/3.0 (compatible)" is probably a caching proxy - a cache used by an ISP to reduce the traffic they must handle to and from the network. It is important that you allow these accesses, but still block the ones that don't even have "compatible" on them. I also see a lot of "Mozilla/3.01 (compatible)" requests, and they are usually just a user coming in through his ISP's caching proxy.
Don't be in too much of a hurry to block unknown user-agents. Instead, search through this forum for previous posts, and use the WebmasterWorld site search at the top of the screen to search for the user-agent.
I sent you stickymail about a useful tool for testing - see the top of your WebmasterWorld window.
Have fun!
Jim
# Requests with blank referer and bogus UA (contains Mozilla/x.xx only)
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^403i?\.html$ - [F,L]
Since there were more rules, I deleted the "L". The reason is that I had the following in my log tonight:
Host: 208.61.228.235 Url: / Http Code : 200
Date: Nov 19 10:48:09 Http Version: HTTP/1.0 Size in Bytes: 13001
Referer: - Agent: Mozilla
There was no number after the Mozilla - just Mozilla and it got a 200 not 403. I'll see if the cut and paste works.
Thanks again!
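The pattern above requires a slash and a version number after "Mozilla", so a User-Agent that is just the bare word slips through. One possible adjustment, offered as a sketch and not as a tested fix, is to make the version part optional:

```apache
RewriteEngine On
# Blank referer and a User-Agent that is only "Mozilla" or "Mozilla/x.xx"
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla(/[0-9]\.[0-9]{1,2})?$
RewriteRule !^403i?\.html$ - [F]
```

Full strings such as "Mozilla/3.0 (compatible)" still do not match, because the closing "$" anchor requires the User-Agent to end right after the optional version number.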
The pesky hits from the Netcraft IP's have been shunted by the web hosting company far upstream. They informed me that the server information the site posts is no longer correct (it shows a different server). They took this action after their abuse contact in the RIPE whois records turned out not to be valid and bounced. For two weeks now it shows the wrong server!
The other thing that was interesting was related to the .htaccess file. We found that there were several ways, thanks to Jim, to compress the file and make it more compact and effective.
We appreciate the help and Happy Holidays!
The pesky hits from the Netcraft IP's have been shunted by the web hosting company far upstream. They informed me that the server information the site posts is no longer correct (it shows a different server). They took this action after their abuse contact in the RIPE whois records turned out not to be valid and bounced. For two weeks now it shows the wrong server!
Good information, there.
Thanks for posting!
Jim
SetEnvIfNoCase User-Agent "^DA" kick_me_out
SetEnvIfNoCase User-Agent "^GetRight" kick_me_out
SetEnvIfNoCase User-Agent "^NetAnts" kick_me_out
SetEnvIfNoCase User-Agent "^NetPumper" kick_me_out
SetEnvIfNoCase User-Agent "^Scooter" kick_me_out
SetEnvIfNoCase User-Agent "^FlashGet" kick_me_out
SetEnvIfNoCase User-Agent "^SmartDownload" kick_me_out
# NSPlayer = Windows Media Player
SetEnvIfNoCase User-Agent "^NSPlayer" kick_me_out
# RMA = RealPlayer
SetEnvIfNoCase User-Agent "^RMA" kick_me_out
SetEnvIfNoCase User-Agent "MSIE 5\.00" kick_me_out
The last, MSIE 5.00, is a fake UA... probably another download accelerator.
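The SetEnvIfNoCase lines on their own only set the kick_me_out variable; nothing is refused until an access rule tests it. The list above does not show that part, so the following is a guess at the usual companion block, written in Apache 1.3/2.0-style syntax:

```apache
# Refuse any request that one of the SetEnvIfNoCase lines has flagged.
Order Allow,Deny
Allow from all
Deny from env=kick_me_out
```

With this in place, every flagged download manager or media player receives a 403 while all other visitors are let through.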