Mysterious User Agents

         

ratman

10:46 pm on Sep 30, 2002 (gmt 0)

10+ Year Member



Hi All,

I have just been going through my server logs and noticed these UA's:

63.155.196.249 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; AT&T CSM6.0)
64.156.198.78 - Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)
213.121.69.199 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 95; sniffout_or_w9x)
64.0.99.201 - Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; BROADPAGE; NetCaptor 6.5.0)
62.252.64.6 - IE 4 Win XP
62.251.22.163 - Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826
64.246.44.19 - lwp-trivial/1.35
64.246.44.19 - PHP/4.2.1
203.88.129.166 - DA 4.0
202.188.200.186 - contype
12.252.45.24 - Mozilla/9.9
213.122.107.212 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Circle0701)

I don't recognise any of them. Each has either made too many requests in a short time, read robots.txt and then totally ignored it, tried to break into password-protected areas, or done nothing wrong at all (I am just curious!). I have traced the IPs, but most belong to commercial providers (AT&T, etc.). Others I have already researched using this forum (this list was twice as long).

One I have traced and have banned (in case some of you haven't heard of it yet)

61.6.159.128 - Mozilla/4.0 (compatible; MSIE 6.0; Win32 <a href=\"http://www.zylox.com/ua.asp\">Internet Research Software</a> )

I also seem to be getting a lot of hits from FrontPage, is there any real way to block it using htaccess?
I can't block using the IP address because they are coming from several different addresses.

Thanks
ratman
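
On the FrontPage question, one option is to match on the User-Agent header instead of the IP address. The following is only a sketch and is not taken from the thread: it assumes the unwanted requests carry the string "FrontPage" somewhere in their User-Agent (check your own logs first) and that mod_rewrite is available in .htaccess on your host.

```apache
RewriteEngine On
# Assumption: the unwanted requests have "FrontPage" in the User-Agent.
# [NC] makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC]
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule .* - [F]
```

Because the match is on the header and not on the address, it applies whichever IP the requests come from. It also turns away anyone using FrontPage for legitimate purposes.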

PandaM

4:05 am on Oct 2, 2002 (gmt 0)

10+ Year Member



just found this in my log
211.32.193.10 - "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; \xbe\xc6\xc6\xae\xb9\xcc\xb5\xf0\xbe\xee (artmedia.org))"

jdMorgan

4:13 am on Oct 2, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seen it, killed it! :)

RewriteCond %{HTTP_USER_AGENT} x[a-f0-9]{2}.x[a-f0-9]{2}.x[a-f0-9]{2} [NC]
RewriteRule .* - [F,L]

Jim

Finder

4:33 am on Oct 2, 2002 (gmt 0)

10+ Year Member



How is it harmful? The URL goes to some Korean language page.

PandaM

6:04 am on Oct 2, 2002 (gmt 0)

10+ Year Member



Jim,
It makes requests like a regular user, as far as I can see. Why do you ban it?

ratman

9:38 am on Oct 2, 2002 (gmt 0)

10+ Year Member



Got over 2600 hits from this one at 3am this morning.

202.133.166.113 - Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; NetCaptor 7.0.1)

It made roughly one hit a second for exactly an hour. Judging by its web site, NetCaptor is just an enhanced browser, but it has obviously been automated to do something, probably download the whole site.

This is not the first time I've had this one attack me so I'm off to block it.

ratman

bull

3:16 pm on Oct 2, 2002 (gmt 0)

10+ Year Member



PHP/4.2.2 was just here too. Blocking anything containing PHP now; I don't see any real reason for anyone to fetch a file via PHP from my domains.

jdMorgan

4:36 pm on Oct 2, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How is it harmful? The URL goes to some Korean language page.

It makes requests like a regular user, as far as I can see. Why do you ban it?

Short answer - I don't remember.

Longer answer - If it's on my ban list, then it misbehaved at some time in the past.

Regular users use Internet Explorer, Netscape Navigator, Opera, Konqueror, and a few others. My sites have been abused by many bogus User-agents, so I ban a lot of them. I provide free information only, and have a limited budget for server bandwidth. If the User-agent does not indicate a search engine robot or a human using a "regular" browser, then I don't feel at all bad about turning it away. My sites, my rules. YMMV. :)

Jim

ratman

6:02 pm on Oct 2, 2002 (gmt 0)

10+ Year Member



PHP/4.2.2 was just here too

Was it a similar IP address (64.246.44.19)?

It came back yesterday so I am also going to block anything containing PHP.

ratman

carfac

6:39 pm on Oct 2, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does the UA start with PHP, or is PHP only part of the UA? Please post the WHOLE UA, so I know whether to use "PHP" or "^PHP"

dave

(I am proactive, I do not wait to see it here, first!)

ratman

6:51 pm on Oct 2, 2002 (gmt 0)

10+ Year Member



Does the UA start with PHP, or is PHP only part of the UA?

The whole UA is

"PHP/4.2.1"

ratman

carfac

7:22 pm on Oct 2, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks ratman! That one is gone now!

dave

bull

9:06 pm on Oct 2, 2002 (gmt 0)

10+ Year Member



ratman,

64.90.37.74 - - [02/Oct/2002:11:42:54 +0200] "GET / HTTP/1.0" 200 1672 www.anotherdomainiadministrate.de "-" "PHP/4.2.2" "-"

rDNS was omitted here, we can all do it ourselves...

dave, I'd suggest "^PHP" or "PHP" rather than the whole UA (that's what I did, rather than waiting for variants).

--jan

ratman

9:25 pm on Oct 2, 2002 (gmt 0)

10+ Year Member



As far as I know there is no legit agent that contains PHP in its name, so I intend to block anything containing PHP.

The hits I have had were both from the same IP address and seemed to be looking for weaknesses in scripts because both hits were to the cgi-bin directory.

ratman

carfac

10:28 pm on Oct 2, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Bull:

Thanks! I went half-way... I blocked ^PHP\/4

(I have to escape a slash the way I do my blocks..., so it works out as: ^PHP/4)

dave

GaryK

10:39 pm on Oct 2, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I publish a browscap.ini file for IIS and a robots.txt file. In my table of what I call "known user agents" I have plenty of stuff with PHP in it that's perfectly legit so please use caution in how you do the blocking. For example, Fast usually has PHP in the URL:

FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...]

The bad ones seem to be the ones that start with PHP like PHP 4.0.6 and PHP 4.1.2 and the others you've mentioned.

ratman

11:18 pm on Oct 2, 2002 (gmt 0)

10+ Year Member



Fast usually has PHP in the URL:

Thanks GaryK, I didn't think of that.

I'll just block any UA that starts with PHP, at least until something else appears.

Thanks again
ratman
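
A minimal sketch of the "starts with PHP" block discussed above, assuming mod_rewrite is usable from .htaccess. Anchoring the pattern with ^ catches "PHP/4.2.1", "PHP/4.2.2" and the "PHP 4.0.6" style, while leaving alone agents that merely contain "PHP" further along in the string.

```apache
RewriteEngine On
# ^ anchors the match at the start of the header, so only
# User-Agents that BEGIN with "PHP" are refused.
RewriteCond %{HTTP_USER_AGENT} ^PHP
RewriteRule .* - [F]
```

Dropping the ^ would turn this into "contains PHP anywhere", which is the broader block GaryK warns about.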

weesnich

1:22 pm on Oct 3, 2002 (gmt 0)

10+ Year Member



Comments in the MSIE-UA like

Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; [stupid comment])

can be generated by setting a registry key; this can be done by any software your visitor has installed.

Details at:
[winguides.com...]

guabito

5:46 pm on Nov 17, 2002 (gmt 0)

10+ Year Member



Hello. I'm new to this board and I've really appreciated all of the insights and information provided. Recently we changed web hosting companies, and our new provider has a control panel with a lot of statistics. It appears that most of the "hits" and bandwidth on my wife's commercial website were being taken by agents harvesting e-mail addresses or by companies that monitor "uptime", even though we did not ask for it. It appears to me that many of these "resources" are there to allow hackers to check the server operating system.

In any case I was really interested in the thread here about using htaccess to do a browser and referer check. I attempted to use the information in this thread:

# Requests with blank referer and bogus UA (contains Mozilla/x.xx only)
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^403i?\.html$ - [F,L]

But it did not seem to work. I did not get an internal server error, but I wasn't sure that it was blocking incorrect agents. I'm not sure if the problem is that I had to modify a couple of things since it was not the last rule. Mine looks like this:

RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^http://[^/.]\.MyURLisHere.com.* - [F]

RewriteCond %{REMOTE_ADDR} ^0.0.0.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor
RewriteRule !^http://[^/.]\.MyURLisHere.com.* - [F,L]

It blocks the user agents and the IP's I don't want on my wife's site, but I'm not sure the referer part is working. For example, I received this in my log:

Host: 66.196.97.126 Url: /es/index2.html Http Code : 304
Date: Nov 18 05:29:09 Http Version: HTTP/1.0 Size in Bytes: -
Referer: - Agent: Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]

This is a bot that I do want visiting my wife's site, but actually the code in the htaccess file should have given it a 403. Right? Have I made a big error in the code? I get no server error and it does block the bad agents and bad IPs (I didn't note all of the agents or IPs, as they are a typical agent/bot block list for htaccess).

Pardon my lack of skill in this area. I have learned a lot by reading the postings. I just never knew how much junk is out there until my wife changed hosting firms!

Thanks again for any assistance or advice. If I can be sure that the browser - referer check is working I could take about 90 percent of the IP's out of my list.

jdMorgan

6:46 pm on Nov 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guabito,

Welcome to WebmasterWorld!

The rule you have blocks User-agents which contain only Mozilla/4.0 (for example) and nothing else. It is working the way you want it to. You don't want to block Inktomi or legitimate Mozilla-based browsers, just the malicious robots that spoof Mozilla User-agents.

In other words, you want to block "Mozilla/4.0" or "Mozilla/3.01", but not "Mozilla/3.01 (compatible; ..."

RewriteCond %{REMOTE_ADDR} ^0.0.0.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor
RewriteRule !^http://[^/.]\.MyURLisHere.com.* - [F,L]

Several problems here. First, you must escape all periods in IP addresses and all other "reserved characters" in your patterns by prefixing a "\" like so:

RewriteCond %{REMOTE_ADDR} ^0\.0\.0\.0 [OR]

Second, you must not use a URL in the left side of the RewriteRule. You need to use a path (filename) only:

RewriteRule !^error_page\.html$ - [F]

means, "for the user-agents above, respond to all requests except those for my custom error page named 'error_page.html' with a 403-Forbidden response code."

I have used:

RewriteRule !^403i?\.html$ - [F,L]

to allow all user-agents to fetch my custom error documents, named "/403.html" and "/403i.html". I now use:
RewriteRule !^(403i?\.html|robots\.txt)$ - [F,L]

To give them a chance to fetch robots.txt as well.

If you don't have a custom error document, then you can just use:

RewriteRule .* - [F]

which means, "for the user-agents above, respond to all requests for any file with a 403-Forbidden response code."

BTW, I've found that the [L] flag is redundant when combined with the [F] or [G] flags.

I strongly recommend that you read the Apache mod_rewrite documentation [httpd.apache.org], and understand it well. Mod_rewrite is very powerful, and you can get into trouble with unintended consequences for your site by making a very small mistake in mod_rewrite. :o

Also, here is a link to a short guide to the use of regular expressions [etext.lib.virginia.edu] such as those used in rewrite rules.

HTH,
Jim
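
Pulling the points in this post together, the section might look something like the sketch below. It is an illustration under assumptions, not Jim's actual file: 192.0.2.1 is a placeholder address from the documentation range, "BadBots" stands in for whatever agents you are blocking, and 403.html and robots.txt are assumed to sit in the site root next to the .htaccess file.

```apache
RewriteEngine On

# Periods in the address are escaped so they match literal dots only.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.1$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor
# The pattern is a path, not a URL. The leading "!" negates it, so every
# request EXCEPT the error page and robots.txt gets a 403.
RewriteRule !^(403\.html|robots\.txt)$ - [F]

# Serve the custom page for 403 responses (assumed to be /403.html).
ErrorDocument 403 /403.html
```

The exception for the error page matters: without it, the request for 403.html made while answering the 403 would itself be forbidden, and the visitor would get the server's default error text instead of the custom page.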

guabito

10:43 pm on Nov 17, 2002 (gmt 0)

10+ Year Member



Jim,

Thanks for the help!

I read the information in your links and they were very helpful but I'm still just learning this!

Lately I've been consumed with blocking something called Netcraft Survey from my wife's site. It uses three (or more) IP's with totally different registrations in the USA and in Europe (UK). They say they "monitor" sites but they also provide a lot of information about the server operating systems. I've noticed a lot of hacking and other abuses on my wife's site right after I see logs with Netcraft on them. I think people go there to see what a site is using then try to use various things on your site.

The bots/agents come in under the following IPs registered to Netcraft, Level3, or a company called Energis Network Engineering:

195.92.95.16
195.92.95.18
213.254.184
64.156.198.85
65.56.235.111

Sometimes also:
216.205.150.91
Interliant

And the entry, on the face may look like:

Host: 64.156.198.85 Url: / Http Code : 403
Date: Nov 18 05:54:17 Http Version: HTTP/1.1 Size in Bytes: -
Referer: - Agent: Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)

When I look at the complete log, there is Netcraft - Netcraft Survey - Netcraft "get server uptime" - etc. There were hundreds of these hits each month from Netcraft. When I would send e-mails to their company - no reply, in fact my wife received "deleted without reading" return receipts.

The agent above received a 403 since the IP was blocked, but I removed several of the IP's from my blocked list to "test" if the rewrite structure will send a visit like the above to 403 without blocking the IP number.

I think I understand what needs to be done. I would like to send them to [F] rather than seeing one of her custom 403 pages as I think they send the visitor back to the index.

I guess (and I am a "bonehead" at this) it needs to be something like the following, since the "." has to be escaped (per "Quoting Special Characters"):

RewriteCond %{REMOTE_ADDR} ^0\.0\.0\.0$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^BadBots$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Email\ Extractor$
RewriteRule .* - [F]

I really appreciate the links and your help. I, like one of the first posters, get hundreds to thousands of hits from agents like that above and they are not there to give my wife business!

Thanks again and have a great weekend!

guabito

10:58 pm on Nov 17, 2002 (gmt 0)

10+ Year Member



Well, I guess I just got my answer that it isn't working... This was in my wife's log:

Host: 208.51.0.74 Url: /en/education.html Http Code : 200
Date: Nov 18 10:55:51 Http Version: HTTP/1.0 Size in Bytes: 15254
Referer: - Agent: Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)

Back to the drawing board!

jdMorgan

11:52 pm on Nov 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guabito,

In case I wasn't clear, you do not want to block real Mozilla-based user agents such as Thunderstone. If you do, no-one will be able to visit your site.

Thunderstone is a search engine-type service, and doesn't need to be blocked. A user-agent containing only "Mozilla/(number).(number)" is usually a spoof and does need to be blocked. The Thunderstone request you posted is perfectly OK.

If you want to, post your entire rewrite section, and we can take a look at it and fix it. You refer to rewrite rules which are not in your post, so I'm lost...

Jim

guabito

2:51 am on Nov 18, 2002 (gmt 0)

10+ Year Member



Thanks for the information Jim. My htaccess file may take up five pages of posting space. LOL! Just kidding!

I think I understand what it does now. So if the agent would be something like:

Agent: Mozilla/3.0

Then the above would go to 403.

I appreciate the information on that log entry. I had never seen that bot before. Sorry about that. Like I say, I'm new at this. Perhaps this is a better example that just came through:

Host: 24.71.13.90 Url: / Http Code : 200
Date: Nov 18 14:20:20 Http Version: HTTP/1.0 Size in Bytes: 0
Referer: - Agent: Mozilla/3.0 (compatible)

It has no referer (just "-") and took 0 bytes. So when I see an entry like this I wonder what it is for. On most of the browsers I'll see something like (compatible; something; something; something). Not on ones like the above...

But everything must be working fine. No internal server errors, and it blocks the bad bots, agents and designated IP's. It works due to the information I received from this board. Many thanks for the information on this site and the assistance you bring to people like me!

jdMorgan

3:00 am on Nov 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guabito,

User agent "Mozilla/3.0 (compatible)" is probably a caching proxy - a cache used by an ISP to reduce the traffic they must handle to and from the network. It is important that you allow these accesses, but still block the ones that don't even have "compatible" on them. I also see a lot of "Mozilla/3.01 (compatible)" requests, and they are usually just a user coming in through his ISP's caching proxy.

Don't be in too much of a hurry to block unknown user-agents. Instead, search through this forum for previous posts, and use the WebmasterWorld site search at the top of the screen to search for the user-agent.

I sent you stickymail about a useful tool for testing - see the top of your WebmasterWorld window.

Have fun!
Jim

guabito

2:29 am on Nov 19, 2002 (gmt 0)

10+ Year Member



Thanks Jim for your help. I need to do something... The way I had it set up was not working so I did a direct cut and paste of:

# Requests with blank referer and bogus UA (contains Mozilla/x.xx only)
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^403i?\.html$ - [F,L]

Since there were more rules, I deleted the "L". The reason is that I had the following in my log tonight:

Host: 208.61.228.235 Url: / Http Code : 200
Date: Nov 19 10:48:09 Http Version: HTTP/1.0 Size in Bytes: 13001
Referer: - Agent: Mozilla

There was no number after the Mozilla - just Mozilla and it got a 200 not 403. I'll see if the cut and paste works.

Thanks again!

jdMorgan

2:38 am on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guabito,

The code you quote won't stop "Mozilla" because that UA contains no numbers.

This code will stop "Mozilla" with a blank referer:


RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$
RewriteRule !^403i?\.html$ - [F]

Jim
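
The two rules (bare "Mozilla/x.xx" and bare "Mozilla") could probably be folded into one by making the version part optional. This is a sketch rather than something posted in the thread, and it assumes the same 403.html / 403i.html error pages used in the rule above.

```apache
# Blank referer AND a User-Agent that is nothing but "Mozilla",
# optionally followed by "/digit.digit" or "/digit.digitdigit".
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla(/[0-9]\.[0-9]{1,2})?$
RewriteRule !^403i?\.html$ - [F]
```

With this pattern "Mozilla", "Mozilla/3.0" and "Mozilla/3.01" are refused. "Mozilla/3.0 (compatible)" and the Inktomi Slurp agent are not, because the $ requires the string to end immediately after the optional version number.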

guabito

2:05 pm on Dec 3, 2002 (gmt 0)

10+ Year Member



I want to thank everyone and especially Jim for the great information I've received in writing my .htaccess file. I just wanted to put up some additional follow-up information.

The pesky hits from the Netcraft IP's have been shunted by the web hosting company far upstream. They informed me that the server information the site posts is no longer correct (it shows a different server). They took this action after their abuse contact in the RIPE whois records turned out not to be valid and bounced. For two weeks now it shows the wrong server!

The other thing that was interesting was related to the .htaccess file. We found that there were several ways, thanks to Jim, to compress the file and make it more compact and effective.

We appreciate the help and Happy Holidays!

jdMorgan

4:19 pm on Dec 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guabito,

The pesky hits from the Netcraft IP's have been shunted by the web hosting company far upstream. They informed me that the server information the site posts is no longer correct (it shows a different server). They took this action after their abuse contact in the RIPE whois records turned out not to be valid and bounced. For two weeks now it shows the wrong server!

Good information, there.

Thanks for posting!
Jim

spinnercee

3:07 am on Dec 5, 2002 (gmt 0)

10+ Year Member



WOW.... Anyway, I wanted to revisit something from page one regarding blocking the FrontPage UA... You may want to be careful there as you may have a webmaster who links to you validating page links --- If you throw a 403, the link may get clipped (FrontPage should only be making HEAD requests, but Microsoft gets greedy sometimes) -- I agree with evaluating the context and the pattern before banning a whole UA --- it's safer to ban the IP first (that stops the boys), if they become persistent and they are the only one, then go ahead. The same goes for banning subnets -- you could be turning off many potential visitors just to stop one hack, and in a way, then they win.
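
For the "ban the IP first" approach, here is a sketch using the host-based access directives of Apache 1.3/2.0 (later Apache releases replaced these with Require). The address 192.0.2.10 is a placeholder, not one taken from this thread.

```apache
# Allow is evaluated first, then Deny; a matching Deny wins.
Order Allow,Deny
Allow from all
# Placeholder address: substitute the single offending IP.
Deny from 192.0.2.10
```

This turns away one address and leaves everyone else, including other people using the same User-Agent, untouched.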

spinnercee

3:16 am on Dec 5, 2002 (gmt 0)

10+ Year Member



my banned UAs --- I want to prevent mass lifting of my content on a site that provides software images... they are mostly download accelerators.

SetEnvIfNoCase User-Agent "^DA" kick_me_out
SetEnvIfNoCase User-Agent "^GetRight" kick_me_out
SetEnvIfNoCase User-Agent "^NetAnts" kick_me_out
SetEnvIfNoCase User-Agent "^NetPumper" kick_me_out
SetEnvIfNoCase User-Agent "^Scooter" kick_me_out
SetEnvIfNoCase User-Agent "^FlashGet" kick_me_out
SetEnvIfNoCase User-Agent "^SmartDownload" kick_me_out
# Windows Media Player
SetEnvIfNoCase User-Agent "^NSPlayer" kick_me_out
# RealPlayer
SetEnvIfNoCase User-Agent "^RMA" kick_me_out
SetEnvIfNoCase User-Agent "MSIE 5\.00" kick_me_out

The last, MSIE 5.00, is a fake UA, probably another download accelerator.
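
The SetEnvIfNoCase lines only set the kick_me_out variable; something still has to act on it. The post does not show that part, so the following is an assumption about how such a list is usually finished off on Apache 1.3/2.0.

```apache
# Flag the request (one line shown; the full list goes here).
SetEnvIfNoCase User-Agent "^GetRight" kick_me_out

# Refuse any request that had kick_me_out set above.
Order Allow,Deny
Allow from all
Deny from env=kick_me_out
```

Requests whose User-Agent matched none of the patterns never get the variable set, so they fall through to "Allow from all".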
