I have just been going through my server logs and noticed these UAs:
63.155.196.249 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; AT&T CSM6.0)
64.156.198.78 - Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)
213.121.69.199 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 95; sniffout_or_w9x)
64.0.99.201 - Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; BROADPAGE; NetCaptor 6.5.0)
62.252.64.6 - IE 4 Win XP
62.251.22.163 - Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826
64.246.44.19 - lwp-trivial/1.35
64.246.44.19 - PHP/4.2.1
203.88.129.166 - DA 4.0
202.188.200.186 - contype
12.252.45.24 - Mozilla/9.9
213.122.107.212 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Circle0701)
I don't recognise any of them. Each has either made too many requests in a short time, read robots.txt and totally ignored it, attempted to break into password-protected areas, or done nothing wrong (I am just curious!). I have traced the IPs, but most belong to commercial companies (AT&T, etc.). Some others I have already researched using this forum (this list was twice as long).
One I have traced and have banned (in case some of you haven't heard of it yet)
61.6.159.128 - Mozilla/4.0 (compatible; MSIE 6.0; Win32 <a href="http://www.zylox.com/ua.asp">Internet Research Software</a> )
I also seem to be getting a lot of hits from FrontPage. Is there any real way to block it using .htaccess?
I can't block using the IP address because they are coming from several different addresses.
Thanks
ratman
<64.156.198.78 - Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)>
[webmasterworld.com...]
Some of these user agents are not that mysterious:
63.155.196.249 - Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; AT&T CSM6.0)
Looks like a usual MS Internet Explorer running on Windows 98. AT&T just added its name to the user agent.
64.246.44.19 - lwp-trivial/1.35
Some CGI script, I think. I am getting those hits as HEAD requests. Maybe a link catalogue is checking whether your site is reachable.
64.246.44.19 - PHP/4.2.1
Someone read a page on your server using a PHP function like file() and analysed it.
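To see which part of a browser-style UA is a vendor add-on like "AT&T CSM6.0", it can help to split the parenthesised comment on semicolons. This is only a rough Python sketch; the helper name ua_tokens is made up for illustration, and it makes no attempt to handle every UA format.

```python
import re

def ua_tokens(ua):
    """Split a browser-style UA into (product, [comment tokens]).

    UAs without a parenthesised comment come back unchanged with no tokens.
    """
    m = re.match(r'^(\S+)\s*\((.*)\)', ua)
    if not m:
        return ua, []
    product, comment = m.groups()
    tokens = [t.strip() for t in comment.split(';') if t.strip()]
    return product, tokens

# UA strings taken from the list in the first post.
for ua in ["Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; AT&T CSM6.0)",
           "lwp-trivial/1.35"]:
    print(ua_tokens(ua))
```

The last token of the first UA is the extra piece appended to an otherwise ordinary IE 5.5 string.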
As I can see in my own log files, there are many people out there messing with their user agent info. Some put ads and URLs in the UA. I also saw an entry where someone put a link, including the HTML code, in the UA. In analysed stats you only have to click on it. Some people call it log marketing.
NN
Welcome to WebmasterWorld!
block Frontpage using .htaccess?
RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage [NC,OR]
-
Here's what I've done with some of your list entries:
BlockedIP-64.156.198.78---Mozilla/5.0 (X11; Linux i686; en-US; rv:1.0rc5; OBJR)
NewToMe---213.121.69.199--Mozilla/4.0 (compatible; MSIE 5.5; Windows 95; sniffout_or_w9x)
NewToMe---64.0.99.201-----Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; BROADPAGE; NetCaptor 6.5.0)
BogusUA---62.252.64.6-----IE 4 Win XP
Netscape6-62.251.22.163---Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826
BlockedUA-64.246.44.19----lwp-trivial/1.35
NewToMe---64.246.44.19----PHP/4.2.1
BlockedUA-203.88.129.166--DA 4.0
NewToMe---202.188.200.186-contype
BogusUA---12.252.45.24----Mozilla/9.9
NewToMe---213.122.107.212-Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Circle0701)
Thanks for pruning your list - many of these are certainly new to me!
Jim
Namenick,
Thanks for the info. On that first one it's the CSM6.0 bit I don't get. This user downloaded 47 pages in just under a minute. I guess the UA must have been cloaked.
The zylox UA refers to a site downloading tool. I have banned it because it tried to download virtually every page from my site.
ratman
Thanks for the welcome and for the FrontPage block.
The problem is that I already have that line in my .htaccess, but someone got in the other day using "MSFrontPage/4.0".
I was thinking of changing it to:
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
Does any other robot/UA use the word frontpage in its name?
--
I just found another one:
213.123.13.156 - EliteSys Entry/2.7 (Win32)
This is a brute force hacking tool designed to break password protected areas. The UA can be changed but blocking it should at least prevent this person from trying again.
ratman
I was thinking of changing it to: RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
Does any other robot/UA use the word frontpage in its name?
I doubt it - at least not a genuine UA.
Your RewriteCond above or
RewriteCond %{HTTP_USER_AGENT} MS.?Frontpage [NC,OR]
should work fine.
Entry/2.7 sounds like real bad news!
Jim
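The difference between the three FrontPage patterns in this exchange can be checked outside Apache. Below is a rough Python sketch; Python's re module is assumed to behave like Apache's regex engine for patterns this simple, the escaped space ("\ ") from .htaccess is written as a plain space, and the helper name blocks is made up for illustration.

```python
import re

def blocks(pattern, user_agent):
    """True if a RewriteCond with this pattern and the [NC] flag would match."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None

patterns = ["MS FrontPage",    # the original rule
            "MS.?Frontpage",   # optional single character between MS and Frontpage
            "FrontPage"]       # the broader variant

# "MSFrontPage/4.0" is from the thread; the second string is an assumed
# example of the spaced form the original rule was written for.
agents = ["MSFrontPage/4.0", "MS FrontPage 4.0"]

for ua in agents:
    for p in patterns:
        print(ua, "|", p, "|", "blocked" if blocks(p, ua) else "allowed")
```

Only the original pattern lets "MSFrontPage/4.0" through, because it insists on a space after "MS".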
I had that exact same visitor - same IP, same UA. But, due to previous bad experiences with this access profile (bogus UA, blank referer), it hit a 403 block:
adsl-67-121-68-25.dsl.snfc21.pacbell.net - - [01/Oct/2002:01:58:32 -0400] "GET /robots.txt HTTP/1.0" 403 773 "-" "Mozilla/4.0"
# Requests with blank referer and bogus UA (contains Mozilla/x.xx only)
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^403i?\.html$ - [F,L]
This allows access only to custom 403 error and "help" pages, which were not subsequently fetched.
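The way the two conditions and the negated rule pattern combine can be sketched in Python. This is only an illustration of the logic, not of how Apache evaluates it internally; the function name is_forbidden is made up, and the path is assumed to arrive without a leading slash, as it does in a per-directory (.htaccess) context.

```python
import re

BOGUS_UA = re.compile(r"^Mozilla/[0-9]\.[0-9]{1,2}$")
ERROR_PAGE = re.compile(r"^403i?\.html$")

def is_forbidden(path, referer, user_agent):
    """Return True when the request would get a 403 from the rule set.

    All three tests must hold: blank referer, a UA that is nothing but
    "Mozilla/x.x" or "Mozilla/x.xx", and a path that is not one of the
    custom error/help pages (403.html or 403i.html).
    """
    blank_referer = referer == ""
    bogus_ua = BOGUS_UA.search(user_agent) is not None
    is_error_page = ERROR_PAGE.search(path) is not None
    return blank_referer and bogus_ua and not is_error_page

# The logged request: blank referer (shown as "-" in the log), UA "Mozilla/4.0".
print(is_forbidden("robots.txt", "", "Mozilla/4.0"))
print(is_forbidden("403.html", "", "Mozilla/4.0"))
print(is_forbidden("robots.txt", "",
                   "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"))
```

Because the conditions are ANDed, a normal browser with a blank referer (a typed-in URL or a bookmark) is not affected.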
Hey PacBell user, wanna spider my site? Give me a valid User-agent string with valid contact info!
(Sorry, but the last time I saw requests with this profile, it ripped through the site too fast, and requested just about everything, disallowed or not.)
I'm keeping an eye on this IP address, in case it tries something else - good or bad.
Jim
The reason I ask is because the "Mozilla/9.9" UA from my first post, which Jim identified as bogus, is visiting my site regularly. It is not doing anything specifically wrong (except looking for the thankyou pages), but I guess there is no legit reason you would want to send a bogus UA unless you are up to no good.
--
I have been having literally thousands of hits from users with a UA of
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)
After several hours of searching I discovered a tool named IntelliTamper, which is a site scanner similar to Black Widow. I downloaded it and ran it on my own server and it definitely leaves the above UA reference regardless of which OS you are using. I can’t see any way to change the UA so unless there are any legit robots/ua’s that leave this signature, it’s time to ban it.
ratman
The rewrite rules I posted above block this bogus UA by user-agent - It's a catch-all, since valid Mozilla-based-browser user-agents always contain more than just "Mozilla/4.0" for example. IE appends (compatible; MSIE something), and Netscape appends the language and version. Proxies will come in with a UA of "Mozilla/3.01" or similar, but then, most or all of them append "(compatible;)".
On the IntelliTamper subject, have you found any other agents that use "DigExt" in the UA string? Other than "DigExt", this UA string looks perfectly legitimate, and my impression after examining several sessions on my site, was that the behaviour looked human. I could easily be wrong - From your posts, it seems that you have a lot more bad-bot traffic than many who post here, based on the number of UAs you've reported that are new to me and others. So, I'm interested in the details of what you found on DigExt.
Thanks,
Jim
my ever-growing htaccess
Yeah, depressing, isn't it? I wonder how big they'll be in 10 years... Probably 10 times bigger than any page on our sites! :o
All I can say is that since I started filtering these unwelcome accesses and put our e-mail behind forms, spam e-mail to the site has decreased to 5% of what it used to be. So I carry on, and try to make my .htaccess more efficient rather than just bigger. The goal of minimizing junk mail makes it easy to allocate time to my blocking activities - I just work on .htaccess for about the same amount of time every day as it takes to sort through and delete the junk mail in the sites' "bulk mail" inboxes. Without a tangible metric, I fear I could easily become compulsive about this! Luckily, I can share the blocking rules across several sites, too.
Jim
Also, I myself use a bogus user agent. I use the Junkbuster filtering proxy and it leaves a false user agent for most of the sites I visit. It's a privacy concern. Lately I've changed my configuration to only allow sites with permission to set cookies to see my true UA. I don't have a problem with people using false UAs unless they are trying to hide behind it to do something ugly.
On the site I'm most concerned about, 99.999% of my human visitors don't know a user-agent from a secret agent. They come for what the site is intended to provide: Information, communication, and organization. All other visitors are secondary to the goals of the site. So, I block bogus UAs unmercifully if they cause any problems - poking around looking for exploits, or even just using up bandwidth (which is paid for by the legitimate users - members of the organization).
You're welcome to visit my site using Opera cloaked as IE if you like - Just don't try to come in with HTTrack or Indy Library or some hacked-up version of a legit UA. I block these because I can. If you use a legit UA and don't cause trouble, fine. If you cause trouble, your IP gets added to the list. If you change IPs, then the scripts will get you before you steal too much bandwidth.
I don't understand the privacy issue with user-agent. Why not set it to "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90;)", or some other common-as-dirt UA and be done with it? (I'm not trying to give you a hard time here - I guess I really don't understand the issue.)
Jim
Thanks for the Mozilla rewrite...
BTW, I had to escape the "/" like this:
^Mozilla\/[0-9]\.[0-9]$
but then I am using it in a called up text file, not in httpd.conf...
YMMV.
Also, had a question... what is the {1,2} in your expression?
BROADPAGE and NetCaptor are browser enhancers. By themselves, they look harmless.
I get a TON of hits with DigExt in the UA - maybe 10k so far today... here are a couple:
Mozilla/4.0 (compatible; MSIE 5.0; AOL 4.0; Windows 98; DigExt)
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; YComp 5.0.2.4)
Just spot checking, looks legit...
dave
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
Means match a UA that is exactly "Mozilla/" followed by one digit 0-9, a period, and then one or two digits 0-9 (i.e. 0 to 99, including forms like 01), with nothing before or after it.
The {} braces give an inclusive repetition count {low,high} for the item just before them. You can also leave the upper bound out, e.g. {2,}, which would match two or more repetitions.
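A quick way to try the quantifier is outside Apache. Here is a small Python sketch; Python's re syntax is assumed to match Apache's for these constructs, and "Mozilla/4.000" is a made-up string added only to show the upper bound.

```python
import re

bogus = re.compile(r"^Mozilla/[0-9]\.[0-9]{1,2}$")

samples = [
    "Mozilla/4.0",    # one digit after the period
    "Mozilla/3.01",   # two digits after the period
    "Mozilla/9.9",
    "Mozilla/4.000",  # made-up: three digits, more than {1,2} allows
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",  # extra text, so $ fails
]
for ua in samples:
    print(ua, "->", "match" if bogus.search(ua) else "no match")

# Open-ended form: {2,} means two or more of the preceding item.
two_or_more = re.compile(r"^[0-9]{2,}$")
for s in ["7", "77", "7777"]:
    print(s, "->", "match" if two_or_more.search(s) else "no match")
```

Only the first three samples match the bogus-UA pattern, and only "77" and "7777" match the {2,} form.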
I've never had to escape "/" forward slashes, but I'm doing rewrites in a per-directory (.htaccess) context.
Jim
Thanks- I had never seen that before.
Strange, but I did have to escape it! From error log:
[Tue Oct 1 12:52:44 2002] [error] /xxxx/detailed.cgi: syntax error at (eval 82) line 77, near "/^Mozilla/["
Backslash found where operator expected at (eval 83) line 77, near "]\"
(Missing operator before \?)
Scalar found where operator expected at (eval 83) line 77, near "]$/"
(Missing operator before $/?)
Bareword found where operator expected at (eval 83) line 77, near "$/i"
(Missing operator before i?)
There it is!
dave
Thanks for the explanation, I am relatively new to all this rewrite stuff and am just getting to grips with the terminology used.
I haven't been able to find any other agent using the "DigExt" UA so far but I am still looking and will post the details if and when I find any others. They looked human to me at first and when I ran the program on my site it looked human as well. The program seems to just get the information about the file (i.e. size, type, etc.), so does not leave any real tell-tale signs that it is automated. There may well be some browser add-ons that use this UA as well so I have avoided blocking it until I can find out more info. Take a look at IntelliTamper and see what results you get.
I have had major headaches trying to find out info on some of these bots. Thanks mainly to this forum (I was a lurker for a while), I have identified and blocked quite a few. I should point out that some of these bots date back to Feb 2002. Some have only appeared once, grabbed what they wanted and left but others seem to be used by various regular visitors.
Some of the UA's posted on this forum (including the one posted by bull) I have never heard of. I think the main difference with my site is that I get a lot of rogue users looking to break in or find the download pages whereas other sites seem to get a lot of site downloaders.
Sorry Finder, but I totally agree with Jim on the privacy issue. If a user does not intend to do anything wrong, they have no reason to hide their UA. If you think it is necessary to cloak your real UA, you should try to stick to a conventional cloak (such as Jim's suggestions). As the issues of site security and bandwidth catch on, those who insist on using a strange UA may find that they are getting blocked from more and more sites.
ratman
So before we put Finder on the defensive, I hope to find out what the UA privacy issue is... I am blocking most UAs to try to preserve the e-mail privacy of our organization's officers and volunteers, so I'm actually a privacy supporter. So, please take my question as sincere!
Jim
The default Junkbuster setting is to use an innocuous UA, although the one they use is kind of outdated by now. Let's see... "Mozilla (Netscape) 3.01 Gold" or something similar.
Personally I no longer have a problem with UA now that I know more about how logging works. In fact, it's kind of funny how much I enjoy analyzing UAs and referrer strings from my site visitors when both those fields are misidentified when I leave my own trail in other people's logs.
How annoying it would be if all my visitors blocked me from seeing where my traffic comes from! ;)
I guess the term "privacy issue" is a little too strong, but having bogus or unidentified UAs accessing my site has caused me major headaches. I didn't mean it to sound as if I block anything I don't recognise. I just meant that every cloaked UA that has visited my site has tried to do something it was not meant to do.
I apologise to you and Finder if I offended you in any way; it was not intentional.
------------
Just found a nice explanation about the DigExt UA. I cannot post links here (can I?) so I cannot let you know where the detailed explanation is. If you do a search on Google for "DigExt" you will find it.
Someone has already suggested the source of this UA somewhere on this forum but I cannot find it right now.
Apparently the UA is left by the offline browsing mode of recent versions of IE. It is only left when someone adds the site to their favorites and chooses the "Make Available Offline" option.
So this probably accounts for more than 90% of the hits everyone is seeing.
So the question is, is there any way of putting up a speed or download limit?
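Short of a real limit on the server, fast clients can at least be spotted after the fact from the log timestamps. The Python sketch below does not throttle anything itself; the function name fast_clients, the 30-hit threshold and the 60-second window are all assumptions picked for illustration, and the second IP address is made up.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def fast_clients(hits, max_hits=30, window=timedelta(seconds=60)):
    """Return the clients that made more than max_hits requests within any window.

    hits: iterable of (client, datetime) pairs in chronological order.
    """
    recent = defaultdict(deque)   # client -> timestamps still inside the window
    flagged = set()
    for client, when in hits:
        q = recent[client]
        q.append(when)
        # Drop timestamps that have fallen out of the sliding window.
        while when - q[0] >= window:
            q.popleft()
        if len(q) > max_hits:
            flagged.add(client)
    return flagged

# 47 requests in under a minute (as reported earlier in the thread) versus
# a made-up slow visitor making three requests five minutes apart.
t0 = datetime(2002, 10, 1, 12, 0, 0)
hits = [("63.155.196.249", t0 + timedelta(seconds=i)) for i in range(47)]
hits += [("10.0.0.1", t0 + timedelta(minutes=5 * i)) for i in range(3)]
hits.sort(key=lambda h: h[1])
print(fast_clients(hits))
```

Anything flagged this way would still have to be reviewed and blocked by hand, since DigExt-style offline fetches from real visitors can look just as fast.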
ratman
It is only left when someone adds the site to their favorites and chooses the "Make Available Offline" option.
Wow, that's fantastic. That means lots of people have bookmarked my site. I wonder if it is disproportionately dial-up users doing this? I get a lot of traffic from AOLers. Very interesting...
port144.ds1-hhl.adsl.cybercity.dk - - [01/Oct/2002:23:20:37 +0200] "GET / HTTP/1.1" 200 1425 www.mysite.net "-" "IE 5.5 Compatible Browser" "-"
port144.ds1-hhl.adsl.cybercity.dk - - [01/Oct/2002:23:20:39 +0200] "GET /dir1/dir2/ HTTP/1.1" 200 9416 www.mysite.net "-" "IE 5.5 Compatible Browser" "-"
After 2 seconds it grabs /dir1/dir2/, which is ODP listed, and nothing more. Strange.
I hereby declare Oct 1 "Day of the unknown UA".
--jan
I didn't take offense - I was just trying to head off a potential problem...
Posting URLs to informational pages is (AFAIK) OK here on WebmasterWorld. Posting links to your own site, the site of a competitor, or posting keywords that will lead to your site or that of a competitor, is a no-no.
The idea is to prevent anybody from using WebmasterWorld as an advertising venue, to prevent requests for site reviews, and to prevent situations where a member says, "Hey, go look at this spammy site - Why does Gaggle let them get away with that?" At first blush, none of these seem too bad, but think about the situation here if there were tens of thousands of those posts per day. WebmasterWorld would quickly become unusable. So, the powers that be prefer that we keep discussions at the abstract and theoretical level.
Posting the entire content of a Web page, or posting a private e-mail message without permission of the other party is also forbidden in order to prevent copyright/legal problems.
If you want to post a link to a specific help page with which you have no business relationship, I think that would be OK. I have posted links to the Apache Server mod-rewrite documentation at least 50 times. Just like e-mail, a link is often better than trying to copy or paraphrase a long document. If you have a question about appropriate use, re-read the Terms of Service. If that doesn't answer your question, stickymail a moderator and ask!
I guess this is way off-topic, huh?
Jim
I guess this is way off-topic, huh?
Answered my question though! thanks
ratman
Oh yes, the link is: "DigExt is hammering my site" [geocrawler.com]