| 8:15 pm on May 31, 2008 (gmt 0)|
FWIW, on a slow day I set a deny for the SV1 and removed it after a few hours.
The 1813 gets caught by some white listing in place.
I've currently a redirect in place.
In the wee hours of this morning, I saw more than a dozen consecutive redirects which affected Jeeves requests, and I can only surmise that Jeeves was chasing these for some reason. Each of the redirects resulted in an individual (and successful) subsequent page request from Jeeves "over a longer than successive short period".
I don't like the idea of the SV1s; however, I'm simply not willing to spend the time sorting out good IPs from bad IPs and then adding these IPs to a conditional deny.
As far as:
I've a deny in place for the Class A.
| 9:18 pm on May 31, 2008 (gmt 0)|
|58.172.195.nnn - - [29/May/2008:20:50:49 +1000] "GET /widget.htm HTTP/1.1" |
Is this a human, or is it a bot pretending to come from Google?
I am concerned that I'm blocking genuine visitors.
That IP range is assigned to Australia; not a known haven for bot bad boys.
inetnum: 18.104.22.168 - 22.214.171.124
person: Telstra Internet Address Registry
address: Telstra Internet
address: Locked Bag 5744
address: ACT 2601
phone: +61 3 9815 5923
APNIC is the registry for that part of the world which includes some pretty benign countries like Australia, New Zealand and Japan along with others that are more suspect like Thailand, China and Korea.
Deny(ing) Class A chunks of APNIC addresses will block entire groups of English speaking, reasonably affluent and generally well behaved users from visiting your site.
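To put a rough number on how wide a Class A deny is, here is a minimal sketch using Python's standard `ipaddress` module. It assumes the block in question is 58.0.0.0/8, which is only a guess based on the 58.172.195.nnn log line quoted above; the last octet of the sample visitor address is made up.

```python
import ipaddress

# Assumption: the denied "Class A" is 58.0.0.0/8 (guessed from the quoted log line).
class_a_block = ipaddress.ip_network("58.0.0.0/8")

# Hypothetical visitor address; the real last octet is masked in the log.
visitor = ipaddress.ip_address("58.172.195.1")

# A /8 leaves 24 host bits, so it covers 2**24 addresses.
print(class_a_block.num_addresses)   # 16777216
print(visitor in class_a_block)      # True
```

One deny line at that size takes out every address in the block, the Telstra range included.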
| 10:11 pm on May 31, 2008 (gmt 0)|
I think we've established that "SV1" is some sort of security upgrade for IE6, and not an indication of badness. See [webmasterworld.com...] - post 3660201 by member "smallcompany"
It is possible that a bad guy might spoof IE6/SV1 though, so seeing "SV1" doesn't mean the user-agent is "good" or "bad."
However, if you see IE7 with SP1, that's a fake UA.
| 10:30 pm on May 31, 2008 (gmt 0)|
|Deny(ing) Class A chunks of APNIC addresses will block entire groups of English speaking, reasonably affluent and generally well behaved users from visiting your site. |
That is entirely my intention.
As it was my intention during 2002 when I spent three weeks (accumulated databases were not available at that time; before another chirps in how easy it is today) sorting out IP ranges to allow access from specific Oceanic visitors in the (144¦20¦21¦61) Class A's.
| 11:08 pm on May 31, 2008 (gmt 0)|
|Are all UAs ending in "SV1)" invalid? |
Definitely not - though scammers undoubtedly use it, same as any UA.
As Jim mentioned, our contributor smallcompany says he has it as his user-agent.
It is also used by the pre-AVG version of LinkScanner:
|Is this a human, or is it a bot pretending to come from Google? |
I would say that particular example is a human who just searched on your keywords.
I would also say that giving such people a 403 is insanity.
|I am concerned that I'm blocking genuine visitors. |
The IE Blog says that SV1 stands for "Security Version 1" and indicates a minimum of XP SP2.
Most common IE UA strings have other tokens after the SV1, but not all of them.
If you download LinkScanner from CNET and install it you can see for yourself.
Fortunately that particular tool can be cloaked for.
| 12:17 am on Jun 1, 2008 (gmt 0)|
In general you really have to profile activity and not specific browser UAs.
For instance, anything asking for my robots.txt is flagged and further access denied.
Likewise, asking for 10 pages in a few seconds will throw a flag and challenge the user, since it's potentially a pre-fetch not following pre-fetch rules, as I have pre-fetching disabled on my server. I discovered that some things that looked like pre-fetch turned out to be someone telling their software to open every page in my blog feed at the same time, another no-no.
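For anyone wanting to try this kind of profiling, the two rules above could be sketched roughly as follows. This is not the poster's actual script: the class name, the 5-second window, and the in-memory storage are all assumptions made for illustration, and only the "10 pages" threshold and the robots.txt rule come from the post.

```python
from collections import defaultdict, deque

# Assumed thresholds: "10 pages in a few seconds" is from the post,
# the exact 5-second window is a guess.
MAX_REQUESTS = 10
WINDOW_SECONDS = 5.0


class RequestProfiler:
    """Hypothetical per-IP behavioral flagging, kept in memory."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.flagged = set()            # ips that tripped a rule

    def record(self, ip, path, now):
        """Record one request; return True if the IP is (now) flagged."""
        # Rule 1: anything asking for robots.txt is flagged outright.
        if path == "/robots.txt":
            self.flagged.add(ip)
            return True

        # Rule 2: too many requests inside the sliding time window.
        recent = self.hits[ip]
        recent.append(now)
        while recent and now - recent[0] > self.window:
            recent.popleft()            # drop timestamps outside the window
        if len(recent) >= self.max_requests:
            self.flagged.add(ip)

        return ip in self.flagged
```

Call `record(ip, path, time.time())` once per request and challenge or deny when it returns True. A real setup would also need to expire old entries and persist flags across restarts.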
I could go on and on with various things that trip my traps, but UA is validated for bare basics and beyond that it's all scripts and behavioral monitoring magic.
| 4:39 am on Jun 1, 2008 (gmt 0)|
I am only going to talk about the antivirus-related traffic that is possible to identify, not the spoof attempts using the same or similar User-Agents.
I have a function called IsAntiVirus; its whole purpose is to determine if the request is from the AVG Antivirus packages in question, so it's possible to feed it the proper rejection notice.
It checks a total of three (3) headers. (Excluding Proxy server checks)
- It makes sure the "Accept" header is missing.
- It makes sure the "Cache-Control" is set to "no-cache" for requests not coming through a proxy server.
- It makes sure that the "User-Agent" header matches one of the following:
- "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
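The three checks listed above might look something like this in Python. It is only a sketch, not the original IsAntiVirus function: the `headers` dict and the `via_proxy` flag are assumptions, and the proxy detection itself is left out, just as it is in the description.

```python
# The two User-Agent strings listed in the post.
AV_USER_AGENTS = {
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)",
}


def is_antivirus(headers, via_proxy=False):
    """Return True if the request looks like the AVG link checker.

    `headers` is a plain dict of request headers; `via_proxy` is a
    hypothetical flag set by whatever proxy detection is in place.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Check 1: the Accept header must be missing entirely.
    if "accept" in h:
        return False

    # Check 2: Cache-Control must be "no-cache", unless the request
    # came through a proxy, in which case this check is skipped.
    if not via_proxy and h.get("cache-control", "").strip().lower() != "no-cache":
        return False

    # Check 3: the User-Agent must match one of the known strings exactly.
    return h.get("user-agent", "") in AV_USER_AGENTS
```

Exact string matching on the User-Agent keeps ordinary IE6 visitors, who send an Accept header, from being caught by the same test.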
Once identified, I still feed them a 403, but I send the default NT 403 HTML with it. I do not blacklist the IP, though, so it can continue to browse my site without any problems.
This has the following beneficial effects.
- No longer get 30 requests for the same page, within a few seconds.
- Does not mark the site as having a problem.
- End users still have access to the site, without a hitch.
The users of my site are not affected at all, and they continue browsing without any problems. It just requires a minimal increase in my bandwidth to deal with this special class of automated check bots. But I keep my users happy while keeping the unwanted traffic to a minimum.
[edited by: Ocean10000 at 4:42 am (utc) on June 1, 2008]
| 3:01 pm on Jun 1, 2008 (gmt 0)|
I just double-checked which static IIS 403 template I am using so I could post an update, and discovered it is not one of the default ones supplied by the Microsoft IIS install. So my previous post about using the default IIS 403 message was in error.