Are all UAs ending in "SV1)" invalid?
Mokita
msg:3663447 - 5:22 am on May 31, 2008 (gmt 0)

In the recent thread about AVG's UA ending in 5.1;1813)

smallcompany wrote:
How do you know what to block? I see those two without referrer:

User Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
User Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)

wilderness replied:
They're all invalid UAs based on "ends with" (hint)

I do not understand the "hint". While I see heaps of both UAs with no referrer, I also see a number that look like this:

58.172.195.nnn - - [29/May/2008:20:50:49 +1000] "GET /widget.htm HTTP/1.1" 403 - "http://www.google.com/search?hl=en&q=widget&btnG=Google+Search&meta=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

Is this a human, or is it a bot pretending to come from Google?

I am concerned that I'm blocking genuine visitors.
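
For what it is worth, if the hint means a literal "ends with" match, it is easy enough to test (a Python sketch, purely illustrative, not anyone's actual blocking rules), and it does catch both strings:

  suspect_uas = [
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)",
  ]
  for ua in suspect_uas:
      # str.endswith accepts a tuple of suffixes; both strings match one.
      print(ua, "->", ua.endswith(("SV1)", ";1813)")))

But matching is not my worry; knowing who is behind the match is.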

 

wilderness
msg:3663825 - 8:15 pm on May 31, 2008 (gmt 0)

Mokita,
FWIW, on a slow day I set a deny for the SV1 and removed it after a few hours.

The 1813 gets caught by some whitelisting in place; I currently have a redirect in place for it.
In the wee hours of this morning I saw more than a dozen consecutive redirects affecting Jeeves requests, and I can only surmise that Jeeves was chasing these for some reason. Each of the redirects resulted in an individual (and successful) subsequent page request from Jeeves, spread out over a longer period rather than coming in short succession.

I don't like the idea of the SV1s, however I'm simply not willing to spend the time sorting out good IPs from bad IPs and then adding them to a conditional deny.

As far as:
58.172.195.nnn

I've a deny in place for the Class A.
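
In script form that Class A deny amounts to no more than this (a Python sketch for illustration only; the function name is made up, and the real rule is a plain server-side deny, not a script):

  import ipaddress

  BLOCKED = ipaddress.ip_network("58.0.0.0/8")  # the whole 58. Class A

  def is_denied(remote_addr):
      return ipaddress.ip_address(remote_addr) in BLOCKED

  print(is_denied("58.172.195.1"))   # True
  print(is_denied("66.249.66.1"))    # False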

Don

phred
msg:3663846 - 9:18 pm on May 31, 2008 (gmt 0)

58.172.195.nnn - - [29/May/2008:20:50:49 +1000] "GET /widget.htm HTTP/1.1"

Is this a human, or is it a bot pretending to come from Google?

I am concerned that I'm blocking genuine visitors.

That IP range is assigned to Australia, not a known haven for bot bad boys.

inetnum: 58.160.0.0 - 58.175.255.255
netname: TELSTRAINTERNET42-AU
person: Telstra Internet Address Registry
address: Telstra Internet
address: Locked Bag 5744
address: Canberra
address: ACT 2601
country: AU
phone: +61 3 9815 5923

APNIC is the registry for that part of the world, which includes some pretty benign countries like Australia, New Zealand and Japan, along with others that are more suspect, like Thailand, China and Korea.

Deny(ing) Class A chunks of APNIC addresses will block entire groups of English speaking, reasonably affluent and generally well behaved users from visiting your site.

Cheers,
Phred

jdMorgan
msg:3663875 - 10:11 pm on May 31, 2008 (gmt 0)

I think we've established that "SV1" is some sort of security upgrade for IE6, and not an indication of badness. See [webmasterworld.com...] - post 3660201 by member "smallcompany"

It is possible that a bad guy might spoof IE6/SV1 though, so seeing "SV1" doesn't mean the user-agent is "good" or "bad."

However, if you see IE7 with SV1, that's a fake UA.
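
A consistency test along those lines is easy to sketch (Python, illustrative only; the function name is made up, and it assumes SV1 is legitimate only alongside MSIE 6):

  def sv1_is_consistent(ua):
      if "SV1" not in ua:
          return True                # nothing to check
      return "MSIE 6" in ua          # SV1 alongside IE7 etc. is a fake

  print(sv1_is_consistent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"))  # True
  print(sv1_is_consistent("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1)"))  # False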

Jim

wilderness
msg:3663880 - 10:30 pm on May 31, 2008 (gmt 0)

Deny(ing) Class A chunks of APNIC addresses will block entire groups of English speaking, reasonably affluent and generally well behaved users from visiting your site.

That is entirely my intention.
As it was my intention during 2002, when I spent three weeks (accumulated databases were not available at that time; that was before anyone chirps in about how easy it is today) sorting out IP ranges to allow access from specific Oceanic visitors in the (144|20[23]|21[01]|61) Class A's.
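
For illustration, that range list as a first-octet test (a Python sketch, hypothetical, not my actual rules):

  import re

  # First octets 144, 202-203, 210-211 and 61, anchored at the start.
  OCEANIC = re.compile(r"^(144|20[23]|21[01]|61)\.")

  for ip in ("203.2.75.1", "61.88.0.1", "58.172.195.1"):
      print(ip, bool(OCEANIC.match(ip)))  # True, True, False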

Samizdata
msg:3663893 - 11:08 pm on May 31, 2008 (gmt 0)

Are all UAs ending in "SV1)" invalid?

Definitely not - though scammers undoubtedly use it, same as any UA.

As Jim mentioned, our contributor smallcompany says he has it as his user-agent.

It is also used by the pre-AVG version of LinkScanner:
[webmasterworld.com...]

Is this a human, or is it a bot pretending to come from Google?

I would say that particular example is a human who just searched on your keywords.

I would also say that giving such people a 403 is insanity.

I am concerned that I'm blocking genuine visitors.

The IE Blog says that SV1 stands for "Security Version 1" and indicates a minimum of XP SP2.

Most common IE UA strings have other tokens after the SV1, but not all of them.

If you download LinkScanner from CNET and install it you can see for yourself.

Fortunately that particular tool can be cloaked for.
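
One hypothetical way to cloak for it (a Python sketch; the function name is made up, and the fingerprint shown is an assumption that simply combines the two UA strings with the missing Accept header Ocean10000 describes below):

  SCANNER_UAS = {
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)",
  }

  def looks_like_scanner(headers):
      # Hypothetical fingerprint: known UA string plus a missing Accept header.
      ua = headers.get("User-Agent", "")
      return ua in SCANNER_UAS and "Accept" not in headers

  # If it matches, serve a minimal harmless page instead of the real content.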

incrediBILL
msg:3663922 - 12:17 am on Jun 1, 2008 (gmt 0)

In general you really have to profile activity and not specific browser UAs.

For instance, anything asking for my robots.txt is flagged and further access denied.

Likewise, asking for 10 pages in a few seconds will throw a flag and challenge the user, since it's potentially a pre-fetch not following pre-fetch rules (I have pre-fetching disabled on my server). Some things that looked like pre-fetch turned out to be someone telling their software to open every page in my blog feed at the same time, which is another no-no.

I could go on and on with various things that trip my traps, but UA is validated for bare basics and beyond that it's all scripts and behavioral monitoring magic.
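
As a toy example, the "10 pages in a few seconds" trap boils down to a sliding-window count (a Python sketch; names and thresholds are made up, and the real thing is scripts and behavioral monitoring, not ten lines):

  import time
  from collections import defaultdict, deque

  WINDOW = 5.0   # seconds
  LIMIT = 10     # page requests tolerated inside the window
  hits = defaultdict(deque)

  def should_challenge(ip, now=None):
      now = time.time() if now is None else now
      q = hits[ip]
      q.append(now)
      # Drop timestamps that have fallen out of the window.
      while q and now - q[0] > WINDOW:
          q.popleft()
      return len(q) > LIMIT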

Ocean10000
msg:3664026 - 4:39 am on Jun 1, 2008 (gmt 0)

I am only going to talk about the antivirus-related traffic that is possible to identify, not the spoof attempts using the same or similar User-Agents.

I have a function called IsAntiVirus; its whole purpose is to determine whether the request is from the AVG antivirus packages in question, so it is possible to feed it the proper rejection notice.

It checks a total of three (3) headers (excluding proxy server checks); a sketch follows at the end of this post.

  1. It makes sure the "Accept" header is missing.
  2. It makes sure the "Cache-Control" is set to "no-cache" for requests not coming through a proxy server.
  3. It makes sure that the "User-Agent" header matches one of the following
    • "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    • "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

Once identified, I still feed them a 403, but I send the default NT 403 HTML with it. I do not blacklist the IP, so the visitor can continue to browse my site without any problems.

This has the following beneficial effects.

  • No longer get 30 requests for the same page within a few seconds.
  • Does not mark the site as having a problem.
  • End users still have access to the site, without a hitch.

The users of my site are not affected at all, and they continue browsing without any problems. It just requires a minimal increase in my bandwidth to deal with this special class of automated check bots, but it keeps my users happy while keeping the unwanted traffic to a minimum.
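
Here is the sketch promised above (Python purely for illustration; the real function runs server-side under IIS, and the parameter names here are made up):

  # Sketch of the IsAntiVirus test described above (illustrative only).
  AVG_UAS = {
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)",
  }

  def is_anti_virus(headers, via_proxy=False):
      # 1. The "Accept" header must be missing.
      if "Accept" in headers:
          return False
      # 2. "Cache-Control: no-cache" for requests not coming through a proxy.
      if not via_proxy and headers.get("Cache-Control") != "no-cache":
          return False
      # 3. The User-Agent must be one of the two known AVG strings.
      return headers.get("User-Agent") in AVG_UAS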

[edited by: Ocean10000 at 4:42 am (utc) on June 1, 2008]

Ocean10000
msg:3664237 - 3:01 pm on Jun 1, 2008 (gmt 0)

I just double-checked which static IIS 403 template I am using so I can post an update, and discovered it is not one of the default ones supplied by the Microsoft IIS install. So my previous post about using the default IIS 403 message was in error.
