Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 173 message thread spans 6 pages (page 1 of 6).
AVG Toolbar Glitch May Be Causing Visitor Loss
User Agent Flaw Suspected
Umbra

10+ Year Member



 
Msg#: 3615360 posted 2:36 pm on Mar 31, 2008 (gmt 0)

Seeing a rash of hits with an oddly formed user agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
No referer

mod_security always throws an error for this one. Hits come from various IPs with no consistent pattern; they seem to be residential IPs. Any idea what it is?

 

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 11:52 pm on Mar 31, 2008 (gmt 0)

I'm not 100% positive (99.9999%), but the lack of a space after the ";" in ";1813" probably means it's an invalid MSIE user agent in the first place.

What it does, I'm not sure.

DanA

10+ Year Member



 
Msg#: 3615360 posted 9:05 am on Apr 28, 2008 (gmt 0)

This User Agent with no referrer and no language is used by AVG anti-virus (version 8.0 with security toolbar installed) when checking searches (Google, Yahoo, Live search ...) for infected results.

[edited by: DanA at 9:06 am (utc) on April 28, 2008]

Umbra

10+ Year Member



 
Msg#: 3615360 posted 2:07 pm on Apr 28, 2008 (gmt 0)

Thanks DanA. Well if someone from modsecurity.org or AVG is reading this thread... Either AVG fixes an apparently malformed user agent, and/or modsecurity applies a patch.

smallcompany

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 6:23 pm on May 1, 2008 (gmt 0)

This hits my PPC campaigns hard. Every page visited by this thing returns a 404, because characters like "?" or "=" get translated into their "%" equivalents, which produces a bad link.

Now, I wonder if this comes from an actual user or just from AVG itself, i.e., does the user still see my page correctly?

I see AVG lets only paid subscribers contact them.
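
To illustrate the kind of translation described above, here is a minimal Python sketch. It does not claim to reproduce what the toolbar actually does; the URL `/landing.php?campaign=ppc` is a made-up example, and the point is only that once "?" and "=" are percent-encoded, a server may treat them as part of the file name and fail to find the page.

```python
from urllib.parse import quote, unquote

# Hypothetical landing-page URL, used only for illustration.
original = "/landing.php?campaign=ppc"

# Encoding the whole string while keeping only "/" safe turns the query
# delimiters "?" and "=" into %XX sequences.
mangled = quote(original, safe="/")
print(mangled)

# Decoding the mangled form restores the original URL.
print(unquote(mangled))
```

Comparing the requested URL in your 404 log lines against the decoded form is a quick way to confirm whether this is what is happening on your site.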

Ocean10000

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 3615360 posted 2:59 pm on May 3, 2008 (gmt 0)

Even if they fix the User-Agent, I would still be blocking this activity, since it does not match other known characteristics of Microsoft Internet Explorer: it is missing other headers which IE provides.

I have two or three of these a day pounding away, receiving 403 errors the whole time; each usually gives up after 30 attempts or so.
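
A header check along these lines could be sketched as follows. This is only an illustration: the post does not say which headers are missing, so the list `EXPECTED_IE_HEADERS` is an assumption, as is the function name.

```python
# Assumption: headers a genuine IE request is expected to carry. The thread
# only says "other headers" are missing; this list is illustrative.
EXPECTED_IE_HEADERS = ("Accept", "Accept-Language", "Accept-Encoding")


def missing_ie_headers(headers: dict) -> list:
    """Return the expected headers absent from a request claiming to be MSIE.

    `headers` maps header names to values; the User-Agent lookup assumes the
    conventional "User-Agent" capitalisation.
    """
    if "MSIE" not in headers.get("User-Agent", ""):
        return []  # not claiming to be IE, so this check does not apply
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_IE_HEADERS if h.lower() not in present]
```

An empty result means the request passed this particular check, not that it is genuine.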

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 8:05 pm on May 6, 2008 (gmt 0)

The same procedure was duplicated over two websites, on different pages and in a different directory.

After being denied, it returned immediately and requested the directory index and then the root index.

No images, no robots.txt. The process repeated on each website more than three times.

Note the variations in IP ranges.

63.80.56.zz - - [06/May/2008:17:48:55 +0100] "GET /MyFolder/MyPage HTTP/1.1" 403 1100 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3"
64.184.179.zz - - [06/May/2008:17:48:57 +0100] "GET /SameFolder/SamePage HTTP/1.1" 403 1100 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3"
64.184.179.zz - - [06/May/2008:17:48:57 +0100] "GET /SameFolder/SamePage HTTP/1.0" 403 1100 "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )"
70.51.240.zzz - - [06/May/2008:17:48:58 +0100] "GET /SameFolder/SamePage HTTP/1.1" 403 1100 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
65.222.217.zzz - - [06/May/2008:17:59:57 +0100] "GET /SameFolder/SamePage HTTP/1.0" 403 1100 "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )"
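
For anyone who wants to pull the user agents out of lines like these, here is a minimal Python sketch. It assumes the logs are in Apache's combined log format, which is what the lines above appear to be; the names `LOG_LINE` and `parse_line` are made up for the example.

```python
import re
from typing import Optional

# One combined-format log line: IP, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Split a log line into named fields, or return None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None
```

The `agent` and `referer` fields are the ones of interest in this thread; a referer of "-" means none was sent.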

smallcompany

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 9:03 pm on May 6, 2008 (gmt 0)

How do you know what to block? I see those two without referrer:

User Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
User Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)

Are we still on the same page that this is AVG’s new LinkScanner?

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 9:41 pm on May 6, 2008 (gmt 0)

How do you know what to block? I see those two without referrer:

They're all invalid UAs based on "ends with" (hint).
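
One possible reading of the "ends with" hint, applied to the log lines posted earlier, is sketched below in Python. The three rules are guesses made for illustration and are not the actual rules used by anyone in this thread.

```python
def suspicious_ending(user_agent: str) -> bool:
    """Flag user agents by how they end (illustrative guesses only)."""
    if not user_agent.endswith(")"):
        return True  # never closes its parenthesis, e.g. a truncated string
    if user_agent.endswith(" )"):
        return True  # stray space before the closing parenthesis
    if user_agent.endswith(";1813)"):
        return True  # final token not preceded by a space
    return False
```

Under these guessed rules the "SV1" string quoted above is not flagged.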

Are we still on the same page that this is AVG’s new LinkScanner?

I've only submitted these on this topic because the initial visit (not shown in the log lines) was this UA and the topic of this thread:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

Edited by wilderness.

Given the many IPs and UAs, it's difficult to imagine this coming from AV software.
More likely it's some new server virus.

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 12:13 pm on May 8, 2008 (gmt 0)

AVG deserve to be pilloried for introducing yet another fake user-agent with their (installed by default) "Security Toolbar", but I am now getting too many of these to ignore and have therefore been forced to allow it.

The crunch came when I gave the URL of one of my sites to an old friend over the phone - he didn't type it into the address bar like a normal person but apparently used the stupid toolbar and got a response saying something like "cannot verify this site" - which to him and anyone else would mean "do not go there" rather than "AVG's new toolbar is rubbish and was refused access because it pretended to be something it wasn't - just like all the other robotic scumbags out there".

Shame on you, Grisoft.

smallcompany

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 8:14 am on May 9, 2008 (gmt 0)

Are there any blog posts about this? Time to act?

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 4:42 pm on May 9, 2008 (gmt 0)

I was hit by this UA today 145 times.
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

It looks more like a distributed scrape attempt than a toolbar-based link checker because each IP only accessed a single page, nothing more. However, it could be a combination of both a new toolbar and a scraper using the UA at the same time.

AVG deserve to be pilloried for introducing yet another fake user-agent

What proof do we have this is AVG?

Has anyone installed the toolbar and verified it's using this UA yet?

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 4:52 pm on May 9, 2008 (gmt 0)

Bill,
I've been getting some of these:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

which result in 403's and are immediately followed by a full page request (+images) by another browser, which is consistent with a filter software.

However, the many other assorted requests appearing as well may in fact be a scraper of some sort.

There was a time when I might install a software to test a UA, however these days I'm not willing to jump through such hoops for invalid UA's and 403-resolutions ;)
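
For those who would rather check their logs than install the software, the pattern described above (a 403 to this UA, immediately followed by a request from another browser at the same IP) could be looked for with something like this Python sketch. The 10-second window and the tuple layout of `entries` are assumptions made for the example.

```python
from datetime import datetime, timedelta

TOOLBAR_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"


def blocked_then_followed(entries, window=timedelta(seconds=10)):
    """Return IPs whose blocked toolbar request was followed, within `window`,
    by a request from the same IP using a different user agent.

    `entries` is a time-ordered list of (ip, timestamp, status, agent) tuples.
    """
    pending = {}   # ip -> time of the last 403 served to the toolbar UA
    followed = []
    for ip, when, status, agent in entries:
        if agent == TOOLBAR_UA and status == 403:
            pending[ip] = when
        elif ip in pending and agent != TOOLBAR_UA:
            if when - pending[ip] <= window:
                followed.append(ip)
            del pending[ip]
    return followed
```

IPs that never show up in the result are the ones more likely to belong to the "assorted requests" group.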

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 5:40 pm on May 9, 2008 (gmt 0)

Installing the software for me would at least confirm the toolbar is spitting out bad headers because my site automatically blocked every access by that UA so I'm just curious if it's truly the culprit at this point.

I would be surprised if AVG wrote something with both a bad UA (it would be blocked for the lack of a space after the semicolon ";") and bad headers, but I want to know for sure. I'm assuming that if it's AVG, they were trying to hide the fact that it was the AVG toolbar to avoid cloaked pages designed to mask malware-infected pages or phishing attempts. However, if that is the case, they didn't do a very good job of it because this stands out like a sore thumb.

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 6:18 pm on May 9, 2008 (gmt 0)

Has anyone installed the toolbar and verified it's using this UA yet?

Yes, I have.

This user-agent is an indication of a potential human visitor who is searching on keywords that your site is ranking for - the kind of person you want to encourage, not block.

With the "Security Toolbar" installed the results pages of the user's Google, Yahoo and MSN searches are scanned and marked safe or otherwise, with a pop-up that gives more information - including the exhortation "Site owners please contact AVG Technologies for questions".

Apache access logs show the real user's IP with the fake user-agent, and in my first test the little critter took the index page and javascript file of one of my sites when I Googled it (but did not access it), so it is pre-fetching as well as faking the user-agent.

Disgraceful, but I will not block genuine visitors - most who search on my target keywords will click the link to my sites anyway... unless they get a pop-up saying AVG doesn't trust it.

I was until now a fan of AVG and installed it on many computers. The previous version is end-of-life and users are expected to upgrade to this new version in their millions within a few weeks, and a large proportion of them will not uncheck the "Install Security Toolbar" option.

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 7:45 pm on May 9, 2008 (gmt 0)

This user-agent is an indication of a potential human visitor who is searching on keywords that your site is ranking for - the kind of person you want to encourage, not block.

Actually, the invalid UA and invalid header are just the opposite, they are indicators of a fake, something you block to stop from getting scraped, hacked or worse.

If all this additional activity is indeed caused by pre-fetch, I'll be having a real big problem with the AVG toolbar real fast.

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 8:12 pm on May 9, 2008 (gmt 0)

they are indicators of a fake

A fake user-agent with a real oxygen-breathing human on the end of it.

One who just searched on your primary keywords.

One who in all probability is genuinely interested in what you have to offer.

One who doesn't uncheck toolbar installations when installing software.

One who has faith in AVG to keep the nasties at bay.

A normal computer user.

jdMorgan

WebmasterWorld Senior Member jdmorgan us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3615360 posted 9:24 pm on May 9, 2008 (gmt 0)

Samizdata.
With the "Security Toolbar" installed the results pages of the user's Google, Yahoo and MSN searches are scanned and marked safe or otherwise, with a pop-up that gives more information - including the exhortation "Site owners please contact AVG Technologies for questions".

Did they give any indication of how we should contact them?

Thanks,
Jim

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 9:33 pm on May 9, 2008 (gmt 0)

No indication whatsoever.

For those who have this blocked, your search engine result will have a grey question mark rather than a bright green star, and the pop-up will say "AVG Search-Shield was unable to read this page. It may no longer exist or there may have been an error."

It sucks, but even the paranoid are welcome on my sites.

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 9:54 pm on May 9, 2008 (gmt 0)

A fake user-agent with a real oxygen-breathing human on the end of it.

You're preaching to the choir here as I understand the problem fully.

I'm not going to make an exception for them and potentially unlock a gaping hole in my website security, not happening.

However, I do plan to contact them if AVG's toolbar is verified as the problem and ask them to fix it. As a matter of fact, what they are currently doing is technically a security flaw that allows any AVG customer to potentially be exploited by any website they visit.

How is the AVG toolbar a security risk you might ask?

If someone knows AVG has a flaw and it can't detect a specific vulnerability or exploit then they can specifically target anyone that exhibits this toolbar user agent.

The toolbar should echo the EXACT same user agent and header as the browser to avoid detection.

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3615360 posted 10:34 pm on May 9, 2008 (gmt 0)

I understand the problem fully

I didn't doubt you for a moment.

I am an enthusiast rather than an expert (learning much of what I know from contributors to this thread) and I respect any webmaster's right to allow or deny access to their sites as they see fit.

But nobody using this idiotic toolbar when I had it blocked came to my sites.

Since unblocking it, almost all do.

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3615360 posted 11:33 pm on May 9, 2008 (gmt 0)

A normal computer user... who should be wondering why, after installing the toolbar, they can no longer access their favourite sites...

I've just modified a script so that people presenting that UA get a message about their computer sending a fake ID and it then asks them to check if they have AVG installed. It ends "contact your anti-virus/security vendor".
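
In rough outline, and only as a Python sketch rather than the actual script, that kind of handling might look like the following. The 403 status and most of the message wording are assumptions; only the closing phrase comes from the description above.

```python
TOOLBAR_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

# Illustrative wording; only the final phrase is taken from the post above.
FAKE_ID_MESSAGE = (
    "Your computer is sending a fake browser ID. "
    "Please check whether you have AVG installed, and "
    "contact your anti-virus/security vendor."
)


def respond(user_agent: str) -> tuple:
    """Return (status, body) for a request, based only on its user agent."""
    if user_agent == TOOLBAR_UA:
        return 403, FAKE_ID_MESSAGE  # assumed status code
    return 200, "OK"
```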

Key_Master

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3615360 posted 12:44 am on May 10, 2008 (gmt 0)

Actually, the invalid UA and invalid header are just the opposite, they are indicators of a fake, something you block to stop from getting scraped, hacked or worse.

The user-agent header value has no official format or structure.
[w3.org...]

It amazed me just how many hits I get a day with the "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)" user agent. No way I'd consider blocking it.

To each his own though. :)

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 1:02 am on May 10, 2008 (gmt 0)

The user-agent header value has no official format or structure.

When it comes to MSIE, Microsoft spells out exactly how their browser user agents will be formatted so if you're going to try to fake MSIE you should at least do it right since it's well documented!

[msdn.microsoft.com...]

[edited by: incrediBILL at 1:03 am (utc) on May 10, 2008]

Key_Master

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3615360 posted 1:26 am on May 10, 2008 (gmt 0)

Even if one were to believe that M$ was responsible for the web standards behind http headers, there is nothing noted in your link that would make this user-agent invalid, even by "M$ standards".

I don't want to argue with you over it but there is no user-agent standard so it can't be invalid. Feel free to ban it if you wish.

Ocean10000

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 3615360 posted 2:02 am on May 10, 2008 (gmt 0)

Key_Master
Simply put, valid IE strings, when assembled from all the registry settings, will always have a space after the ";" following the OS designation if there are extra details. This is hard-coded in IE.

I have been working with browser identification systems for a long time now and have seen a large number of IE User-Agents, spoofed and real, so I know what I am talking about here: the extra details (after the OS) start out with a "; " before anything is written. The extra details can contain ";" without spaces but it can not start with a ";" without the space. You are free to believe whatever you want, though.
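
The rule described above can be sketched as a small Python check. This is an illustration, not a reference implementation: it assumes the OS token has the form "Windows NT <version>", and the names are made up for the example.

```python
import re

# Assumption for illustration: the OS token is "Windows NT <major>.<minor>"
# and any extra details follow it inside the parentheses.
OS_THEN_EXTRAS = re.compile(r"Windows NT \d+\.\d+;(?P<next>.)")


def extras_start_with_space(user_agent: str) -> bool:
    """Return True if nothing follows the OS token, or if the first extra
    detail is introduced by "; " (semicolon plus space)."""
    match = OS_THEN_EXTRAS.search(user_agent)
    if match is None:
        return True  # no extras after the OS token, nothing to check
    return match.group("next") == " "
```

The check says nothing about semicolons inside the extra details themselves, which matches the rule as stated.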

[edited by: Ocean10000 at 2:02 am (utc) on May 10, 2008]

Key_Master

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3615360 posted 2:43 am on May 10, 2008 (gmt 0)

The extra details can contain ";" without spaces but it can not start with a ";" without the space.

First, show me the proof - where's the spec that says it has to be done that way? I've been dealing with user agent headers for over a decade. Windows and software installations often corrupt user-agent headers - they're stored in the registry, an area prone to errors.

Second, who's to say this agent is spoofed? The AVG site states that "the LinkScanner component can be connected with the AVG Security Toolbar, which is part of Internet Explorer or Mozilla Firefox Internet browsers".

Let's break the user-agent down:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)

Translation: IE 6.0 on an XP machine.
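
The same breakdown can be done mechanically. Here is a minimal Python sketch that splits a string of this shape into its product part and its tokens; the function name and the assumed "product (token; token; ...)" shape are for illustration only.

```python
import re

# Assumed shape: a product such as "Mozilla/4.0", a space, then a
# parenthesised list of tokens separated by semicolons.
UA_SHAPE = re.compile(r"^(?P<product>\S+) \((?P<tokens>.*)\)$")


def break_down(user_agent: str):
    """Return (product, [tokens]) or None if the string has another shape."""
    match = UA_SHAPE.match(user_agent)
    if match is None:
        return None
    tokens = [t.strip() for t in match.group("tokens").split(";")]
    return match.group("product"), tokens
```

Because the tokens are stripped of surrounding spaces, this breakdown deliberately ignores the spacing question debated above.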

Well, every visitor I see using this LinkScanner is on a XP machine. Most of the machines are using IE 7.0 but that is a relatively new browser and it's very likely that the component was written or installed with this user-agent hard coded in the software before IE 7.0 became available.

I've also noticed that if a FireFox user has this component installed, LinkScanner still uses the IE user-agent.

All this leads me to believe that LinkScanner uses a component that is a M$ approved variant of IE to fetch pages off of the web. Maybe somebody who has this installed could do a more thorough investigation.

Regardless of what you may think of the way the user agent is structured, it most certainly is used by real oxygen breathing humans. It isn't invalid by even the strictest of standards. After all, it is ASCII and it isn't an empty string. :)

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 2:58 am on May 10, 2008 (gmt 0)

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)

Translation: something faking MSIE 6.0 on an XP machine

It's the lack of space before the "1813" token that makes it invalid.

Take a look at any of the Browser Capabilities (browscap) files and you'll note that all legit variations have a space before the next token. I run a site that gets over 600K visitors a month and have never logged a valid UA from a MSIE browser that didn't conform to that simple format.

Likewise, other flawed fakes are "compatible ;" and/or "Mozilla/4.0(" and so on and so forth. Of course the most hysterical are the tools written by someone who sees the "+" in the IIS log files and uses something like this UA:
"Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)".

Anyway, the long and the short of it is that there are many sites out there with regex rules in Apache or IIS that are now bouncing the malformed UA created by the toolbar that should have a space before the token based on the standard way MSIE displays tokens in the UA.
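
As a rough idea of what such rules look for, here is a Python sketch covering the three other flawed fakes mentioned above. It is not taken from any real Apache or IIS rule set; the function name and the labels are made up for the example.

```python
import re


def malformed_msie_tells(user_agent: str) -> list:
    """List the formatting tells, from the examples above, found in a UA."""
    tells = []
    if "compatible ;" in user_agent:
        tells.append("space before semicolon")
    if re.match(r"Mozilla/\d\.\d\(", user_agent):
        tells.append("no space before parenthesis")
    if "+" in user_agent and " " not in user_agent:
        tells.append("plus signs instead of spaces")
    return tells
```

An empty list only means none of these three tells were found, not that the UA is genuine.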

If it's fixed, AVG's toolbar won't have a problem...

... except for the rest of us who only accept the standard MSIE headers, which are also broken in this instance.

Regardless of what you may think of the way the user agent is structured, it most certainly is used by real oxygen breathing humans

Sorry to disagree but a toolbar performing pre-fetch is by no stretch a real human, it's an automated tool data mining in anticipation of a need by a real human, and not doing a very good job at it since it tripped everyone's bot traps and regex filters that were programmed using real world data.

Regardless of UA specifications, it doesn't fit the norm of 99.99999999999% of the MSIE browsers used by real humans hitting my site.

[edited by: incrediBILL at 3:02 am (utc) on May 10, 2008]

Ocean10000

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 3615360 posted 3:11 am on May 10, 2008 (gmt 0)

All this leads me to believe that LinkScanner uses a component that is a M$ approved variant of IE to fetch pages off of the web. Maybe somebody who has this installed could do a more thorough investigation.

Just because they use a component lib of IE, if it's using IE at all, doesn't automatically mean Microsoft approves of it.

I am guessing since this works in IE or Firefox that it doesn't even require IE to work, that it has its own lib separate from IE to do the downloads and checks.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3615360 posted 3:22 am on May 10, 2008 (gmt 0)

not doing a very good job at it since it tripped everyone's bot traps and regex filters

Bill,
I'm sure Key_Master's traps were tripped as well! :

[webmasterworld.com...]

You don't have the same criteria as I, nor does Key_Master as you.

It's simply a matter of choice. There's not any real arguing going on here (hopefully Key_Master's choice of word was made in haste); everybody is simply sticking to their own guns as to whether this is beneficial or detrimental.

Don

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved