Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 173 message thread spans 6 pages: < < 173 ( 1 2 3 4 [5] 6 > >     
AVG Toolbar Glitch May Be Causing Visitor Loss
User Agent Flaw Suspected
Umbra




msg:3615362
 2:36 pm on Mar 31, 2008 (gmt 0)

Seeing a rash of hits with an oddly formed user agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
No referer

mod_security always throws an error for this one. Hits come from various IPs with no consistent pattern, seem to be residential IPs. Any idea what it is?

 

rise2it




msg:3666362
 4:44 am on Jun 4, 2008 (gmt 0)

I skipped about 40 posts, so excuse me if this got answered earlier...but it's probably NOT only the toolbar.

The main AVG screen has a settings block called 'link scanner' which can be disabled, but is enabled by default. This, in turn, throws up extra icons on Google pages in both IE and Firefox.

This means it's 'on' for every single person who has installed the software.

I assume if you come up in the search results page, it's doing a hit of your webpage (and every other page coming up in the search results), which is where this is coming from - so the end user may NEVER even click to go to your site, and you're still going to be showing a referral from this stuff, even though you never got a real visitor.

For instance, if you have Google set to show 10 results by default, AVG is going to pull all 10. If your search page is set for 20, 50, or 100 results, then AVG is going to go pull ALL of those pages, and EVERY ONE of those sites is going to show this referrer.

*** That would explain the extra hits you are showing ***

Of course, that means it gets WORSE, because...

Now, if you somehow block AVG, then it's going to show you as being potentially 'bad' in the search engine results page, causing you to lose potential visitors.

I'm afraid to go look at any of MY referrer logs now...grrrrrrr

superclown2




msg:3666470
 9:14 am on Jun 4, 2008 (gmt 0)

I have spoken to a person called Adam at AVG Technologies who tells me that he feels that his product is the lesser of two evils and that the disruption to millions of webmasters' stats is justified by the extra safety the product gives to surfers. I have pointed out to him that, with earlier versions at least, it is possible to spoof the pre-fetch search, but he commented that the product was still making the web a safer place to visit.

[edited by: incrediBILL at 10:11 am (utc) on June 4, 2008]
[edit reason] call to action removed - see tos #26 [/edit]

appi2




msg:3666521
 10:42 am on Jun 4, 2008 (gmt 0)

Not sure if you know, but there are a couple of JS scripts that do the AVG link scanning. Use Firebug and you will see them. I think they're in the FF2 chrome. I found one that does the call to AVG to check the link. Not sure; I didn't look too hard.

Sorry I can't say more, but I just happened across it while debugging some JS. And personally I think it's a great little tool that's flagged a couple of bad sites that I was linking to; both had iframe exploits.

But yes, they should use the proper UA.
Have fun

offender




msg:3666523
 10:46 am on Jun 4, 2008 (gmt 0)

Hi,

I am having similar problems to spotter. I don't really care too much about the link scanner using extra bandwidth; however, I do care about the fact that it is throwing 404s in the logs.

The link scanner seems to try to read the JS on the page where I have sitestats installed, and I get loads of requests like the one below.

GET /about-us//\"'+//\"'+//\"'+//\"'+//\"'+//\"'+//\"'+/'+ns_l+'/'+ns_l+'/'+ns_l+'//\"'+//\"'+/'+ns_l+'//\"'+//\"'+//\"'+/'+ns_l+'/'+ns_l+'//\"'+/'+ns_l+' HTTP/1.1" 404 14366 5894 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

When I say loads, I mean that in a 3-minute period I am getting about 5000 requests, and 404s as a result.

Has anyone found a solution, or does anyone have thoughts on how I can stop the 404s via the JS file? The scanner seems to have problems with the variables.

Many thanks

rise2it




msg:3667365
 7:06 am on Jun 5, 2008 (gmt 0)

"I have spoken to a person called Adam at AVG technologies..."

I'd say virtually all of us would disagree with Adam.

If all of this (avg playing big brother) actually caught on in a major form, the guys running bad sites would just move their bad code one page over, anyway - their site would then show up as 'approved' by AVG, the user would go there, click to go to the next page, and then get infected.

idiotgirl




msg:3671520
 7:02 pm on Jun 10, 2008 (gmt 0)

Geez, I haven't posted here in years, but I'm showing the same problems with this. Has anyone tried blocking only their .js files from SV1 via .htaccess, but allowing the pages themselves to be served? Obviously, it's probably going to send a red flag for potentially dangerous .js on your site, but at least it wouldn't be filling logfiles with redundant, non-existent garbage. Probably the only other way would be to cloak .js files based on UA or IP, but that's too risky to even contemplate. (It makes me feel dirty.)

Mostly, their bot needs to dummy up. I'm tired of mopping up after their mess.

Samizdata




msg:3671548
 8:04 pm on Jun 10, 2008 (gmt 0)

I would say that as things stand there are no risks and several benefits to cloaking a very small HTML file to these tools (which will also stop them fetching any external JavaScript).

AVG LinkScanner is currently identifiable by user-agent (;1813) and can be dealt with accordingly.

The Exploit Prevention Labs version of LinkScanner (SV1) is identifiable by other signifiers.

Both are absurdly easy to fool, as are other similar "security tools".

[webmasterworld.com...]

I only feel dirty when an unwanted robot penetrates my defences.

...
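For anyone who wants to test for the ;1813 signifier outside of .htaccess, here is a minimal Python sketch. It assumes the marker is the literal ";1813)" at the very end of the user-agent string, as in the UA quoted at the top of this thread; the function name is made up for illustration, and the check will stop working if AVG changes the string.

```python
import re

# Assumption: the marker is the literal ";1813)" at the end of the UA,
# as seen in the string quoted in the first post of this thread.
AVG_1813_MARKER = re.compile(r";1813\)$")


def has_1813_marker(user_agent: str) -> bool:
    """Return True if the user-agent ends with the ';1813)' marker."""
    return bool(AVG_1813_MARKER.search(user_agent.strip()))


if __name__ == "__main__":
    print(has_1813_marker("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"))
    print(has_1813_marker("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

The pattern is anchored to the end of the string so that a UA which merely contains "1813" somewhere else is not caught by accident.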

Seb7




msg:3675355
 5:13 pm on Jun 15, 2008 (gmt 0)

I've just started getting lots of hits. With no cookie being sent, it racks up the number of open sessions on the server twentyfold.

I'm thinking of trapping and returning a 302 or 301 redirect to AVG home page. If we all did that maybe they might think about making the toolbar a little more server friendly.

DamonHD




msg:3675413
 7:19 pm on Jun 15, 2008 (gmt 0)

Is there any evidence that the toolbar will follow any redirect you throw at it?

Rgds

Damon

willybfriendly




msg:3675429
 8:15 pm on Jun 15, 2008 (gmt 0)

My browser of choice is Opera, to which the link scanner does not appear to have been ported.

I opened up FF today and noticed that the link scan is even hitting the adwords ads.

Is this showing as a click, and costing the advertiser?

The other thing I notice about this is that if one disables link scanning in the admin panel, then it shows an error in the tool tray icon for AVG.

Samizdata




msg:3675446
 8:42 pm on Jun 15, 2008 (gmt 0)

I appreciate that there have been many confusing threads about AVG LinkScanner in recent weeks.

Please note that the toolbar component of AVG 8 is not involved.

"Is there any evidence that the toolbar will follow any redirect you throw at it?"

People are certainly doing it (using .htaccess or PHP or whatever) but I haven't seen any test results.

Most seem to be redirecting to AVG's site, which seems rather apt.

"Is this showing as a click, and costing the advertiser?"

An earlier thread dedicated to the subject suggests not:

[webmasterworld.com...]

There have also been several threads on analytics and security, and if you want to know all about this sorry saga you could search WebmasterWorld for "LinkScanner" and see how the story unfolded.

Be warned that AVG will be forced to change the user-agent very soon.

...

superclown2




msg:3675476
 9:32 pm on Jun 15, 2008 (gmt 0)

They say they are working right now on a fix that will allow the thing to work without messing up log files. Here's hoping .....

Receptional Andy




msg:3675492
 10:08 pm on Jun 15, 2008 (gmt 0)

IMO the whole implementation is flawed, and AVG have been less than adept at reacting to the issue. My understanding at the moment is that the LinkScanner tool:

  • Exposes an AVG user's IP address to at least 10 unknown and potentially undesirable sites for every internet search they conduct

  • Has allowed the collection of large volumes of technical data on AVG users, for current or future exploitation

  • Unnecessarily floods websites with requests, and implements no logic to reduce requests. Not even caching or robots exclusion (I understand you can cloak robots directives)

  • Doesn't work properly and responds unreasonably and erratically to unexpected data

  • Is wholly unnecessary except in the unknown percentage of cases that AVG is able to flag a site, but is unable to prevent the user from harm

Here's hoping they have a fairly wide-reaching rethink...

    incrediBILL




    msg:3675499
     10:30 pm on Jun 15, 2008 (gmt 0)

    "Is wholly unnecessary except in the unknown percentage of cases that AVG is able to flag a site, but is unable to prevent the user from harm"

    Exactly.

    I've pointed this out before, and it bears restating: the whole idea is useful only if the logic used in the link scanner is employed in the real-time reading of the webpage, which could prevent the user from harm even if AVG is unable to protect them from the infection.

    This is what the AV that I'm using does with its stream scanning via a transparent proxy: a real solution, not a bandage wrapped in a marketing hype blanket.

    Samizdata




    msg:3675507
     11:36 pm on Jun 15, 2008 (gmt 0)

    "Has allowed the collection of large volumes of technical data on AVG users, for current or future exploitation"

    Eloquently put, and a warning to everyone who uses AVG to find another anti-virus package fast.

    An inconvenience, but if you want to make omelettes, you have to break some eggs.

    ...

    superclown2




    msg:3675733
     7:47 am on Jun 16, 2008 (gmt 0)

    "Eloquently put, and a warning to everyone who uses AVG to find another anti-virus package fast"

    As I've already pointed out to them, it's a product that could bring their company down.

    smallcompany




    msg:3676180
     7:04 pm on Jun 16, 2008 (gmt 0)

    Would anyone expect some webmasters to create a pop-up that tells a user something like “you have AVG installed which does this and that” (negative tone)?

    superclown2




    msg:3676229
     8:06 pm on Jun 16, 2008 (gmt 0)

    "Would anyone expect some webmasters creating a pop-up that tells to a user something like “you have AVG installed which does this and that” (negative tone)? "

    I've no doubt that's the least of the problems they'll have to face. Pretty imaginative are webmasters when they're being messed about <G>!

    dekker23




    msg:3678785
     4:46 pm on Jun 19, 2008 (gmt 0)

    Could this problem affect PPC campaigns by causing unnecessary charges, as with Y! Paid Inclusion links?

    ArthurSixpence




    msg:3680709
     11:34 am on Jun 22, 2008 (gmt 0)

    Anyone have any issues with this?

    RewriteCond %{HTTP_USER_AGENT} ;1813\)$ [OR]
    RewriteCond %{HTTP_USER_AGENT} User\-Agent:\ Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.1;\ SV1\)$
    RewriteRule ^(.*) [avg.co.uk...] [R=301,L]

    Yes, the second UA is from AVG as well. Personally I'm sick of these idiots profiting by stealing my bandwidth and wasting my time in having to deal with their ineptitude. It's a great idea but the implementation just stinks. I also resent having forked out 40 quid just to find out how this junk works.

    dstiles




    msg:3680957
     8:05 pm on Jun 22, 2008 (gmt 0)

    I've been seeing and blocking the prefix "user-agent: Mozilla/4.0..." for a very long time - a couple of years at least. Up until this past week the UA prefix has been only occasional. It seemed to be immediately followed by a hit on a form page so I assumed it was some kind of form spammer.

    Over the past week I've been seeing the prefix a lot more - approx 50% of my user-agent kill logs (that's ignoring the massive number of SQL injection attempts). The prefix has been associated almost exclusively with both of the user-agents mentioned here (ie 1813 and sv1).

    The site logs typically show as below (from two different sites)...

    19:06:27 ... HEAD /out-05.asp
    User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;1813)
    19:06:29 ... GET /out-05.asp
    User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;1813)

    and...

    20:01:14 ... HEAD /index.asp
    User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)

    20:01:14 ... GET /index.asp
    User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)

    Delays between the two hits of a pair tend to be 0-2 seconds.

    Sometimes there is only one pair (HEAD and GET), sometimes two pairs, rarely a third. This may be due in part to the site returning a 403. The IP is also added to a blocklist, but this is an ASP-based one so it still records any further hits. There never seem to be hits to other pages following the initial attempts.

    In at least one instance the sequence went HEAD, GET, (7 second gap) HEAD, HEAD, (2 second gap) GET, GET. There was also, on a different site, an interim hit from the same IP with a Google referer between two HEAD/GET pairs, using the UA

    ...MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322;+.NET+CLR+2.0.50727)

    This did not have the "user-agent:" prefix. (NB: A lot of suspect and bad UAs include 50727.)

    I am no longer sure if the "user-agent:" prefix is a robot or a mal-formed UA due to someone fiddling. I'm seeing almost every instance of it coming from broadband lines. I think (but am not sure now without checking back) that "user-agent" prefix hits before last week were at least in part from servers and I do not recall them having HEAD fetches before. My sites get very few HEAD requests in the normal way so I think it has to be at least semi-robotic.

    Since current hits only seem to come in pairs from any given IP I have now switched the behaviour of the trap code to 403 only rather than 403 AND block IP, to avoid killing most of the world's dynamic broadband lines.

    One passing thought: I wonder if this could be some kind of badly-coded "bookmark" feature checking old site views for either updates or trojans. One of my sites includes a date-stamped querystring and the prefix hits are showing a very outdated stamp, anything up to three months or more, which I would expect on this site for a bookmarked page. If this is so I feel it's semi-automatic at least.
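For anyone wanting to pull these HEAD/GET pairs out of their own logs, the pattern described above (a HEAD followed within a couple of seconds by a GET from the same IP for the same page) can be sketched in Python as follows. This is only an illustration: it assumes the log has already been reduced to (time, IP, method, path) tuples in log order, the function name is invented, and the sample times and IPs are made up rather than taken from real logs.

```python
from datetime import datetime


def find_head_get_pairs(hits, max_gap_seconds=2):
    """Return (ip, path, gap_seconds) for each HEAD that is followed by a GET
    from the same IP for the same path within max_gap_seconds.

    hits: iterable of (time 'HH:MM:SS', ip, method, path), in log order.
    """
    pending = {}  # (ip, path) -> time of the most recent unmatched HEAD
    pairs = []
    for time_text, ip, method, path in hits:
        when = datetime.strptime(time_text, "%H:%M:%S")
        key = (ip, path)
        if method == "HEAD":
            pending[key] = when
        elif method == "GET" and key in pending:
            gap = (when - pending.pop(key)).total_seconds()
            if 0 <= gap <= max_gap_seconds:
                pairs.append((ip, path, int(gap)))
    return pairs


# Hypothetical sample data (documentation-range IPs, invented times).
sample = [
    ("10:00:00", "192.0.2.10", "HEAD", "/out-05.asp"),
    ("10:00:02", "192.0.2.10", "GET", "/out-05.asp"),
    ("11:30:14", "192.0.2.20", "HEAD", "/index.asp"),
    ("11:30:14", "192.0.2.20", "GET", "/index.asp"),
    ("11:45:00", "192.0.2.30", "GET", "/index.asp"),  # ordinary GET, no HEAD
]

if __name__ == "__main__":
    for ip, path, gap in find_head_get_pairs(sample):
        print(ip, path, gap)
```

Note that a HEAD, HEAD, GET, GET sequence like the one mentioned above would only be counted once by this sketch, since a second HEAD simply replaces the first one waiting for its GET.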

    incrediBILL




    msg:3680965
     8:15 pm on Jun 22, 2008 (gmt 0)

    I've seen the HEAD/GET stuff as well, but I don't think it's AVG.

    The Exploit Labs LinkScanner uses the following:
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

    No "User-Agent:" prefix, none of that.

    It's possible it's some other company now bundling this malware with variations on the user agent, or that a scraper or botnet has adopted the UA, because this is the perfect way for botnets to hide activity in the midst of such a public fiasco.

    wilderness




    msg:3680979
     8:46 pm on Jun 22, 2008 (gmt 0)

    User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)

    These multiple plus signs have long been a known weakness and many have denials in place.

    In addition, these multiple plus signs have not been used by the standard AVG scanner (at least on my sites).

    ArthurSixpence




    msg:3680981
     8:48 pm on Jun 22, 2008 (gmt 0)

    My assertion that the 'User-Agent' prefixed UA was from AVG was based on a trial from a machine built specifically to find out what the AVG link scanner was doing. I was expecting to see the 1813 UA but no. It is 100% definitely from AVG as it was the only source of traffic to this site at the time.

    Apart from the basic WinXP SP3 OS, the AVG 'Security' (hollow laugh) program is the only piece of non OS software on the PC. I simply fired up google in IE, searched for one of my pages I knew was listed and then checked the logs expecting to see a 1813 UA. Only the 'User-Agent' UA appeared.

    I guess that AVG may now be fiddling (think Nero here) with the 1813 UA to try to fit it with a false beard so no-one can see that it has "I'M FROM AVG" in big bright red flashing neon right across its forehead.

    More testing to do so I would suggest that there are more AVG UA's still to come. Meanwhile I'm tempted to just harvest the IP's and forward the unsuspecting users to a page which explains in great detail exactly how competent AVG's product is, or even let them read about it here!

    Staffa




    msg:3680989
     9:00 pm on Jun 22, 2008 (gmt 0)

    Multiple plus user agents are the way Windows server logs are written :

    Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml)

    Each space in a UA is filled with a + sign.
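To compare a UA copied from a Windows server log with one from an Apache log, the plus signs can be turned back into spaces. A minimal Python sketch follows; the function name is made up, and the obvious caveat is that a literal "+" in the original UA (like the one in front of the Ask Jeeves URL above) cannot be told apart from a space once it has been logged this way, so it comes back as a space too.

```python
def decode_logged_user_agent(logged: str) -> str:
    """Turn '+' back into spaces.

    A literal '+' in the original UA is indistinguishable from a space
    in this log format, so it is converted to a space as well.
    """
    return logged.replace("+", " ")


if __name__ == "__main__":
    logged = "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)"
    print(decode_logged_user_agent(logged))
```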

    Peter




    msg:3680996
     9:13 pm on Jun 22, 2008 (gmt 0)

    These "User-Agent: Mozilla..." UAs have always been rare on our (French) site, but I've seen several of them in the last three days, all with an identical false UA string, coming from residential addresses in France and Canada.

    81.57.1c.dd - - [22/Jun/2008:17:40:37 +0200] "HEAD / HTTP/1.1" 403 - "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    81.57.1c.dd - - [22/Jun/2008:17:40:38 +0200] "GET / HTTP/1.1" 403 1 "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

    I'm pretty sure these are ***real*** visitors (or would-be visitors, they get a 403).

    One of them, after trying a few times, came back with a real UA string and a referrer from a Google search that would have sent him to the pages he'd previously got 403s on. So I think he must have turned something off and tried again because he really wanted to see our pages and suspected that something new on his end was stopping him.

    Finally, he came back a third time, a little later, with the same false UA, and got 403s again on the same pages. I suppose the visitor wanted to be sure what the problem was. I admit, I'd like to know too.

    [Edit] After reading Appi2's posts in the other thread at [webmasterworld.com...] I see that this visitor just did his Google search several times, and chose us some of the times.

    Peter.
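Log lines in the format shown above can be checked for the stray "User-Agent:" prefix with a short script. The Python sketch below assumes the Apache combined log format and a simple regular expression that will not cope with escaped quotes inside fields; the names are invented for illustration, and the second sample line is made up for contrast.

```python
import re

# Assumption: Apache "combined" format, with no escaped quotes inside fields.
COMBINED_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def ua_has_header_prefix(line: str):
    """Return True if the logged UA itself starts with 'User-Agent:',
    False if it does not, and None if the line cannot be parsed."""
    match = COMBINED_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("ua").lower().startswith("user-agent:")


if __name__ == "__main__":
    prefixed = ('81.57.1c.dd - - [22/Jun/2008:17:40:37 +0200] "HEAD / HTTP/1.1" '
                '403 - "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; '
                'Windows NT 5.1; SV1)"')
    # Hypothetical line with a normal UA, for comparison.
    normal = ('192.0.2.7 - - [22/Jun/2008:17:41:00 +0200] "GET / HTTP/1.1" '
              '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; '
              'Windows NT 5.1; SV1)"')
    print(ua_has_header_prefix(prefixed))
    print(ua_has_header_prefix(normal))
```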

    [edited by: Peter at 9:35 pm (utc) on June 22, 2008]

    wilderness




    msg:3680997
     9:15 pm on Jun 22, 2008 (gmt 0)

    Multiple plus user agents are the way Windows server logs are written :

    Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml)

    Each space in a UA is filled with a + sign.

    Funny thing is, the two major SE examples you've provided sail through my sites daily, while this one:

    +Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)

    will eat 403's till the cows come in.
    And to repeat, I'm not alone.

    dstiles




    msg:3681024
     10:09 pm on Jun 22, 2008 (gmt 0)

    Incredibill: The fact that I only get a couple of hits from each IP suggests to me it's not a scraper. On the other hand, there are a lot of such hits, although from different IPs and on different (discontinuous) sites.

    Wilderness: Those lines are direct from the site logs. I pasted them in "raw" so of course they have spaces replaced by plus signs. On the other hand, it suggests that truly bad UAs with plus signs are probably scraped from logs in the first place.

    Peter: Most of the past week's prefix UAs are from what I consider legitimate browser ISPs in the UK, Germany, France, USA etc. Of course, they may all be trojanned but there are a lot of them.

    In general (in this thread and others):

    A lot has been posted about AVG creating a database of IPs. This is true to a certain extent but don't forget that most people use dynamic IPs and get a fresh one every time they turn their computer off - typically over-night and possibly early morning before work. In which case most IPs will be AVG-valid for no more than a few hours before being replaced.

    To discover these odd UAs, I suppose what's really needed is a web page that invites people to submit a form telling you what software they have installed. Probably no one would fill it in, though. I occasionally get someone sending in a form complaining I've killed them, but very seldom.

    I seem to recall mention hereabouts that AVG ignores the return code (403 etc) when "validating" a page. If that is so, maybe I'll return a 403 plus my dummy 1813 "page" to be on the safe side.

    incrediBILL




    msg:3681039
     10:24 pm on Jun 22, 2008 (gmt 0)

    "don't forget that most people use dynamic IPs and get a fresh one every time they turn their computer off"

    Not true with many cable and DSL services.

    I've had 4 IPs in 8 years.

    Besides, you overlook the fact that once I know you're using AVG and you hit my site within a reasonable time frame, I can shove a cookie in your browser that will ID you as an AVG user forever, no matter what IP you have, until you dump that cookie.

    [edited by: incrediBILL at 10:24 pm (utc) on June 22, 2008]

    Samizdata




    msg:3681083
     11:53 pm on Jun 22, 2008 (gmt 0)

    "most IPs will be AVG-valid for no more than a few hours before being replaced"

    Some ISPs allocate a static IP whether you request it or not, and some companies require you to have a static IP in order to access their network from home. In other cases a dynamic IP can remain the same for months, and often does.

    I believe the situation may be slightly different in USA than in Europe (where AVG is particularly popular), but almost everyone I know uses a router and has a static IP - and those who use Windows have AVG installed because I recommended it to them.

    "I seem to recall mention hereabouts that AVG ignores the return code (403 etc) when "validating" a page."

    If you block an AVG user-agent with a straight 403 you will send it into overdrive and it will make 30 requests for each result in the SERPs - in one of my tests it produced 120 (one hundred and twenty) 403s in twelve seconds.

    You will also find that you won't get AVG's green star of approval but will instead get a grey question mark next to the link to your site that will discourage anyone from clicking it.

    The LinkScanner FAQ says:

    This does not pose any major problem, since if you decide to visit such a website, its address will be scanned again by the Search-Shield.

    So all this bandwidth wasting nonsense was for nothing anyway - LinkScanner is not designed to make you any safer, but just to make you FEEL safer.

    But as it is so easily fooled, you are actually LESS SAFE.

    The best team of comedy writers in the world could not make this stuff up.

    ...

    ArthurSixpence




    msg:3681411
     1:25 pm on Jun 23, 2008 (gmt 0)

    VERY IMPORTANT

    Further testing proved very well worthwhile.

    Contrary to my earlier suggestion and those of others posting in this forum, under NO circumstances should you use a redirect for the AVG UA's unless you specifically intend it to point to a page that advises AVG users of the issues (unlikely in the case of commercial sites).

    If you use a redirect this is the scenario.

    The search engine results are being scanned in real time so any hits on your site from the linkscanner UA will indeed be redirected to your 'very-small-page' or indeed to AVG's own site if that's what you've decided. Unfortunately when the user clicks the search engine link the same UA is used and any traffic originating from an AVG user just ends up in la-la land rather than on your real page. You may save bandwidth but you've also just lost a customer.

    Until AVG sorts this farce out, or a major player threatens them with legal action it looks like we're stuck with whatever this bunch of idiots throws at us.

    Meanwhile I'll just keep harvesting the IP's as I'm sure McAfee etc could be interested. If you work for McAfee/Norton etc and would like to construct a CPM info page about your rival, I'm all ears and would love to send you some traffic as I suspect would many other users of WebmasterWorld.


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved