The main AVG screen has a settings block called 'link scanner' which can be disabled, but is enabled by default. This, in turn, throws up extra icons on Google pages in both IE and Firefox.
This means it's 'on' for every single person who has installed the software.
I assume if you come up in the search results page, it's doing a hit of your webpage (and every other page coming up in the search results), which is where this is coming from - so the end user may NEVER even click to go to your site, and you're still going to be showing a referral from this stuff, even though you never got a real visitor.
For instance, if you have Google set to show 10 results by default, AVG is going to pull all 10. If your search page is set for 20, 50, or 100 results, then AVG is going to pull ALL of those pages, and EVERY ONE of those sites is going to show this referrer.
*** That would explain the extra hits you are showing ***
Of course, that means it gets WORSE, because...
Now, if you somehow block AVG, then it's going to show you as being potentially 'bad' in the search engine results page, causing you to lose potential visitors.
I'm afraid to go look at any of MY referrer logs now...grrrrrrr
[edited by: incrediBILL at 10:11 am (utc) on June 4, 2008]
[edit reason] call to action removed - see tos #26 [/edit]
Sorry I can't say more, but I just happened across it while debugging some JS. Personally, I think it's a great little tool: it flagged a couple of bad sites that I was linking to, both of which had iframe exploits.
But yes, they should use the proper UA.
Have fun
I am having similar problems to spotter. I don't really care too much about the link scanner using extra bandwidth; however, I do care about the fact that it is throwing 404s in the logs.
The link scanner seems to try to read the JS on the page, which has sitestats installed, and I get loads of requests like the one below.
GET /about-us//\"'+//\"'+//\"'+//\"'+//\"'+//\"'+//\"'+/'+ns_l+'/'+ns_l+'/'+ns_l+'//\"'+//\"'+/'+ns_l+'//\"'+//\"'+//\"'+/'+ns_l+'/'+ns_l+'//\"'+/'+ns_l+' HTTP/1.1" 404 14366 5894 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
When I say loads, I mean that in a 3-minute period I am getting about 5000 requests, all resulting in 404s.
Has anyone found a solution, or does anyone have thoughts on how I can stop the 404s via the JS file? It seems to have problems with the variables.
Many thanks
I'd say virtually all of us would disagree with Adam.
If all of this (avg playing big brother) actually caught on in a major form, the guys running bad sites would just move their bad code one page over, anyway - their site would then show up as 'approved' by AVG, the user would go there, click to go to the next page, and then get infected.
Mostly, their bot needs to dummy up. I'm tired of mopping up after their mess.
AVG LinkScanner is currently identifiable by user-agent (;1813) and can be dealt with accordingly.
The Exploit Prevention Labs version of LinkScanner (SV1) is identifiable by other signifiers.
Both are absurdly easy to fool, as are other similar "security tools".
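The user-agent check mentioned above could be sketched roughly as follows. This is a minimal illustration, assuming the only signal used is the trailing ";1813)" marker reported in this thread; the function name is hypothetical, and the SV1 variant is deliberately not matched here because that string is also sent by ordinary browsers and needs "other signifiers".

```python
import re

# Assumption: the marker is the literal ";1813)" at the very end of the
# user-agent, as seen in the log lines quoted in this thread.
AVG_1813_MARKER = re.compile(r";1813\)$")


def is_avg_1813(user_agent: str) -> bool:
    """Return True if the user-agent ends with the ';1813)' marker."""
    return bool(AVG_1813_MARKER.search(user_agent.strip()))
```

Because the check relies on a single string the vendor can change at any time, it is only as durable as that marker.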
[webmasterworld.com...]
I only feel dirty when an unwanted robot penetrates my defences.
...
I opened up FF today and noticed that the link scan is even hitting the adwords ads.
Is this showing as a click, and costing the advertiser?
The other thing I notice about this is that if one disables link scanning in the admin panel, it shows an error in the tool tray icon for AVG.
Please note that the toolbar component of AVG 8 is not involved.
Is there any evidence that the toolbar will follow any redirect you throw at it?
People are certainly doing it (using .htaccess or PHP or whatever) but I haven't seen any test results.
Most seem to be redirecting to AVG's site, which seems rather apt.
Is this showing as a click, and costing the advertiser?
An earlier thread dedicated to the subject suggests not:
[webmasterworld.com...]
There have also been several threads on analytics and security, and if you want to know all about this sorry saga you could search WebmasterWorld for "LinkScanner" and see how the story unfolded.
Be warned that AVG will be forced to change the user-agent very soon.
...
Here's hoping they have a fairly wide-reaching rethink...
Is wholly unnecessary except in the unknown percentage of cases that AVG is able to flag a site, but is unable to prevent the user from harm
Exactly.
I've pointed this out before and it bears restating: the whole idea is only useful if the logic used in the link scanner is also employed in the real-time reading of the webpage, which could prevent the user from harm even if AVG is unable to protect them from the infection.
This is what the AV that I'm using does with its stream scanning via a transparent proxy: a real solution, not a bandage wrapped in a marketing hype blanket.
As I've already pointed out to them, it's a product that could bring their company down.
I've no doubt that's the least of the problems they'll have to face. Pretty imaginative are webmasters when they're being messed about <G>!
RewriteCond %{HTTP_USER_AGENT} ;1813\)$ [OR]
RewriteCond %{HTTP_USER_AGENT} User\-Agent:\ Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.1;\ SV1\)$
RewriteRule ^(.*) [avg.co.uk...] [R=301,L]
Yes, the second UA is from AVG as well. Personally I'm sick of these idiots profiting by stealing my bandwidth and wasting my time in having to deal with their ineptitude. It's a great idea but the implementation just stinks. I also resent having forked out 40 quid just to find out how this junk works.
Over the past week I've been seeing the prefix a lot more - approx 50% of my user-agent kill logs (that's ignoring the massive number of SQL injection attempts). The prefix has been associated almost exclusively with both of the user-agents mentioned here (ie 1813 and sv1).
The site logs typically show as below (from two different sites)...
19:06:27 ... HEAD /out-05.asp
User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;1813)
19:06:29 ... GET /out-05.asp
User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;1813)
and...
20:01:14 ... HEAD /index.asp
User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
20:01:14 ... GET /index.asp
User-Agent:+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
Delays between the two hits of a pair tend to be 0-2 seconds.
Sometimes there is only one pair (HEAD and GET), sometimes two pairs, rarely a third. This may be due in part to the site returning a 403. The IP is also added to a blocklist but this is an ASP-based one so it still records any further hits. There never seems to be hits to other pages following the initial attempts.
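The HEAD-then-GET pattern described above could be picked out of parsed log entries with something like the sketch below. It assumes the log has already been parsed into records; the `Hit` record, the function name, and the 2-second default are illustrative choices based on the delays reported here, not anything AVG documents.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One parsed log entry (hypothetical structure for illustration)."""
    time: datetime
    ip: str
    method: str
    path: str


def find_head_get_pairs(
    hits: List[Hit], max_gap: timedelta = timedelta(seconds=2)
) -> List[Tuple[Hit, Hit]]:
    """Pair each HEAD with the next GET from the same IP for the same path
    when the GET arrives within max_gap of the HEAD."""
    pairs = []
    pending = {}  # (ip, path) -> most recent unmatched HEAD
    # sorted() is stable, so entries with equal timestamps keep log order.
    for hit in sorted(hits, key=lambda h: h.time):
        key = (hit.ip, hit.path)
        if hit.method == "HEAD":
            pending[key] = hit
        elif hit.method == "GET" and key in pending:
            head = pending.pop(key)
            if hit.time - head.time <= max_gap:
                pairs.append((head, hit))
    return pairs
```

A GET with no preceding HEAD is ignored, so ordinary browser traffic does not produce pairs.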
In at least one instance the sequence went HEAD, GET, (7 second gap) HEAD, HEAD, (2 second gap) GET, GET. There was also, on a different site, an interim hit from the same IP of a google referer between two HEAD/GET pairs using the UA
...MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322;+.NET+CLR+2.0.50727)
This did not have the "user-agent:" prefix. (NB: A lot of suspect and bad UAs include 50727.)
I am no longer sure if the "user-agent:" prefix is a robot or a mal-formed UA due to someone fiddling. I'm seeing almost every instance of it coming from broadband lines. I think (but am not sure now without checking back) that "user-agent" prefix hits before last week were at least in part from servers and I do not recall them having HEAD fetches before. My sites get very few HEAD requests in the normal way so I think it has to be at least semi-robotic.
Since current hits only seem to come in pairs from any given IP I have now switched the behaviour of the trap code to 403 only rather than 403 AND block IP, to avoid killing most of the world's dynamic broadband lines.
One passing thought: I wonder if this could be some kind of badly-coded "bookmark" feature checking old site views for either updates or trojans. One of my sites includes a date-stamped querystring and the prefix hits are showing a very outdated stamp, anything up to three months or more, which I would expect on this site for a bookmarked page. If this is so I feel it's semi-automatic at least.
The Exploit Labs LinkScanner uses the following:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
No "User-Agent:" prefix, none of that.
It's possible that some other company is now bundling this malware with variations on the user agent, or that a scraper or botnet has adopted the UA, because this is the perfect way for botnets to hide activity in the midst of such a public fiasco.
Apart from the basic WinXP SP3 OS, the AVG 'Security' (hollow laugh) program is the only piece of non OS software on the PC. I simply fired up google in IE, searched for one of my pages I knew was listed and then checked the logs expecting to see a 1813 UA. Only the 'User-Agent' UA appeared.
I guess that AVG may now be fiddling (think Nero here) with the 1813 UA to try to fit it with a false beard so no-one can see that it has "I'M FROM AVG" in big bright red flashing neon right across its forehead.
More testing to do, so I would suggest that there are more AVG UAs still to come. Meanwhile I'm tempted to just harvest the IPs and forward the unsuspecting users to a page which explains in great detail exactly how competent AVG's product is, or even let them read about it here!
Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml)
Each space in a UA is filled with a + sign.
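To compare these plus-filled strings with user-agents from other log formats, a small normalising step might help. The sketch below assumes every "+" in the logged value stands for a space, which is lossy: a UA that genuinely contains a "+" cannot be recovered this way. It also strips the stray "User-Agent:" prefix discussed in this thread and reports whether it was present. The function name is hypothetical.

```python
from typing import Tuple


def normalize_logged_ua(raw: str) -> Tuple[str, bool]:
    """Turn a plus-filled logged UA back into a spaced string.

    Returns (user_agent, had_prefix), where had_prefix is True when the
    logged value started with a literal "User-Agent:" prefix.
    """
    # Assumption: every '+' was originally a space (lossy for real '+').
    ua = raw.replace("+", " ").strip()
    prefix = "user-agent:"
    had_prefix = ua.lower().startswith(prefix)
    if had_prefix:
        ua = ua[len(prefix):].lstrip()
    return ua, had_prefix
```

Keeping the prefix flag separate means the prefixed and unprefixed variants can still be told apart after normalising.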
81.57.1c.dd - - [22/Jun/2008:17:40:37 +0200] "HEAD / HTTP/1.1" 403 - "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
81.57.1c.dd - - [22/Jun/2008:17:40:38 +0200] "GET / HTTP/1.1" 403 1 "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
I'm pretty sure these are ***real*** visitors (or would-be visitors, they get a 403).
One of them, after trying a few times, came back with a real UA string and a referrer from a Google search that would have sent him to the pages he'd previously got 403s on. So I think he must have turned something off and tried again because he really wanted to see our pages and suspected that something new on his end was stopping him.
Finally, he came back a third time, a little later, with the same false UA, and got 403s again on the same pages. I suppose the visitor wanted to be sure what the problem was. I admit, I'd like to know too.
[Edit] After reading Appi2's posts in the other thread at [webmasterworld.com...] I see that this visitor just did his Google search several times, and chose us some of the times.
Peter.
[edited by: Peter at 9:35 pm (utc) on June 22, 2008]
Multiple plus user agents are the way Windows server logs are written:
Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml)
Each space in a UA is filled with a + sign.
Funny thing is the two major SE examples you've provided sail through my sites daily, while this one:
+Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
will eat 403's till the cows come in.
And to repeat, "I'm not alone."
Wilderness: Those lines are direct from the site logs. I pasted them in "raw" so of course they have spaces replaced by plus signs. On the other hand, it suggests that truly bad UAs with plus signs are probably scraped from logs in the first place.
Peter: Most of the past week's prefix UAs are from what I consider legitimate browser ISPs in the UK, Germany, France, USA etc. Of course, they may all be trojanned but there are a lot of them.
In general (in this thread and others):
A lot has been posted about AVG creating a database of IPs. This is true to a certain extent but don't forget that most people use dynamic IPs and get a fresh one every time they turn their computer off - typically over-night and possibly early morning before work. In which case most IPs will be AVG-valid for no more than a few hours before being replaced.
To discover these odd UAs, I suppose what's really needed is a web page that invites people to submit a form telling you what software they have installed. Probably no one would fill it in, though. I occasionally get someone send in a form complaining I've killed them but very seldom.
I seem to recall mention hereabouts that AVG ignores the return code (403 etc) when "validating" a page. If that is so maybe I'll return a 403 plus my dummy 1813 "page" to be on the safe side.
don't forget that most people use dynamic IPs and get a fresh one every time they turn their computer off
Not true with many cable and DSL services.
I've had 4 IPs in 8 years.
Besides, you overlook the fact that once I know you're using AVG and hit my site within a reasonable time frame I can shove a cookie in your browser that will ID you as an AVG user forever, no matter what IP you have, until you dump that cookie.
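A rough sketch of that idea, assuming the scanner hit and the later browser visit are correlated by IP within a fixed window. The cookie name, the ten-minute window, and both function names are hypothetical; the poster gives no details of an actual implementation.

```python
import time
from http.cookies import SimpleCookie

SCAN_WINDOW_SECONDS = 600  # assumed "reasonable time frame"
TEN_YEARS_SECONDS = 10 * 365 * 24 * 3600  # stand-in for "forever"

recent_scans = {}  # ip -> unix time of the last scanner hit


def record_scan(ip, now=None):
    """Remember that a scanner-style request came from this IP."""
    recent_scans[ip] = time.time() if now is None else now


def cookie_for_visit(ip, now=None):
    """Return a Set-Cookie value for a browser visit that follows a scanner
    hit from the same IP within the window, otherwise None."""
    now = time.time() if now is None else now
    seen = recent_scans.get(ip)
    if seen is None or now - seen > SCAN_WINDOW_SECONDS:
        return None
    cookie = SimpleCookie()
    cookie["avg_user"] = "1"
    cookie["avg_user"]["path"] = "/"
    cookie["avg_user"]["max-age"] = TEN_YEARS_SECONDS
    return cookie["avg_user"].OutputString()
```

The returned string would go in a Set-Cookie response header; once the browser holds the cookie, the IP no longer matters.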
[edited by: incrediBILL at 10:24 pm (utc) on June 22, 2008]
most IPs will be AVG-valid for no more than a few hours before being replaced
Some ISPs allocate a static IP whether you request it or not, and some companies require you to have a static IP in order to access their network from home. In other cases a dynamic IP can remain the same for months, and often does.
I believe the situation may be slightly different in USA than in Europe (where AVG is particularly popular), but almost everyone I know uses a router and has a static IP - and those who use Windows have AVG installed because I recommended it to them.
I seem to recall mention hereabouts that AVG ignores the return code (403 etc) when "validating" a page.
If you block an AVG user-agent with a straight 403 you will send it into overdrive and it will make 30 requests for each result in the SERPs - in one of my tests it produced 120 (one hundred and twenty) 403s in twelve seconds.
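As back-of-envelope arithmetic on those figures: the sketch below assumes the 30-requests-per-result behaviour applies uniformly to every result pointing at a blocked site, which is only what this poster's tests showed. The function names are hypothetical.

```python
REQUESTS_PER_BLOCKED_RESULT = 30  # figure reported in this thread


def blocked_requests(own_results_on_page: int) -> int:
    """Requests one search page view would generate against a site that
    answers the scanner with a plain 403."""
    return own_results_on_page * REQUESTS_PER_BLOCKED_RESULT


def requests_per_second(total_requests: int, seconds: float) -> float:
    """Average request rate over the observed interval."""
    return total_requests / seconds


print(blocked_requests(4))          # 120
print(requests_per_second(120, 12)) # 10.0
```

The 120 requests in the test would be consistent with four of the poster's pages appearing in the results, though the post does not say so; either way it averages ten 403s per second from a single searcher.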
You will also find that you won't get AVG's green star of approval but will instead get a grey question mark next to the link to your site that will discourage anyone from clicking it.
The LinkScanner FAQ says:
This does not pose any major problem, since if you decide to visit such a website, its address will be scanned again by the Search-Shield.
So all this bandwidth wasting nonsense was for nothing anyway - LinkScanner is not designed to make you any safer, but just to make you FEEL safer.
But as it is so easily fooled, you are actually LESS SAFE.
The best team of comedy writers in the world could not make this stuff up.
...