But he does tell malware infested drive-by download sites how to fool it.
[theregister.co.uk...]
...
About the .htaccess rule that Samizdata recommended for serving a small file based on the User Agent name, whose first line is...
RewriteCond %{HTTP_USER_AGENT} ;1813\)$
Should I add one more condition and serve the small file to 'SV1' as well?
I'm not a programmer, so I apologize if my question seems a bit basic to some.
Please advise.
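For what it's worth, adding a second user-agent in mod_rewrite is normally done with the [OR] flag. The lines below are only a rough sketch, not Samizdata's actual rules: the stub file name /linkscanner-stub.html is made up for illustration, and the second pattern assumes you want to match the complete bare IE6 "SV1" string rather than anything containing SV1. Bear in mind that real IE6 visitors can send that exact string too, so this alone will also catch some humans.

```apache
RewriteEngine On

# Old LinkScanner string: user-agent ends in ";1813)"
RewriteCond %{HTTP_USER_AGENT} ;1813\)$ [OR]
# Newer string: the complete bare IE6 "SV1" user-agent and nothing else
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$"
# Skip the rewrite if the stub itself is being requested (avoids a loop)
RewriteCond %{REQUEST_URI} !^/linkscanner-stub\.html$
# Hypothetical small file served in place of the real page
RewriteRule ^ /linkscanner-stub.html [L]
```

The two user-agent conditions are OR'ed together, and the result is then AND'ed with the loop check.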
"There are still ways for concerned web masters to filter LinkScanner requests out of their statistics"
Presumably there will be an officially recommended method on AVG's LinkScanner forum.
[freeforum.avg.com...]
If not, just ask them.
...
"The change from 1813 to SV1 was part of a planned release"
So AVG planned to change from something easily fooled to... something easily fooled...
Interesting concept.
"I wouldn't visit any website that we show a red verdict for, except on a goat pc."
You shouldn't visit any website it shows green for either - unless you enjoy Russian Roulette.
...
I have listed the hits from both known UAs below (that's 1813 and user-agent: prefix). Logs for today (Monday 30th June) are short by 4 hours but they look like hitting Sunday's target. Hits are from two servers: first server has about 60 web sites of varying popularity (composite hits from all are logged below); second server is a fairly popular single-site server.
Not sure why the discrepancy on Thursday/Friday but could be due to the second server being mainly UK domestic/social-club visitors whilst the first server is world-wide visitors.
Hits shown are from trap logs not from site logs.
Note that the hits are a total of both types of UA so some are single-hit (GET only) and some are double-hit (HEAD plus GET) - the latter counts as two hits in the totals below (sorry about that!). Both servers seem to be roughly equally split between the two types.
If the SV1 UA is being used without a prefix then it's not being used as a HEAD/GET pair since I'm trapping that UA separately and seeing no obvious IP pairing.
In theory one would expect an increase of aggregate hits as the HEAD/GET system takes over from the GET version. One would not expect a large drop as shown. I should check site logs for HEAD accesses but have no time at present to do so.
Dates are all June 2008...
Thu 19 952 2611
Fri 20 855 2229
Sat 21 797 2499
Sun 22 848 2982
Mon 23 974 3085
Tue 24 927 3055
Wed 25 1233 3248
Thu 26 1151 708
Fri 27 1133 442
Sat 28 305 460
Sun 29 264 584
Mon 30 254 456
If the SV1 UA is being used without a prefix then it's not being used as a HEAD/GET pair since I'm trapping that UA separately and seeing no obvious IP pairing.
I'll say it again, you can't just check the SV1 UA because it's a legit UA if I'm not mistaken and you can't attribute all of it to AVG anyway because other junk uses it as well.
For instance, so far today the UA with SV1 has hit my site 12,939 times.
Out of that there are 2,398 GETs from AVG or some other incarnation of that link scanner technology, but there are only 247 HEAD requests.
[edited by: incrediBILL at 8:45 pm (utc) on June 30, 2008]
In my experience many of the basic SV1 hits are bad or semi-bad bots with off-beat secondary characteristics, although seemingly "real" browsers also exhibit some of the odd characteristics as well.
Blend27 - I wish! :(
Program update AVG 8.0.101
Fixed Bugs
Fix prevents the free product from using the wrong user agent during Search-Shield scans
So some of their fake user-agents were wrong, but the new one is right.
And they provide (as yet unspecified) ways to filter our statistics.
And it was all planned in advance.
And it is fixed now.
Yay!
...
If a no-referrer bot is using that IE 6.0 user-agent then it's no better than LinkScanner anyway. But I think they are 99 percent LinkScanner. Why do I think this? Because non-LinkScanner bots who use that agent aren't pounding the same page thousands of times a day. There's absolutely no point in it.
Also, the exact user-agent we're seeing that ends in SV1 and has no other junk on it isn't as common as you might think.
Forget the HEAD pairing. It's quite rare now.
I'm getting nearly 10,000 LinkScanner hits a day now on just two sites. Since it's not unusual to get multiple hits from the same LinkScanner installation, the number of unique IPs I'm recording is about 75 percent of the total hits. I just started recording about 40 hours ago. It seems to be getting a lot worse lately.
The sort of site that will get a disproportionate amount of LinkScanner traffic is one where the home page has a fair amount of text on it, and also ranks fairly well on at least one keyword that lots of noobs might use in a search. For example, I have a site where the home page ranks between 20 and 30 in Google for a search on the single word "gmail," and it has about 18K bytes of text on it. If someone with LinkScanner searches for "gmail" plus one other word, and that word is on my page, the chances are rather good that it will get LinkScanned.
Home pages get hit more often than deep pages by LinkScanner, even when the links presented by the search engine don't indicate that this must be the case.
I forgot this site until just two days ago, because I haven't changed it in years and I don't track traffic on it. But right now it's my favorite honey trap!
Presumably that's the one with the "user-agent:" prefix
My understanding (borne out by tests) is that the "right" dishonest user-agent is meant to be "SV1" - without the prefix, and of course with no mention of LinkScanner or AVG. A liar and a fraud.
Also, according to The Register:
"Roger Thompson says the for-pay LinkScanner is only using the IE6 user agent."
The 30-day trial I downloaded used this (with added HEAD requests):
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
similar to that used by scraping bots
Except that AVG says this one is a figment of your imagination.
...
"we still enable those webmasters who want to filter our requests out of their results to do so"
I encourage anyone who has any patience left to contact AVG and ask for precise details.
If you tell them you want to circulate the information on the world's leading webmaster forum in order to help AVG get their message out to the people who need it then they should respond.
Or you could just say you were an ordinary browser.
...
I'm also getting similarly suspicious effects for another "basic" MSIE UA - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727) - that one is predominantly SQL injection attacks, though.
Following comments here and elsewhere I have revisited some of my other trap logs.
The UA Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) (without the prefix) has significantly increased its activity during the past few days, probably adding up to about the previous AVG total hits. It does seem to be the new AVG one and it does not appear to use HEAD. I half-suspected its usage yesterday and stopped blocking the IPs it came in on - the increase from its previously low activity on that UA was far too frequent to be a normal bad bot. It still gets a 403 though - have to fix that, I suppose. Damn!
One thing I might ask, if possible, is updates to the .htaccess code that redirects them back to AVG as the user-agents change and get added to. I'm not great at it, and can't necessarily expand on the 1813 one we had before.
One thing I might ask, if possible, is updates to the .htaccess code
Stefan, I apologise if this does not seem helpful enough at first sight.
AVG are saying "we still enable those webmasters who want to filter our requests out of their results to do so", and their officially sanctioned method will presumably be the best way to detect and deal with any problems caused by their LinkScanner.
The best way to help other webmasters is to ask AVG for it and to post it here.
...
I filtered ;1813 out a couple of weeks ago and the numbers already looked right to me. But since they changed to SV1, should I filter 'SV1' out as well?
My concern is, is there any other User Agent having the same name as this one....
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
I'm afraid of filtering out real traffic as well, instead of just this annoying AVG.
Thanks again.
My concern is, is there any other User Agent having the same name
As has been stated many times here - and even in The Register - this user-agent:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Can appear in your logs when used for legitimate Internet Explorer 6 requests made by humans.
AVG say there are ways to differentiate those from LinkScanner, which also uses that user-agent.
"we still enable those webmasters who want to filter our requests out of their results to do so"
Once again, I encourage you to contact AVG and ask for the officially sanctioned method.
I don't need it, but others will.
...
Most significantly, there was a three-hour period when I wasn't redirecting LinkScanner back to AVG, and then for the rest of the day I was redirecting all LinkScanner hits back to AVG. In comparing the period before and after the switchover, there were consistent HEAD/GET pairings from the same IP address for all LinkScanner hits that had the "User-Agent" in front, but only before the switch. After the switch, all the "User-Agent" hits that showed a HEAD failed to follow up with a GET for that IP address.
Conclusion: One purpose of the HEAD, and perhaps the only purpose, is to detect the redirect. If a redirect is detected, the GET is not done.
Question 1: Is it only detecting redirects to grisoft.com and avg.com, or will any redirect prevent the GET?
Question 2: If it detects a redirect to grisoft.com or avg.com, what happens with the gray / green / red indicator in the user's browser for that link?
I refuse to install that pig package, even if I do get 30 days free. I installed the free version for one day last week just to see what it could do. The next day I uninstalled, and also did a System Restore on my XP to a prior restore point, just to be sure I was rid of it.
I've never used a virus detector in the 25 years I've used personal computers, and have never had any problems. I use common sense when I surf, and switch to JavaScript on Firefox only when I can't do something without it, and I stay away from Explorer altogether. Common sense told me to get rid of that viral AVG package!
If it detects a redirect to grisoft.com or avg.com
I don't believe the HEAD request does anything but prove that AVG are incompetent.
what happens with the gray / green / red indicator
In all my tests I get the green checkmark no matter how I fool LinkScanner.
I tested the redirect method using two of my sites and saw both sets of logs.
Sending AVG the bill for LinkScanner in this way works fine.
But I do not advocate it - I merely refuse to condemn it.
"we still enable those webmasters who want to filter our requests out of their results to do so"
Has AVG unveiled the officially sanctioned LinkScanner detection method yet?
I would ask them myself, but I don't think they read my stuff.
...
RewriteCond %{REQUEST_METHOD} ^GET$
That extra requirement in the htaccess file would handle their HEAD without redirecting, which would cause them to follow up with their GET, which would then redirect.
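Put together, that might look something like the sketch below. This is an illustration only: it assumes the older ";1813)" user-agent is the one being matched, and the redirect target URL is a guess at "back to AVG", not an address anyone here has confirmed.

```apache
RewriteEngine On

# Match the LinkScanner user-agent ending in ";1813)"
RewriteCond %{HTTP_USER_AGENT} ;1813\)$
# Only act on GET, so the preliminary HEAD is answered normally
RewriteCond %{REQUEST_METHOD} ^GET$
# Assumed target; substitute whatever address you actually redirect to
RewriteRule ^ http://www.avg.com/ [R=302,L]
```

With the method check in place the HEAD sees no redirect, so the follow-up GET should still arrive and be the request that gets redirected.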
1) AVG LinkScanner does not set the following environment variables:
HTTP_REFERER
HTTP_ACCEPT
HTTP_ACCEPT_LANGUAGE
HTTP_ACCEPT_ENCODING
HTTP_ACCEPT_CHARSET
Normal browsers/crawlers usually set at least one of these, so you could reject (or redirect) requests from known AVG HTTP_USER_AGENTs with all of the above environment variables blank.
2) As far as I know, AVG LinkScanner does not download images so, if you're worried about losing legit visitors, a variation on the above could be:
Show suspicious traffic a non-cached welcome gateway page with a small image which sets a cookie when it's loaded. The welcome page should also contain an "Enter" link which reloads the page when a user clicks on it.
You could set this up the same way as 1) but include an HTTP_COOKIE check to see if the cookie has been set. If it has, display the real page; if not, display the gateway page.
Thoughts?
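A rough .htaccess sketch of both ideas, for anyone who wants a starting point. Everything named here is an assumption for illustration: the cookie name seen_gateway, the files /gateway.html and /gateway.gif, and the choice of the bare "SV1" string as the user-agent to test. Setting the cookie this way also needs mod_headers, and the gateway page itself (with its image and "Enter" link) is not shown.

```apache
RewriteEngine On

# Known LinkScanner-style user-agent (the bare IE6 "SV1" string)
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$"
# 1) All of the usual browser headers are blank
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_ACCEPT} ^$
RewriteCond %{HTTP:Accept-Language} ^$
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteCond %{HTTP:Accept-Charset} ^$
# 2) The visitor has not yet picked up the gateway cookie
RewriteCond %{HTTP_COOKIE} !(^|;\s*)seen_gateway=1
# Never rewrite the gateway page or its image (avoids a loop)
RewriteCond %{REQUEST_URI} !^/gateway\.
RewriteRule ^ /gateway.html [L]

# The small image on the gateway page sets the cookie when it loads
<Files "gateway.gif">
    Header set Set-Cookie "seen_gateway=1; path=/"
</Files>
```

For plain rejection as in 1), drop the cookie condition and the Files block and use a rule such as RewriteRule ^ - [F] instead.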
LinkScanner deliberately lies about what it is in order to access my files on my computer so that AVG Technologies can make a profit while handing me the bandwidth cost and causing me problems.
That makes it malware, pure and simple.
...
From this article, the only thing I understand is that AVG is trying to confuse the "Black Hats"! What a great experiment! Well, good luck with that...
I am not aware of any tracking software that reads IIS log files and finds any HTTP header information in them. I have never looked for one, nor will I.
-- But there is a way of eliminating this fake traffic from log files --
When a browser, bot or LinkScanner makes a request to the website, an entry is recorded in the log files. There is no way for an average webmaster to eliminate that entry. Then the stats software parses the log files and reports to the stats user that there was a request made to the website. Real visitor or a bot or a LinkScanner - it’s a hit.
So, in order to separate out LinkScanner requests we need to install or write software that either does not rely on log files at all or takes the HTTP header information into account to give accurate website stats. That means we need to spend money.
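For sites on Apache rather than IIS, one possible way to keep suspected LinkScanner hits out of the access log in the first place is conditional logging. This is a rough sketch only: the variable name linkscanner_suspect and the log path are made up for illustration, the header tests borrow the idea posted earlier that LinkScanner sends no Referer or Accept headers, and it assumes a "combined" log format is already defined. CustomLog has to go in the main server or virtual host config, not in .htaccess.

```apache
# Flag requests that use the bare IE6 "SV1" user-agent
SetEnvIf User-Agent "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$" linkscanner_suspect

# Clear the flag again if any of the usual browser headers is present
SetEnvIf Referer "." !linkscanner_suspect
SetEnvIf Accept "." !linkscanner_suspect
SetEnvIf Accept-Language "." !linkscanner_suspect

# Log everything except requests that are still flagged
CustomLog logs/access_log combined env=!linkscanner_suspect
```

Real IE6 visitors who send a Referer or Accept header would still be logged; only requests with that user-agent and none of those headers are left out.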
In order for a webmaster to know whether his/her site validates against AVG's software, one must download and install the dumb thing. That means the webmaster's system must be exposed to AVG's software. I don’t know if this could be called "Marketing", but I would slap the label of !@$% on it.
So I have to rely upon other people to test AVG against my traps for me. Difficult since I've been telling them to uninstall it...
Dishonest
Their whole strategy is built on lying to gain access to our servers.
Hypocritical
They say they want to help webmasters then immediately do the opposite.
Incompetent
The recent "security fix" proves (again) that they have no idea what they are doing.
Malicious
They say they want to break our eggs and they distribute free malware to do it.
All Of The Above
This option is included because it would get my vote.
....
[edited by: Samizdata at 7:14 pm (utc) on July 2, 2008]