Further testing proved well worthwhile.
Contrary to my earlier suggestion and those of others posting in this forum, under NO circumstances should you use a redirect for the AVG UAs unless you specifically intend it to point to a page advising AVG users of the issues (unlikely in the case of commercial sites).
If you use a redirect, this is the scenario.
The search engine results are being scanned in real time so any hits on your site from the linkscanner UA will indeed be redirected to your 'very-small-page' or indeed to AVG's own site if that's what you've decided. Unfortunately when the user clicks the search engine link the same UA is used and any traffic originating from an AVG user just ends up in la-la land rather than on your real page. You may save bandwidth but you've also just lost a customer.
Until AVG sorts this farce out, or a major player threatens them with legal action, it looks like we're stuck with whatever this bunch of idiots throws at us.
Meanwhile I'll just keep harvesting the IPs, as I'm sure McAfee etc. could be interested. If you work for McAfee/Norton etc. and would like to construct a CPM info page about your rival, I'm all ears and would love to send you some traffic, as I suspect would many other users of WebmasterWorld.
when the user clicks the search engine link the same UA is used and any traffic originating from an AVG user just ends up in la-la land rather than on your real page
Perhaps you can tell us which version of AVG you are using, which browser you are using, which method of dealing with LinkScanner you are using, and what testing you have done.
We know that AVG is attempting to fix LinkScanner so changes are inevitable.
...
Browser: IE 6.0 on XP SP2 (browser purposely not upgraded to enable me to check sites with an older browser)
Method: .htaccess redirects (now disabled) and bot traps at the moment, although this will change as AVG changes
Testing: using the product as a user would but on sites that I control so all accesses can be monitored
If the method you use no longer works it doesn't necessarily mean that other methods will fail, and given its risible history it seems unlikely that LinkScanner is about to become any less of a joke.
...
Unfortunately when the user clicks the search engine link the same UA is used and any traffic originating from an AVG user just ends up in la-la land rather than on your real page. You may save bandwidth but you've also just lost a customer.
That's not possible because the redirect is based on the LinkScanner user agent which never matches the actual browser user agent unless your redirect code has a flaw in it.
RewriteCond %{HTTP_USER_AGENT} ;1813\)$ [OR]
RewriteCond %{HTTP_USER_AGENT} User\-Agent:\ Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.1;\ SV1\)$
RewriteRule !^xyz\.html$ /xyz.html [L]
The user agent that appeared in the logs for both the LinkScanner hit from the SE listing AND for the click-through was the second, malformed 'User-Agent:' version, and it was trapped by the redirect on both occasions. The search was performed using the AVG-hijacked Yahoo which is installed by default.
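A rough way to check what the two conditions above actually catch, outside Apache. This is only a sketch: the patterns are transcribed from the RewriteCond lines (escaped spaces written as plain spaces), the function name is made up for illustration, and the sample strings are assumptions rather than entries copied from anyone's logs.

```python
import re

# Patterns transcribed from the two RewriteCond lines in the post.
# In .htaccess, backslash-space is a literal space, so plain spaces are used here.
LINKSCANNER_PATTERNS = [
    re.compile(r";1813\)$"),
    re.compile(r"User-Agent: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$"),
]


def would_be_rewritten(user_agent: str) -> bool:
    """Return True if either condition matches (mirrors the [OR] flag)."""
    return any(p.search(user_agent) for p in LINKSCANNER_PATTERNS)


# Hypothetical sample strings, for illustration only.
plain_ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
prefixed = "User-Agent: " + plain_ie6
with_1813 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

for ua in (plain_ie6, prefixed, with_1813):
    print(would_be_rewritten(ua), ua)
```

On these samples only the strings carrying the literal "User-Agent:" prefix or the ";1813)" suffix are caught; a plain IE6 string without the prefix is not. So a click-through is trapped only if it arrives with one of those two markers, which is what the logs described here showed.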
The link I'd clicked in Yahoo should have taken me to
[abc.abc...]
but because of the AVG UA the redirect grabbed it and actually took me to
[abc.abc...]
as it would any other AVG user who had clicked it. The fact that it had a link to my homepage would be absolutely meaningless to the user. If the redirect had sent it to [avg.co.uk...] as I originally suggested then AVG would have benefitted from my Yahoo listing!
My log files show that 1813 arrives with no referer, then the visitor (same IP) arrives with his/her own UA and referer link.
1813 gets redirected to its own home,
visitor lands on the page linked from the SE
PS: I'm on a Windows box
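For anyone wanting to look for the same pattern in their own logs, here is a minimal sketch of the pairing described above: a referer-less ";1813)" hit followed by a hit from the same IP with a different UA and a referer. The `Hit` record, the function name and the sample data are all hypothetical; real log parsing is left out.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Hit(NamedTuple):
    ip: str
    user_agent: str
    referer: str  # empty string when the request carried no Referer header


def paired_visits(hits: Iterable[Hit]) -> Iterator[Tuple[Hit, Hit]]:
    """Yield (scanner_hit, visitor_hit) pairs, in log order."""
    pending = {}  # ip -> most recent unpaired scanner hit
    for hit in hits:
        is_scanner_ua = hit.user_agent.endswith(";1813)")
        if is_scanner_ua and not hit.referer:
            pending[hit.ip] = hit
        elif not is_scanner_ua and hit.referer and hit.ip in pending:
            yield pending.pop(hit.ip), hit


# Hypothetical sample data (documentation IP range, invented UA and referer).
sample_hits = [
    Hit("192.0.2.10", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)", ""),
    Hit("192.0.2.10", "Mozilla/5.0 (hypothetical browser UA)", "http://search.example/?q=widgets"),
    Hit("192.0.2.99", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)", ""),
]

for scanner, visitor in paired_visits(sample_hits):
    print(scanner.ip, "scanned, then visited via", visitor.referer)
```

An IP that shows up only as an unpaired scanner hit (like the last sample line) is a prefetch with no click-through behind it.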
[edited by: Staffa at 9:57 pm (utc) on June 23, 2008]
I just tested the latest release (21 June) of AVG against two of my sites.
In both cases it performed a HEAD request followed by a GET request with this user-agent:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
In both cases I got the green checkmark and star in the Google results.
In both cases clicking the link in the SERPs took me to the correct homepage of the site.
In both cases I had not changed anything in my .htaccess file.
In both cases LinkScanner was comprehensively fooled as usual.
My eggs remain intact.
LinkScanner remains a joke.
...
In both cases it performed a HEAD request followed by a GET request with this user-agent: User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
The flaw here, obviously, is that a browser doesn't include "User-Agent:" in the actual user agent string. Even if they fix that, they still have other flaws in the request, which I won't mention because I don't want them to fix those either; that way I can continue to detect this junk and deal with it appropriately.
I'm still flabbergasted that they think the new UA using a HEAD and GET is a "FIX" because now it's doubling the amount of hits so it's DDoS x2!
<mod hat off to sing>
...
But where are the clowns?
Quick, send in the clowns.
Don't bother, they're here.
<mod hat on>
[edited by: incrediBILL at 10:15 pm (utc) on June 23, 2008]
Incredibill - I take your point about cookies.
And... if a legitimate or otherwise site isn't being deliberately accessed by the "visitor" (ie they didn't click on the google link) then any cookie set by the site on the browsing computer is presumably done without its user's knowledge and hence, according to UK law, is illegal (I know there are side issues in that one!). Presumably, though, AVG does not actually accept or set cookies?
Samizdata: thanks for the info on 403s. I suppose that's one way of discovering any new UAs from AVG, except AVG is likely to use a bog-standard UA next time.
Sixpence: surely the actual link on the google page will be followed directly to the correct page with the correct browser UA - eg Firefox? So no traffic lost. Certainly I serve up a dummy page and that is not seen by an AVG visitor (re-tested this evening, AVG 8.1 db updated today, no "user-agent:" prefix nor HEAD detected).
Still unsure where the "user-agent:" prefix comes into all this. It's logging a similar UA to AVG (ie 1813 or SV1) so possibly whatever was distorting the UA occasionally before is now doing it to the AVG UA. Except that 1813 without prefix does not check HEAD so it must be doing something semi-automatic in its own right. If so, probably something that was released in the past week or so and installed after an AVG update. Except that one would have expected a similar quantity of hits before AND from varying UAs, and prior to last week the prefix has been quite scarce.
Since whatever it is accesses sites with old date-stamped page links I assume it's going through either browser-cached pages and re-checking them online or it's re-checking bookmarks.
Which still makes no sense because why did we not see them as soon as AVG came out? Or is AVG 8.1 only a very recent update?
Now read Samizdata's latest post (9:54pm) and noticed his AVG was 8.0 so that isn't the answer.
A thought from Sixpence's posting: could this prefix be generated by the yahoo toolbar in conjunction with AVG? How many people previously used the Yahoo toolbar instead of Google's and have now migrated / been usurped? Have they combined in a deadly inter-SE fight to the death using loaded AVGs? :)
I'm still flabbergasted that they think the new UA using a HEAD and GET is a "FIX"
I can only stand so much hilarity in one 24 hour period so I haven't tested it yet but my guess is that the preceding HEAD request might be intended to stop people redirecting to AVG's site.
Which would mean that they don't care about security of their customers at all.
Just their own bandwidth bill.
..
The idiotic part is you're supposed to use HEAD to check to see if the file changed once you cache it, not BEFORE you cache it!
[w3.org...]
9.4 HEAD...
The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
For instance, instead of 3K IPs asking for 3K pages it'll be 3K IPs asking for both the HEAD and the GET, so it's 2x the number of requests to process for the same flawed product.
So in an attempt to "fix" the problem it quickly becomes twice as bad, hence my singing a few posts back.
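The cache-validation use of HEAD described in the quoted section 9.4 can be sketched roughly as below. This is an illustration only, not anything AVG does: the function name is made up, header names are assumed to be already normalised to the spelling shown, and treating a field as "changed" only when both sides carry it is my assumption.

```python
# Validator fields named in the quoted passage of section 9.4.
VALIDATOR_FIELDS = ("Content-Length", "Content-MD5", "ETag", "Last-Modified")


def cache_entry_is_stale(cached_headers: dict, head_headers: dict) -> bool:
    """Return True if any validator field present on both sides differs.

    cached_headers: headers stored with the previously cached entity.
    head_headers:   headers from a fresh HEAD response for the same resource.
    """
    for field in VALIDATOR_FIELDS:
        if field in cached_headers and field in head_headers:
            if cached_headers[field] != head_headers[field]:
                return True
    return False


# Hypothetical values, for illustration only.
cached = {"ETag": '"abc"', "Content-Length": "1024"}
print(cache_entry_is_stale(cached, {"ETag": '"abc"', "Content-Length": "1024"}))
print(cache_entry_is_stale(cached, {"ETag": '"def"', "Content-Length": "1024"}))
```

The point is that the comparison needs something already cached to compare against; a HEAD sent before anything is cached has nothing to validate.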
[edited by: incrediBILL at 10:36 pm (utc) on June 23, 2008]
All I can suggest is continued vigilance, testing and reporting.
The only conclusion we can currently draw is that AVG are desperately trying to salvage something out of the money they paid for LinkScanner and that we will probably see further inept tinkering.
Reasonably enough, any changes are likely to happen first in the paid version.
According to The Register, AVG claims 70 million users worldwide for its free and paid versions.
According to my logs only about 350 people on the planet actually use the paid one.
...
I know there has been some doubt about whether this user agent is coming from AVG, and some doubt as to whether it can also be a real visitor. For my part, I certainly get real visitors with this UA, and I'm certainly getting spoof hits from something with this UA, which I imagine is AVG.
The AVG hits never have a referrer, of course; but the absence of a referrer is not enough to be sure it's not a real visitor. On the other hand, AVG is unable to handle compression, while real visitors with this UA all seem to take compressed pages.
Can anyone say if this version of MSIE should always (or almost always) handle compression, please? If so, the rest is easy.
Thank you.
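The heuristic being asked about could look something like this. It is a sketch under stated assumptions: that "unable to handle compression" shows up as a request with no gzip in Accept-Encoding, that header names arrive in the spelling used here, and that the function name is purely illustrative. Whether genuine IE6 always advertises compression is exactly the open question, so treat a True result as "suspect", not proof.

```python
SUSPECT_UA_SUFFIX = "MSIE 6.0; Windows NT 5.1; SV1)"


def looks_like_linkscanner(headers: dict) -> bool:
    """Flag requests with the suspect UA, no Referer and no gzip support."""
    user_agent = headers.get("User-Agent", "")
    if not user_agent.endswith(SUSPECT_UA_SUFFIX):
        return False
    has_referer = bool(headers.get("Referer"))
    accepts_gzip = "gzip" in headers.get("Accept-Encoding", "").lower()
    return not has_referer and not accepts_gzip


# Hypothetical requests, for illustration only.
ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
prefetch = {"User-Agent": ua}
visitor = {
    "User-Agent": ua,
    "Referer": "http://search.example/?q=widgets",
    "Accept-Encoding": "gzip, deflate",
}
print(looks_like_linkscanner(prefetch))
print(looks_like_linkscanner(visitor))
```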
The user-agent is - in rare cases - used by real human visitors, so I fool it using other methods.
It would be easy to post details here, and even easier for the "Bad Guys" to implement it.
All the information you need is already on WebmasterWorld.
...
some doubt as to whether it can also be a real visitor
There is no doubt it's AVG because there are some other parameters in the HTTP request that you don't see in a log file that a normal browser doesn't make.
It's been mentioned a couple of times here, so until they fix that problem it's still trivial to detect.
I would spell it out but I really don't want them to fix it, I want them to discard it! ;)
I don't think that webmasters can live in fear of losing clicks. That's no way to make decisions. Run the site correctly, be professional, and to hades with AVG and their ilk.
~D