Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 219-message thread spans 8 pages (this is page 5).
Register Scolds AVG For Generating Fake Traffic As Link Malware
Webmasters Complain AVG Debilitating Traffic Analytics
Samizdata




msg:3674412
 8:52 pm on Jun 13, 2008 (gmt 0)

In an otherwise interesting article about AVG LinkScanner the author spectacularly misses the point that because it can easily be identified it is worse than useless as a security tool.

But he does tell malware-infested drive-by download sites how to fool it.

[theregister.co.uk...]

...

 

Cyclob




msg:3686779
 11:08 am on Jun 30, 2008 (gmt 0)

Also, is filtering only ;1813 enough? I've read that 'SV1' is also one of them. Should I filter those out as well?

About the .htaccess code Samizdata recommended for serving a small file based on the user-agent, whose first line is...

RewriteCond %{HTTP_USER_AGENT} ;1813\)$

Should I add another condition so that 'SV1' requests get the small file as well?

I'm not a programmer, so I apologize if my question seems very basic.

Please advise.
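A minimal sketch of how further conditions can be chained onto a rule like this, assuming Apache with mod_rewrite enabled for .htaccess at the document root. The rest of Samizdata's rule is not quoted in this part of the thread, so this is not that rule, and the placeholder file name /linkscanner.txt is hypothetical. It deliberately leaves out the bare SV1 string, which real Internet Explorer 6 users also send.

```apache
# Sketch only: serve a small placeholder file to LinkScanner-style UAs.
# Assumes mod_rewrite is allowed in .htaccess; /linkscanner.txt is hypothetical.
RewriteEngine On

# Older LinkScanner builds: UA ends in ";1813)"
RewriteCond %{HTTP_USER_AGENT} ;1813\)$ [OR]
# Variant whose UA value literally starts with "User-Agent: "
RewriteCond %{HTTP_USER_AGENT} "^User-Agent: Mozilla/4\.0"
# Skip the placeholder file itself so the rewrite cannot loop
RewriteCond %{REQUEST_URI} !^/linkscanner\.txt$
RewriteRule ^ /linkscanner.txt [L]
```

Conditions are ANDed unless marked [OR], so this reads as (ends in ;1813) OR starts with "User-Agent: ") AND (not already the placeholder file).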

Samizdata




msg:3686792
 11:45 am on Jun 30, 2008 (gmt 0)

AVG's Chief Research Officer was quoted in The Register article thus:

"There are still ways for concerned web masters to filter LinkScanner requests out of their statistics"

Presumably there will be an officially recommended method on AVG's LinkScanner forum.

[freeforum.avg.com...]

If not, just ask them.

...

Samizdata




msg:3686838
 1:19 pm on Jun 30, 2008 (gmt 0)

While I am in a quoting mood, AVG's Pat Bitton says in The Register's comments:

"The change from 1813 to SV1 was part of a planned release"

So AVG planned to change from something easily fooled to... something easily fooled...

Interesting concept.

"I wouldn't visit any website that we show a red verdict for, except on a goat pc."

You shouldn't visit any website it shows green for either - unless you enjoy Russian Roulette.

...

dstiles




msg:3687163
 8:37 pm on Jun 30, 2008 (gmt 0)

Has AVG already changed to a new UA?

I have listed the hits from both known UAs below (that's 1813 and the "user-agent:" prefix). Logs for today (Monday 30th June) are short by 4 hours, but they look set to reach Sunday's total. Hits are from two servers: the first server has about 60 web sites of varying popularity (composite hits from all are logged below); the second server is a fairly popular single-site server.

I'm not sure why there is a discrepancy on Thursday/Friday, but it could be due to the second server's visitors being mainly UK domestic/social-club users, whilst the first server's visitors are world-wide.

Hits shown are from trap logs not from site logs.

Note that the hits are a total of both types of UA so some are single-hit (GET only) and some are double-hit (HEAD plus GET) - the latter counts as two hits in the totals below (sorry about that!). Both servers seem to be roughly equally split between the two types.

If the SV1 UA is being used without a prefix then it's not being used as a HEAD/GET pair since I'm trapping that UA separately and seeing no obvious IP pairing.

In theory one would expect an increase of aggregate hits as the HEAD/GET system takes over from the GET version. One would not expect a large drop as shown. I should check site logs for HEAD accesses but have no time at present to do so.

Dates are all June 2008...

Date     Server 1  Server 2
Thu 19        952      2611
Fri 20        855      2229
Sat 21        797      2499
Sun 22        848      2982
Mon 23        974      3085
Tue 24        927      3055
Wed 25       1233      3248
Thu 26       1151       708
Fri 27       1133       442
Sat 28        305       460
Sun 29        264       584
Mon 30        254       456

incrediBILL




msg:3687171
 8:44 pm on Jun 30, 2008 (gmt 0)

If the SV1 UA is being used without a prefix then it's not being used as a HEAD/GET pair since I'm trapping that UA separately and seeing no obvious IP pairing.

I'll say it again, you can't just check the SV1 UA because it's a legit UA if I'm not mistaken and you can't attribute all of it to AVG anyway because other junk uses it as well.

For instance, so far today the UA with SV1 has hit my site 12,939 times.

Out of that there are 2,398 GETs from AVG or some other incarnation of that link scanner technology, but there are only 247 HEAD requests.

[edited by: incrediBILL at 8:45 pm (utc) on June 30, 2008]

blend27




msg:3687182
 8:55 pm on Jun 30, 2008 (gmt 0)

-- Has AVG already changed to a new UA? --

Or maybe people have already started uninstalling that software?

dstiles




msg:3687207
 9:32 pm on Jun 30, 2008 (gmt 0)

incrediBILL - I'm aware that the basic SV1 UA is used by other bots and browsers. My observation concerned the fact that I have not seen any paired IPs for SV1 type UAs - i.e. for Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) - so I felt I could discount the possibility that AVG was now using that UA, unless they had dropped the HEAD access. The hits I gave in my posting were for the SV1 UA with the "user-agent:" prefix.

In my experience many of the basic SV1 hits are bad or semi-bad bots with off-beat secondary characteristics, although seemingly "real" browsers also exhibit some of the odd characteristics as well.

Blend27 - I wish! :(

Samizdata




msg:3687209
 9:41 pm on Jun 30, 2008 (gmt 0)

From the AVG site on the current (23 June) free version:

Program update AVG 8.0.101

Fixed Bugs

Fix prevents the free product from using the wrong user agent during Search-Shield scans

So some of their fake user-agents were wrong, but the new one is right.

And they provide (as yet unspecified) ways to filter our statistics.

And it was all planned in advance.

And it is fixed now.

Yay!

...

dstiles




msg:3687226
 10:10 pm on Jun 30, 2008 (gmt 0)

"...the new one is right"

Presumably that's the one with the "user-agent:" prefix similar to that used by scraping bots. Sounds about right. :)

Scarecrow




msg:3687227
 10:14 pm on Jun 30, 2008 (gmt 0)

The user-agent that ends in SV1 should, at a minimum, be considered suspicious whenever the referrer is blank. Almost all of these will be bots, unless you have the sort of page that people tend to bookmark, or one whose URL people tend to type into the address bar. For example, if you have a cool tool on your page, a lot of people might have it bookmarked. In that case you wouldn't see a referrer from them. But if your site is fairly static as opposed to interactive, and people tend to read it once after finding it in a search engine and have no reason to bookmark it, then they will show up with a referrer.

If a no-referrer bot is using that IE 6.0 user-agent then it's no better than LinkScanner anyway. But I think they are 99 percent LinkScanner. Why do I think this? Because non-LinkScanner bots who use that agent aren't pounding the same page thousands of times a day. There's absolutely no point in it.

Also, the exact user-agent we're seeing that ends in SV1 and has no other junk on it isn't as common as you might think.
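Scarecrow's heuristic (the exact SV1 string with nothing appended, plus a blank referrer) could be expressed roughly as follows, assuming Apache mod_rewrite in .htaccess. The environment variable name suspect_sv1 is hypothetical. The rule only tags matching requests so they can be logged or handled separately, because a blank referrer will also match bookmark and typed-URL visits by real IE6 users.

```apache
# Sketch only: tag blank-referrer requests carrying the exact IE6/SV1 UA.
RewriteEngine On

# Referer header absent or empty
RewriteCond %{HTTP_REFERER} ^$
# Exact string, anchored at both ends, so UAs with extra tokens do not match
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$"
# "-" leaves the URL unchanged; E= only sets the (hypothetical) variable
RewriteRule ^ - [E=suspect_sv1:1]
```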

Forget the HEAD pairing. It's quite rare now.

I'm getting nearly 10,000 LinkScanner hits a day now on just two sites. Since it's not unusual to get multiple hits from the same LinkScanner installation, the number of unique IPs I'm recording is about 75 percent of the total hits. I just started recording about 40 hours ago. It seems to be getting a lot worse lately.

The sort of site that will get a disproportionate share of LinkScanner traffic is one where the home page has a fair amount of text on it and also ranks fairly well for at least one keyword that lots of noobs might use in a search. For example, I have a site where the home page ranks between 20 and 30 in Google for a search on the single word "gmail," and it has about 18K bytes of text on it. If someone with LinkScanner searches for "gmail" plus one other word, and that word is on my page, the chances are rather good that it will get LinkScanned.

Home pages get hit more often than deep pages by LinkScanner, even when the links presented by the search engine don't indicate that this must be the case.

I had forgotten about this site until just two days ago, because I haven't changed it in years and I don't track traffic on it. But right now it's my favorite honey trap!

Samizdata




msg:3687237
 10:36 pm on Jun 30, 2008 (gmt 0)

Presumably that's the one with the "user-agent:" prefix

My understanding (borne out by tests) is that the "right" dishonest user-agent is meant to be "SV1" - without the prefix, and of course with no mention of LinkScanner or AVG. A liar and a fraud.

Also, according to The Register:

"Roger Thompson says the for-pay LinkScanner is only using the IE6 user agent."

The 30-day trial I downloaded used this (with added HEAD requests):

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

similar to that used by scraping bots

Except that AVG says this one is a figment of your imagination.

...

Samizdata




msg:3687245
 10:53 pm on Jun 30, 2008 (gmt 0)

ends in SV1 and has no other junk on it isn't as common as you might think

On my sites it is indeed rare, though incrediBILL seems to get plenty of them.

But Mr Thompson tells us there are ways to filter it out anyway, so all is well.

...

Samizdata




msg:3687292
 12:49 am on Jul 1, 2008 (gmt 0)

Reading the comments in The Register again I find it hard to tell who is speaking for AVG - their last post was entitled "Additional comments from Roger Thompson at AVG", but also said it was written by Pat Bitton. They may even be the same person, but whoever they are they said this on Friday:

"we still enable those webmasters who want to filter our requests out of their results to do so"

I encourage anyone who has any patience left to contact AVG and ask for precise details.

If you tell them you want to circulate the information on the world's leading webmaster forum in order to help AVG get their message out to the people who need it then they should respond.

Or you could just say you were an ordinary browser.

...

dstiles




msg:3687293
 12:49 am on Jul 1, 2008 (gmt 0)

I've been getting Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) UAs for years and a lot of them acted like scrapers of some kind.

I'm also getting similarly suspicious effects for another "basic" MSIE UA - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727) - though that one is predominantly SQL injection attacks.

Following comments here and elsewhere I have revisited some of my other trap logs.

The UA Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) (without the prefix) has significantly increased its activity during the past few days, probably making up most of the previous AVG total. It does seem to be the new AVG one, and it does not appear to use HEAD. I half-suspected it yesterday and stopped blocking the IPs it came in on - the jump in hits on that UA from its previously low level was far too large for a normal bad bot. It still gets a 403, though - I'll have to fix that, I suppose. Damn!

Stefan




msg:3687343
 2:41 am on Jul 1, 2008 (gmt 0)

Good work staying on it, Samizdata, and everyone else. I've been reading these threads from the start. Respect.

One thing I might ask, if possible, is updates to the .htaccess code that redirects them back to AVG as the user-agents change and get added to. I'm not great at it, and can't necessarily expand on the 1813 one we had before.

Samizdata




msg:3687545
 9:08 am on Jul 1, 2008 (gmt 0)

One thing I might ask, if possible, is updates to the .htaccess code

Stefan, I apologise if this does not seem helpful enough at first sight.

AVG are saying "we still enable those webmasters who want to filter our requests out of their results to do so", and their officially sanctioned method will presumably be the best way to detect and deal with any problems caused by their LinkScanner.

The best way to help other webmasters is to ask AVG for it and to post it here.

...

Cyclob




msg:3687587
 10:38 am on Jul 1, 2008 (gmt 0)

"The UA Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) (without the prefix) ", dstiles said...

What do you actually mean by "without the prefix"? Could you please clarify?

Many thanks in advance.

Staffa




msg:3687598
 11:08 am on Jul 1, 2008 (gmt 0)

UA without the prefix :

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

With the prefix :

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

Both are used by AVG
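One way to keep the prefixed variant out of the log file that stats software reads, on Apache, is to tag it with mod_setenvif and exclude the tag in CustomLog. This is a sketch, not AVG's sanctioned method: the variable name avg_prefixed and the log paths are assumptions, the "combined" format nickname is assumed to be defined as in the stock httpd.conf, and CustomLog is only valid in the server or virtual host configuration, not in .htaccess. Only the prefixed form is tagged, because the unprefixed string is also sent by real IE6 browsers.

```apache
# Server/vhost config (CustomLog is not allowed in .htaccess).
# Tag requests whose UA value starts with a literal "User-Agent: ".
SetEnvIf User-Agent "^User-Agent: " avg_prefixed

# Optional separate log containing only the tagged requests
CustomLog logs/linkscanner_log combined env=avg_prefixed
# Main log for stats software, with the tagged requests left out
CustomLog logs/access_log combined env=!avg_prefixed
```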

Cyclob




msg:3687602
 11:24 am on Jul 1, 2008 (gmt 0)

Thanks Staffa! I've checked my log file and found lots of these 'SV1' hits, but unlike the ;1813 ones, they all have referrers.

I filtered out ;1813 a couple of weeks ago and the numbers already looked right to me. But since they have changed to SV1, should I filter 'SV1' out as well?

My concern is, is there any other User Agent having the same name as this one....

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

I'm afraid of filtering out real traffic as well, instead of just this annoying AVG.

Thanks again.

Samizdata




msg:3687642
 12:37 pm on Jul 1, 2008 (gmt 0)

My concern is, is there any other User Agent having the same name

As has been stated many times here - and even in The Register - this user-agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

can appear in your logs when used for legitimate Internet Explorer 6 requests made by humans.

AVG say there are ways to differentiate those from LinkScanner, which also uses that user-agent.

"we still enable those webmasters who want to filter our requests out of their results to do so"

Once again, I encourage you to contact AVG and ask for the officially sanctioned method.

I don't need it, but others will.

...

Scarecrow




msg:3688136
 1:18 am on Jul 2, 2008 (gmt 0)

The HEAD/GET pairing from LinkScanner is not very frequent. I believe it is only present in certain updated non-free versions of AVG 8. Out of 5992 LinkScanner hits on one site today, only 11 percent were HEAD requests. All these HEADs had "User-Agent: " in front of the standard SV1 user-agent.

Most significantly, there was a three-hour period when I wasn't redirecting LinkScanner back to AVG, and then for the rest of the day I was redirecting all LinkScanner hits back to AVG. Comparing the periods before and after the switchover, there were consistent HEAD/GET pairings from the same IP address for all LinkScanner hits that had "User-Agent" in front, but only before the switch. After the switch, all the "User-Agent" hits that showed a HEAD failed to follow up with a GET from that IP address.

Conclusion: One purpose of the HEAD, and perhaps the only purpose, is to detect the redirect. If a redirect is detected, the GET is not done.

Question 1: Is it only detecting redirects to grisoft.com and avg.com, or will any redirect prevent the GET?

Question 2: If it detects a redirect to grisoft.com or avg.com, what happens with the gray / green / red indicator in the user's browser for that link?

I refuse to install that pig package, even if I do get 30 days free. I installed the free version for one day last week just to see what it could do. The next day I uninstalled, and also did a System Restore on my XP to a prior restore point, just to be sure I was rid of it.

I've never used a virus detector in the 25 years I've used personal computers, and have never had any problems. I use common sense when I surf, switch on JavaScript in Firefox only when I can't do something without it, and stay away from Explorer altogether. Common sense told me to get rid of that viral AVG package!

Samizdata




msg:3688144
 1:35 am on Jul 2, 2008 (gmt 0)

If it detects a redirect to grisoft.com or avg.com

I don't believe the HEAD request does anything but prove that AVG are incompetent.

what happens with the gray / green / red indicator

In all my tests I get the green checkmark no matter how I fool LinkScanner.

I tested the redirect method using two of my sites and saw both sets of logs.

Sending AVG the bill for LinkScanner in this way works fine.

But I do not advocate it - I merely refuse to condemn it.

"we still enable those webmasters who want to filter our requests out of their results to do so"

Has AVG unveiled the officially sanctioned LinkScanner detection method yet?

I would ask them myself, but I don't think they read my stuff.

...

Scarecrow




msg:3688156
 2:26 am on Jul 2, 2008 (gmt 0)

Well, as long as you get a green check I guess it's okay. I was worried that the HEAD ping might be an attempt to find anti-LinkScanner websites, which would then get a non-green check mark next to the link. If that were the case, my htaccess file would require an additional condition before I redirect. I believe it would look like this:

RewriteCond %{REQUEST_METHOD} ^GET$

That extra requirement in the htaccess file would handle their HEAD without redirecting, which would cause them to follow up with their GET, which would then redirect.
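Put together, the rule Scarecrow describes might look like the sketch below, assuming Apache mod_rewrite in .htaccess. It matches only the prefixed UA, lets HEAD requests through untouched, and redirects GET requests. The target http://www.avg.com/ is an assumption standing in for wherever you choose to send them, and whether to redirect at all is a judgment call.

```apache
# Sketch only: redirect the prefixed LinkScanner UA on GET, not on HEAD.
RewriteEngine On

# Redirect only GET; a HEAD falls through and gets the normal response
RewriteCond %{REQUEST_METHOD} ^GET$
# Prefixed variant discussed in this thread
RewriteCond %{HTTP_USER_AGENT} "^User-Agent: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$"
# Temporary external redirect to the assumed target
RewriteRule ^ http://www.avg.com/ [R=302,L]
```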

Samizdata




msg:3688425
 12:27 pm on Jul 2, 2008 (gmt 0)

The Register half-heartedly tries again:

[theregister.co.uk...]

Make of it what you will.

...

Kerrin




msg:3688520
 2:45 pm on Jul 2, 2008 (gmt 0)

Possible solutions to block AVG:

1) AVG LinkScanner does not set the following environment variables (i.e. it does not send the corresponding request headers):

HTTP_REFERER
HTTP_ACCEPT
HTTP_ACCEPT_LANGUAGE
HTTP_ACCEPT_ENCODING
HTTP_ACCEPT_CHARSET

Normal browsers/crawlers usually set at least one of these, so you could reject (or redirect) requests from known AVG HTTP_USER_AGENTs with all of the above environment variables blank.

2) As far as I know, AVG LinkScanner does not download images so, if you're worried about losing legit visitors, a variation on the above could be:

Show suspicious traffic a no-cache welcome gateway page containing a small image that sets a cookie when it is loaded. The welcome page should also contain an "Enter" link that reloads the page when a user clicks it.

You could set this up the same way as 1) but include an HTTP_COOKIE check to see if the cookie has been set. If it has, display the real page, if not, display the gateway page.
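Kerrin's two ideas could be sketched in mod_rewrite roughly as follows, assuming Apache with mod_rewrite in an .htaccess file at the document root. The names /gate.gif, /gateway.html and the cookie seen=1 are hypothetical, .example.com must be replaced with your own cookie domain, and the header list is taken from point 1) above. This has not been tested against LinkScanner.

```apache
# Sketch only: header checks from 1) plus the cookie gateway from 2).
RewriteEngine On

# Gateway image: a browser that loads it receives the session cookie seen=1.
# [L] ends rule processing for this request.
RewriteRule ^gate\.gif$ - [CO=seen:1:.example.com,L]

# Suspicious: SV1-style UA with every listed header blank...
RewriteCond %{HTTP_USER_AGENT} "SV1\)$"
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:Accept} ^$
RewriteCond %{HTTP:Accept-Language} ^$
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteCond %{HTTP:Accept-Charset} ^$
# ...and no cookie from the gateway image yet
RewriteCond %{HTTP_COOKIE} !(^|;\s*)seen=1
# Never rewrite the gateway page itself
RewriteCond %{REQUEST_URI} !^/gateway\.html$
# Option 2): serve the gateway page in place of the requested URL.
# For option 1), use instead: RewriteRule ^ - [F]
RewriteRule ^ /gateway.html [L]
```

The no-cache headers for gateway.html would need a separate directive (for example via mod_headers), which is not shown.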

Thoughts?

Samizdata




msg:3688535
 3:02 pm on Jul 2, 2008 (gmt 0)

Thank you Kerrin. And while we are stating the bleedin' obvious:

LinkScanner deliberately lies about what it is in order to access my files on my computer so that AVG Technologies can make a profit while handing me the bandwidth cost and causing me problems.

That makes it malware, pure and simple.

...

blend27




msg:3688693
 6:23 pm on Jul 2, 2008 (gmt 0)

---AVG has said "There are still ways for concerned web masters to filter LinkScanner requests out of their statistics."---

From this article, the only thing I understand is that AVG is trying to confuse the "Black Hats"! What a great experiment! Well, good luck with that...

I am not aware of any tracking software that reads IIS log files and finds any HTTP header information in them. I have never looked for one, nor will I.

-- But there is a way of eliminating this fake traffic from log files --

When a browser, bot or LinkScanner makes a request to the website, an entry is recorded in the log files. There is no way for an average webmaster to eliminate that entry. Stats software then parses the log files and reports to the stats user that a request was made to the website. Real visitor, bot or LinkScanner - it's a hit.

So, in order to differentiate LinkScanner requests from real ones, we need to install or write software that does not rely on log files at all, or that takes HTTP header information into account, to give accurate website stats. That means we need to spend money.

In order for a webmaster to know whether his/her site validates against AVG's software, one must download and install the dumb thing. That means the webmaster's system must be exposed to AVG's software. I don't know if this could be called "Marketing", but I would slap the label of !@$% on it.
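blend27 is describing IIS, but for comparison, on Apache the missing-header information can be written into the log file itself, so that stats or log-filtering tools have something to work with. A minimal sketch, assuming the standard mod_log_config directives in the server or virtual host configuration; the format nickname combined_headers and the log path are hypothetical, and stats software would still need to understand the extra fields.

```apache
# Server/vhost config. The usual combined format plus the Accept and
# Accept-Language request headers; a header that was not sent is logged as "-".
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Accept}i\" \"%{Accept-Language}i\"" combined_headers
CustomLog logs/access_log combined_headers
```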

dstiles




msg:3688700
 6:40 pm on Jul 2, 2008 (gmt 0)

There is no way I can ever install AVG (or most other AV software) here since I use server/workstations not simple workstations. Most AVs now will not allow installation on servers without paying quite a lot of money! (Yes, I know about Clam and a couple of other freebies but they seem to have problems.)

So I have to rely upon other people to test AVG against my traps for me. Difficult since I've been telling them to uninstall it...

Samizdata




msg:3688732
 7:12 pm on Jul 2, 2008 (gmt 0)

Quick Poll: Which of these sums up AVG Technologies best?

Dishonest
Their whole strategy is built on lying to gain access to our servers.

Hypocritical
They say they want to help webmasters then immediately do the opposite.

Incompetent
The recent "security fix" proves (again) that they have no idea what they are doing.

Malicious
They say they want to break our eggs and they distribute free malware to do it.

All Of The Above
This option is included because it would get my vote.

....

[edited by: Samizdata at 7:14 pm (utc) on July 2, 2008]

incrediBILL




msg:3688743
 7:23 pm on Jul 2, 2008 (gmt 0)

Malicious

I don't think there was any intent to be malicious here, it was just the law of unintended consequences from being incompetent.

Samizdata




msg:3688786
 8:08 pm on Jul 2, 2008 (gmt 0)

I don't think there was any intent to be malicious here

I would have agreed with you until a few days ago.

There have been four articles in The Register and countless posts on WebmasterWorld - as Jim said "We've raised the flag, we've rung the bell, we've shouted the alarum" - and several members (including me) have contacted AVG directly to show them the error of their ways.

The response from AVG was to try harder to deceive webmasters.

They may have been incompetent about it, but deception remains their intent.

I do not consider that anything but hostile.

...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved