Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 173-message thread spans 6 pages; this is page 6.
AVG Toolbar Glitch May Be Causing Visitor Loss
User Agent Flaw Suspected
Umbra




msg:3615362
 2:36 pm on Mar 31, 2008 (gmt 0)

Seeing a rash of hits with an oddly formed user agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
No referer

mod_security always throws an error for this one. Hits come from various IPs with no consistent pattern, and they seem to be residential IPs. Any idea what it is?
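Below is a minimal Python sketch of how the malformed string in this post could be flagged. It rests on one assumption, drawn only from the single sample above: the giveaway is the missing space before `1813)` at the end of the UA, combined with an empty referer. The function name `looks_like_1813_probe` is hypothetical. It is not something AVG or mod_security defines.

```python
import re

# Assumption based only on the sample UA quoted above:
# "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)".
# Genuine IE6 tokens are separated by "; ", so ";1813)" at the very end
# with no space is treated as the tell.
ODD_1813 = re.compile(r";1813\)$")


def looks_like_1813_probe(user_agent: str, referer: str) -> bool:
    """Return True for the malformed 1813 UA arriving with no referer.

    Apache logs an absent referer as "-", so both "" and "-" count as empty.
    """
    no_referer = referer in ("", "-")
    return bool(ODD_1813.search(user_agent)) and no_referer


if __name__ == "__main__":
    ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
    print(looks_like_1813_probe(ua, "-"))
    print(looks_like_1813_probe(ua, "http://www.google.com/search?q=x"))
```

The referer condition narrows the match. Later posts in this thread point out, however, that a missing referer on its own does not prove a request is automated.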

 

ArthurSixpence




msg:3681411
 1:25 pm on Jun 23, 2008 (gmt 0)

VERY IMPORTANT

Further testing proved very well worthwhile.

Contrary to my earlier suggestion and those of others posting in this forum, under NO circumstances should you use a redirect for the AVG UAs unless you intend it to point specifically to a page advising AVG users of the issues (unlikely in the case of commercial sites).

If you use a redirect, this is the scenario:

The search engine results are being scanned in real time, so any hits on your site from the LinkScanner UA will indeed be redirected to your 'very-small-page', or to AVG's own site if that's what you've chosen. Unfortunately, when the user clicks the search engine link the same UA is used, and any traffic originating from an AVG user just ends up in la-la land rather than on your real page. You may save bandwidth, but you've also just lost a customer.

Until AVG sorts this farce out, or a major player threatens them with legal action, it looks like we're stuck with whatever this bunch of idiots throws at us.

Meanwhile I'll just keep harvesting the IPs, as I'm sure McAfee etc. could be interested. If you work for McAfee/Norton etc. and would like to construct a CPM info page about your rival, I'm all ears and would love to send you some traffic, as I suspect would many other users of WebmasterWorld.

Samizdata




msg:3681492
 2:28 pm on Jun 23, 2008 (gmt 0)

when the user clicks the search engine link the same UA is used and any traffic originating from an AVG user just ends up in la-la land rather than on your real page

Perhaps you can tell us which version of AVG you are using, which browser you are using, which method of dealing with LinkScanner you are using, and what testing you have done.

We know that AVG is attempting to fix LinkScanner so changes are inevitable.

...

ArthurSixpence




msg:3681503
 2:40 pm on Jun 23, 2008 (gmt 0)

Version: AVG 8.0

Browser: IE 6.0 on XP SP2 (browser purposely not upgraded to enable me to check sites with an older browser)

Method: htaccess redirects (now disabled) and bot traps at the moment, although this will change as AVG changes

Testing: using the product as a user would, but on sites that I control, so all accesses can be monitored

Samizdata




msg:3681533
 3:12 pm on Jun 23, 2008 (gmt 0)

Sorry, I meant paid version or free version, but having now read one of your earlier posts I can see that you paid. It would also be helpful if you posted the user-agent string with any report and included information about any HEAD requests, so that we can understand what you are talking about.

If the method you use no longer works it doesn't necessarily mean that other methods will fail, and given its risible history it seems unlikely that LinkScanner is about to become any less of a joke.

...

incrediBILL




msg:3681551
 3:38 pm on Jun 23, 2008 (gmt 0)

Unfortunately when the user clicks the search engine link the same UA is used and any traffic originating from an AVG user just ends up in la-la land rather than on your real page. You may save bandwidth but you've also just lost a customer.

That's not possible, because the redirect is based on the LinkScanner user agent, which never matches the actual browser user agent unless your redirect code has a flaw in it.

ArthurSixpence




msg:3681602
 4:23 pm on Jun 23, 2008 (gmt 0)

The redirect code was similar to what I posted previously:

RewriteCond %{HTTP_USER_AGENT} ;1813\)$ [OR]
RewriteCond %{HTTP_USER_AGENT} User\-Agent:\ Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.1;\ SV1\)$
RewriteRule !^xyz\.html$ /xyz.html [L]

The user agent that appeared in the logs for both the LinkScanner hit from the SE listing AND the click-through was the second, malformed 'User-Agent:' version, and it was trapped by the redirect on both occasions. The search was performed using the AVG-hijacked Yahoo search, which is installed by default.
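The three mod_rewrite lines above can be modelled roughly in Python, which shows why any request carrying either UA would land on the dummy page. The model mirrors only the regexes, the [OR] between the conditions, and the `!` negation on the rule. It does not reproduce how Apache evaluates rules internally. `xyz.html` is the placeholder name from the post, and `rewrite` is a hypothetical helper. In .htaccess (per-directory) context the rule sees the path without its leading slash, so the sketch strips that slash.

```python
import re

# Patterns copied from the RewriteCond lines above; [OR] means either may match.
COND_1813 = re.compile(r";1813\)$")
COND_PREFIXED = re.compile(
    r"User\-Agent:\ Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.1;\ SV1\)$"
)
# RewriteRule !^xyz\.html$ /xyz.html [L]: rewrite anything NOT already xyz.html.
RULE_PATTERN = re.compile(r"^xyz\.html$")
DUMMY = "/xyz.html"


def rewrite(url_path: str, user_agent: str) -> str:
    """Return the path that would be served under this rough model."""
    # Per-directory matching sees the path relative to the directory.
    rel = url_path.lstrip("/")
    ua_matches = bool(COND_1813.search(user_agent) or COND_PREFIXED.search(user_agent))
    if ua_matches and not RULE_PATTERN.search(rel):
        return DUMMY
    return url_path
```

In this model the outcome depends only on the UA string. If a genuine click-through really did carry the prefixed UA, it would be rewritten too, which matches ArthurSixpence's report. A request arriving with a normal browser UA would pass through untouched.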

Samizdata




msg:3681640
 4:53 pm on Jun 23, 2008 (gmt 0)

The advice for that method was to put a link to your homepage in the dummy file.

Did you do that, and if so what happened when you clicked it?

...

ArthurSixpence




msg:3681826
 8:09 pm on Jun 23, 2008 (gmt 0)

Yes, I did, and of course it went to the homepage, but that's rather pointless.

The link I'd clicked in Yahoo should have taken me to

[abc.abc...]

but because of the AVG UA, the redirect grabbed it and actually took me to

[abc.abc...]

as it would for any other AVG user who had clicked it. The fact that it had a link to my homepage would be absolutely meaningless to the user. If the redirect had sent it to [avg.co.uk...] as I originally suggested, then AVG would have benefitted from my Yahoo listing!

Staffa




msg:3681880
 9:30 pm on Jun 23, 2008 (gmt 0)

I am not familiar with htaccess, but I guess there could be something wrong with your code.

My log files show that 1813 arrives with no referer, then the visitor (same IP) arrives with his/her own UA and referer link.

1813 gets redirected to its own home,
visitor lands on the page linked from the SE
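Staffa's log pattern can be checked mechanically. The Python sketch below assumes that log entries have already been parsed into the hypothetical `Hit` records and are in time order. A match shows only that the scanner hit and the later visit share an IP. It does not prove they came from the same person.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """One parsed log entry (hypothetical record shape)."""
    ip: str
    user_agent: str
    referer: str  # "-" when the log shows no referer
    path: str


def pair_scans_with_visits(hits: List[Hit]) -> List[Tuple[Hit, Hit]]:
    """Pair each no-referer 1813 hit with the next hit from the same IP that
    uses a different UA and carries a referer (presumably the real visitor).
    Assumes `hits` is in log (time) order."""
    pairs = []
    for i, scan in enumerate(hits):
        if not scan.user_agent.endswith(";1813)") or scan.referer != "-":
            continue
        for later in hits[i + 1:]:
            if (later.ip == scan.ip
                    and later.user_agent != scan.user_agent
                    and later.referer != "-"):
                pairs.append((scan, later))
                break
    return pairs
```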

PS: I'm on a Windows box

[edited by: Staffa at 9:57 pm (utc) on June 23, 2008]

Samizdata




msg:3681904
 9:54 pm on Jun 23, 2008 (gmt 0)

There has been a lot of confusion in this thread since it started on 31 March.

I just tested the latest release (21 June) of AVG against two of my sites.

In both cases it performed a HEAD request followed by a GET request with this user-agent:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
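Parsed logs make the HEAD-then-GET sequence Samizdata describes easy to spot. The rough Python sketch below assumes requests have already been reduced to hypothetical `(ip, method, path, user_agent)` tuples in log order. The pattern alone is not proof of LinkScanner, since ordinary caches and link checkers also send HEAD requests.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (ip, method, path, user_agent), in log order.
Request = Tuple[str, str, str, str]


def head_then_get(requests: List[Request]) -> List[Request]:
    """Return GET requests that directly follow a HEAD for the same path
    from the same IP with the same user-agent."""
    last_by_ip: Dict[str, Request] = {}
    matches = []
    for req in requests:
        ip, method, path, ua = req
        prev = last_by_ip.get(ip)
        if (method == "GET" and prev is not None
                and prev[1] == "HEAD" and prev[2] == path and prev[3] == ua):
            matches.append(req)
        last_by_ip[ip] = req  # remember this IP's most recent request
    return matches
```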

In both cases I got the green checkmark and star in the Google results.

In both cases clicking the link in the SERPs took me to the correct homepage of the site.

In both cases I had not changed anything in my .htaccess file.

In both cases LinkScanner was comprehensively fooled as usual.

My eggs remain intact.

LinkScanner remains a joke.

...

incrediBILL




msg:3681917
 10:08 pm on Jun 23, 2008 (gmt 0)

the click-through was the second malformed 'User-Agent:' version

Sorry, that's not a click-through, that's something else AVG is also doing.

incrediBILL




msg:3681923
 10:14 pm on Jun 23, 2008 (gmt 0)

In both cases it performed a HEAD request followed by a GET request with this user-agent:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

The obvious flaw here is that a browser doesn't include "User-Agent:" in the actual user agent. Even if they fix that, there are still other flaws in the request, which I won't mention because I don't want them to fix those either, so I can continue to detect this junk and deal with it appropriately.
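The flaw incrediBILL names can be tested directly. The Python sketch below uses a hypothetical function name. It checks only this one publicly discussed symptom, not the other request flaws he deliberately leaves unstated.

```python
def has_header_name_in_value(user_agent: str) -> bool:
    """True if the logged UA value starts with the header name itself.

    A browser sends the header line as "User-Agent: <value>", so the logged
    value should never begin with "User-Agent:". The comparison is
    case-insensitive, since HTTP header names are case-insensitive and the
    prefix has been seen in lowercase too.
    """
    return user_agent.strip().lower().startswith("user-agent:")
```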

I'm still flabbergasted that they think the new UA using a HEAD and GET is a "FIX", because now it's doubling the number of hits, so it's DDoS x2!

<mod hat off to sing>

...
But where are the clowns?
Quick, send in the clowns.
Don't bother, they're here.

<mod hat on>

[edited by: incrediBILL at 10:15 pm (utc) on June 23, 2008]

dstiles




msg:3681925
 10:17 pm on Jun 23, 2008 (gmt 0)

Re: dynamic IPs - my own customers seem to change IPs fairly regularly, some daily (I've tried to white-list some of them for my mail server and failed). Obviously it depends on the ISP, but I know a lot drop the IP when the connection is dropped. Some of my customers are on "economy" broadband and still use simple (and vulnerable) modems, not routers.

incrediBILL - I take your point about cookies.

And... if a site, legitimate or otherwise, isn't being deliberately accessed by the "visitor" (i.e. they didn't click on the Google link), then any cookie set by the site on the browsing computer is presumably set without its user's knowledge and hence, under UK law, is illegal (I know there are side issues in that one!). Presumably, though, AVG does not actually accept or set cookies?

Samizdata: thanks for the info on 403s. I suppose that's one way of discovering any new UAs from AVG, except that AVG is likely to use a bog-standard UA next time.

Sixpence: surely the actual link on the Google page will be followed directly to the correct page with the correct browser UA (e.g. Firefox)? So no traffic is lost. Certainly I serve up a dummy page, and that is not seen by an AVG visitor (re-tested this evening with AVG 8.1, db updated today; no "user-agent:" prefix or HEAD detected).

I'm still unsure where the "user-agent:" prefix comes into all this. It's logging a UA similar to AVG's (i.e. 1813 or SV1), so possibly whatever was occasionally distorting the UA before is now doing it to the AVG UA. Except that 1813 without the prefix does not send HEAD, so the prefixed one must be doing something semi-automatic in its own right. If so, it's probably something released in the past week or so and installed after an AVG update. Except that one would then have expected a similar quantity of hits before, and from varying UAs, whereas prior to last week the prefix had been quite scarce.

Since whatever it is accesses sites with old date-stamped page links, I assume it's either going through browser-cached pages and re-checking them online, or re-checking bookmarks.

Which still makes no sense because why did we not see them as soon as AVG came out? Or is AVG 8.1 only a very recent update?

I've now read Samizdata's latest post (9:54pm) and noticed his AVG was 8.0, so that isn't the answer.

A thought from Sixpence's posting: could this prefix be generated by the Yahoo toolbar in conjunction with AVG? How many people previously used the Yahoo toolbar instead of Google's and have now migrated or been usurped? Have they combined in a deadly inter-SE fight to the death using loaded AVGs? :)

Samizdata




msg:3681926
 10:22 pm on Jun 23, 2008 (gmt 0)

I'm still flabbergasted that they think the new UA using a HEAD and GET is a "FIX"

I can only stand so much hilarity in one 24-hour period, so I haven't tested it yet, but my guess is that the preceding HEAD request might be intended to stop people redirecting to AVG's site.

Which would mean that they don't care about the security of their customers at all.

Just their own bandwidth bill.

...

incrediBILL




msg:3681933
 10:31 pm on Jun 23, 2008 (gmt 0)

I'll bet they added HEAD so they could cache the result and just check whether it changes, which is why people use HEAD in the first place. However, this is a completely misguided attempt to fix the DDoS aspect, as most of us with popular sites are getting hit by thousands of IPs a day, not just a few IPs asking for the same pages over and over.

The idiotic part is you're supposed to use HEAD to check to see if the file changed once you cache it, not BEFORE you cache it!

[w3.org...]

9.4 HEAD

...

The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
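The order incrediBILL describes (GET and cache first, then HEAD to revalidate) can be sketched in Python. This is a simplified illustration of the field comparison in the quoted RFC 2616 text, and it assumes the headers from the earlier GET were stored. The function name is hypothetical. The sketch ignores everything else a real HTTP cache must handle.

```python
from typing import Mapping

# Fields the quoted RFC 2616 section 9.4 text lists as signalling a change.
VALIDATOR_FIELDS = ("Content-Length", "Content-MD5", "ETag", "Last-Modified")


def cache_is_stale(cached: Mapping[str, str], head: Mapping[str, str]) -> bool:
    """Compare headers saved with an earlier GET against a later HEAD response.

    Header names are matched case-insensitively. A field counts as changed
    only when it is present in both responses with different values; this is
    a simplification, not a full cache implementation.
    """
    cached_l = {k.lower(): v for k, v in cached.items()}
    head_l = {k.lower(): v for k, v in head.items()}
    for field in VALIDATOR_FIELDS:
        key = field.lower()
        if key in cached_l and key in head_l and cached_l[key] != head_l[key]:
            return True
    return False
```

A fresh GET would only be needed when this returns True. A HEAD sent just before every GET, with nothing cached, has nothing to compare against, and that is the ordering complaint above.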

For instance, instead of 3K IPs asking for 3K pages, it'll be 3K IPs asking for both the HEAD and the GET, so it's 2x the number of requests to process for the same flawed product.

So, in an attempt to "fix" the problem, it quickly becomes twice as bad, hence my singing a few posts back.
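incrediBILL's arithmetic fits in a tiny Python function. It assumes one page check per IP, as in his example, and that every check is a HEAD followed by a GET. The 3K figure is his illustration, not a measurement.

```python
def requests_to_serve(ips: int, pages_per_ip: int, head_before_get: bool) -> int:
    """Requests the server handles if each IP checks `pages_per_ip` pages.

    With a HEAD sent before every GET (and nothing cached), each check
    costs two requests instead of one.
    """
    per_check = 2 if head_before_get else 1
    return ips * pages_per_ip * per_check


print(requests_to_serve(3000, 1, head_before_get=False))  # 3000
print(requests_to_serve(3000, 1, head_before_get=True))   # 6000
```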

[edited by: incrediBILL at 10:36 pm (utc) on June 23, 2008]

Samizdata




msg:3681940
 10:46 pm on Jun 23, 2008 (gmt 0)

@ dstiles - A thoughtful post with a lot of questions in it.

All I can suggest is continued vigilance, testing and reporting.

The only conclusion we can currently draw is that AVG are desperately trying to salvage something out of the money they paid for LinkScanner and that we will probably see further inept tinkering.

Reasonably enough, any changes are likely to happen first in the paid version.

According to The Register, AVG claims 70 million users worldwide for its free and paid versions.

According to my logs only about 350 people on the planet actually use the paid one.

...

Receptional Andy




msg:3681946
 10:57 pm on Jun 23, 2008 (gmt 0)

I'm seeing pretty lunatic behaviour from "1813" at this point, but I'm finding it impossible to separate bad behaviour caused directly by AVG's LinkScanner from bad behaviour caused by AVG's seemingly incompetent implementation of this "feature". It seems that bots are already masquerading as LinkScanner. I'm also seeing apparently human requests from that UA.

mlduclos




msg:3683598
 12:33 am on Jun 26, 2008 (gmt 0)

I just discovered that AVG isn't validating my website, no idea why. They flag me in Google search with a red "X", and meanwhile send a huge amount of ghost traffic, which is overloading the server...

Peter




msg:3684735
 12:34 am on Jun 27, 2008 (gmt 0)

209.169.14*.* - - [26/Jun/2008:00:14:15 +0200] "GET /some/thing.html HTTP/1.1" 200 57394 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
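The line above is in Apache's combined log format, in which the last two quoted fields are the referer and the user-agent. The minimal Python parser below assumes that format (sites using a custom LogFormat will differ) and assumes no embedded double quotes in any field. `parse_combined` is a hypothetical helper name. The partially masked IP is kept exactly as posted.

```python
import re
from typing import Dict, Optional

# Combined format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, str]]:
    """Split one combined-format log line into named fields, or return None."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None
```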

I know there has been some doubt about whether this user agent is coming from AVG, and some doubt as to whether it can also be a real visitor. For my part, I certainly get real visitors with this UA, and I'm certainly getting spoof hits from something with this UA, which I imagine is AVG.

The AVG hits never have a referrer, of course; but the absence of a referrer is not enough to be sure it's not a real visitor. On the other hand, AVG is unable to handle compression, while real visitors with this UA all seem to take compressed pages.
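Peter's heuristic can be sketched in Python. Following his report, it assumes the scanner does not accept compressed responses, while real visitors with this UA do. The sketch therefore checks the request's Accept-Encoding header for gzip or deflate. The combined log format does not record that header, so it has to be captured separately, for example by adding `%{Accept-Encoding}i` to a custom Apache LogFormat. This is his observation, not a documented property of AVG. Both function names are hypothetical.

```python
SV1_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def accepts_compression(accept_encoding: str) -> bool:
    """True if Accept-Encoding lists gzip/deflate with a nonzero q-value."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip", "deflate"):
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return True
    return False


def suspect_scanner(user_agent: str, referer: str, accept_encoding: str) -> bool:
    """Assumed heuristic: SV1 UA, no referer, and no compression support."""
    return (user_agent == SV1_UA
            and referer in ("", "-")
            and not accepts_compression(accept_encoding))
```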

Can anyone say if this version of MSIE should always (or almost always) handle compression, please? If so, the rest is easy.

Thank you.

Samizdata




msg:3684742
 12:49 am on Jun 27, 2008 (gmt 0)

The user-agent Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) is what the Exploit Prevention Labs version of LinkScanner used before AVG bought it, and a few days ago AVG started to use it in an attempt to stop webmasters and malware writers fooling LinkScanner.

The user-agent is - in rare cases - used by real human visitors, so I fool it using other methods.

It would be easy to post details here, and even easier for the "Bad Guys" to implement it.

All the information you need is already on WebmasterWorld.

...

incrediBILL




msg:3684823
 3:36 am on Jun 27, 2008 (gmt 0)

some doubt as to whether it can also be a real visitor

There is no doubt it's AVG, because the HTTP request contains some other parameters, which you don't see in a log file, that a normal browser doesn't send.

It's been mentioned a couple of times here, so until they fix that problem it's still trivial to detect.

I would spell it out but I really don't want them to fix it, I want them to discard it! ;)

Demopoly




msg:3685590
 10:12 pm on Jun 27, 2008 (gmt 0)

I read all this and looked at ten other websites, then acted. I too have a growing dislike of AVG for many reasons, mostly bloatware. I have no intention of paying higher bandwidth costs just because AVG decides to be "flip." Malware is malware and should not be tolerated from anyone, especially not from security companies like AVG/Grisoft. I've blocked their new "toy" and redirect these and other baddies to my blocked.html page. Lost sales? I haven't heard any customer complaints and don't expect to.

I don't think that webmasters can live in fear of losing clicks. That's no way to make decisions. Run the site correctly, be professional, and to Hades with AVG and their ilk.

~D

Samizdata




msg:3685621
 11:16 pm on Jun 27, 2008 (gmt 0)

Welcome to WebmasterWorld Demopoly.

While I agree about the bloat, AVG have been deservedly popular in the past.

"To Hades with LinkScanner and its ilk" might be better.

Finjan and the others are just as easy to fool.

...

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved