Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 173-message thread spans 6 pages; this is page 3.
AVG Toolbar Glitch May Be Causing Visitor Loss
User Agent Flaw Suspected
Umbra




msg:3615362
 2:36 pm on Mar 31, 2008 (gmt 0)

Seeing a rash of hits with an oddly formed user agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
No referer

mod_security always throws an error for this one. Hits come from various IPs with no consistent pattern, and they seem to be residential IPs. Any idea what it is?

 

Receptional Andy




msg:3646950
 8:46 pm on May 10, 2008 (gmt 0)

I'm positive that if it's conspicuous, the malicious sites will cloak a squeaky-clean page to thwart the toolbar, and those "people out there who aren't computer savvy and need whatever help they can get" will get just the opposite.

From my standpoint, this tool is essentially a commercial bot. They should declare themselves just like everybody else. Whether this causes problems with their (IMO misguided) security implementation for this toolbar is their own problem. If the program can detect malware based on scanning HTML, then it will be able to block it should a user click through. There's no increased danger for the user.

Of course, it's highly likely that they will go for an inconspicuous UA. But then, I think this feature is about noise rather than security. I'd like them to give me a mechanism to stop them automatically requesting pages from sites I operate.

incrediBILL




msg:3646956
 8:51 pm on May 10, 2008 (gmt 0)

Just curious what the toolbar is doing, so I took a peek.

Today I had one IP using the ";1813" UA hit the same 2 pages 25 times in rapid succession, so it's obvious it doesn't cache what it fetches. There was never a human hitting those pages: no JS, CSS, or images, just the 2 pages over and over.

Then there was another case of 5 identical page hits from ";1813" within a minute, and never a human on the site.

This thing is a waste of bandwidth.

Then I found a really curious case where ";1813" actually loaded 15 images.

My hackles are up.

Ocean10000




msg:3646975
 9:20 pm on May 10, 2008 (gmt 0)

Wouldn't it be simpler for a security toolbar that screens out problem sites to use a proxy, and have all of the web browser's requests go through that proxy? The toolbar could then ask the proxy whether or not to put up a warning, based on whether the client actually tries to visit that site. That way all the traffic would be legit, with no UA problems or spoofing attempts.

That would beat all of this current mess, which has everyone in this thread bent out of shape, including me.

incrediBILL




msg:3646996
 10:35 pm on May 10, 2008 (gmt 0)

requests go through that proxy

Actually, once the proxy is identified you can still cloak good pages to the proxy and serve up malicious pages.

I'm sure you remember the conversation we had about the screening service that tried to keep corporate surfers off adult sites and such. Once I figured out which proxy was theirs, I had the option to a) cloak false adult pages to the proxy server for a safe site, to get the employee in trouble, or b) cloak false clean pages for an unsafe site, to let an employee use it without repercussion. The proxy cuts both ways.

Not only that, but if you remember, I was able to figure out most of the time which IPs were using both my site and that proxy service, because they requested the exact same pages, which are fairly unique on the site being monitored.

So a proxy isn't exactly going to solve the cloaking/spoofing issues. It adds a level of abstraction that gives a false sense of security, although in theory it at least allows caching.

[edited by: incrediBILL at 6:45 pm (utc) on May 11, 2008]

Samizdata




msg:3647020
 11:46 pm on May 10, 2008 (gmt 0)

They should declare themselves just like everybody else

Effectively they do: "MSIE 6.0; Windows NT 5.1;1813" might as well read "AVG Toolbar; Cloak Me", and I understand why some were sceptical that it came from a well-regarded security company.

Whatever happens, the toolbar will become very widespread in the next few weeks.

Can we expect similar tools from Symantec, McAfee et al., or do we already have them?

Samizdata




msg:3647050
 1:49 am on May 11, 2008 (gmt 0)

it might actually be a useful tool if it works similar to "This site may harm your computer" warning Google labels some sites

As I understand it, the difference here is the "grey" category: while Grisoft can claim that their assessment of "unknown... unable to read this page... may no longer exist or there may have been an error" is technically accurate, users naturally perceive it as a vote of no confidence from the people they rely on for security.

I had a hard time persuading one of my oldest friends to ignore that warning for my personal site.

There are a lot of people out there who aren't computer savvy and need whatever help they can get

Those are the ones I am worried about.

lazyhat




msg:3647056
 2:14 am on May 11, 2008 (gmt 0)


Post by Samizdata #3646002
I was until now a fan of AVG and installed it on many computers. The previous version is end-of-life and users are expected to upgrade to this new version in their millions within a few weeks, and a large proportion of them will not uncheck the "Install Security Toolbar" option.

I'm not a fan of any of these companies. In fact, I uninstalled Norton 2 years ago and never looked back. Either my computer is a virus haven right now, or it's all an illusion and those firewall/antivirus companies have a lot of people feeding off lies. I really think it's the latter.

jdMorgan




msg:3647069
 2:57 am on May 11, 2008 (gmt 0)

I've had one (very) malicious download attempt in the past five years, while checking out a site that had scraped one of mine. NOD32 AV fired up and blocked it, and paid for itself in time saved.

But this thread is not about whether AV companies trade in illusions and lies. It's about how an apparently-malformed user-agent is triggering server-security access-blocking routines, and therefore causing AVG's toolbar to put up a warning that will scare off visitors.

Secondarily, it's about the fact that since the user-agent is so easy to identify, cloaking a malicious site to look good to it will be trivial.

Thirdly, it's about how a security scanner that identifies itself in this way makes the user a target for any known "hole" in its vendor's anti-malware protection.

IMO, the AVG Linkscanner security toolbar should use the user's browser user-agent string, should cache anything it fetches to minimize wasted bandwidth on the sites it checks and to keep itself hidden, and should present entirely browser-like HTTP request headers. They just did not do their homework or think this implementation through very thoroughly.

Jim

[edited by: jdMorgan at 3:38 am (utc) on May 11, 2008]

Samizdata




msg:3647075
 3:40 am on May 11, 2008 (gmt 0)

Excellent summary Jim, here's another take:

AVG is changing users' search results in a visually subtle but psychologically dramatic way.

Anything Grisoft doesn't class as clean is by definition potentially dirty and a threat.

That means your site, if you don't play nice with their fake user-agent.

It's business, not personal.

jdMorgan




msg:3647087
 4:41 am on May 11, 2008 (gmt 0)

Anything Grisoft's Toolbar doesn't classify as clean can be perceived to be potentially dirty and a threat, because of the status message they present -- and must present.

> It's business, not personal.

Trying to keep this thread clear and business-like as well... :)

Jim

blend27




msg:3647247
 12:47 pm on May 11, 2008 (gmt 0)

I just installed the free version of AVG, and it changed my UA to this:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)

Notice the space between 'SV1)' and ';'.

The original was this perfectly normal UA:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)

Key_Master




msg:3647290
 1:51 pm on May 11, 2008 (gmt 0)

So will mod_security give blend27 a 403 because of his new UA?

jdMorgan




msg:3647302
 2:32 pm on May 11, 2008 (gmt 0)

It just might: Nested comments, semicolon following comment rather than a token, and duplicate (and contradictory) "Mozilla/4.0 (compatible; MSIE" strings.

It looks like AVG may be trying to use the browser's UA string, but clearly, their UA string parser and reconstructor is buggy.

I've been wondering where those multiple-"compatible" UA strings have been coming from... mystery solved - at least partly.

Jim
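jdMorgan's red flags can be turned into a quick log-screening heuristic. The Python sketch below is an illustration under stated assumptions, not a User-Agent parser: it only counts parentheses and looks for a few string patterns (a nested comment, a ';' right after a closing ')', a ';' with no following space, and a repeated "Mozilla/4.0 (compatible; MSIE" prefix). The function name `ua_anomalies` and the flag wording are hypothetical, and the "space after ';'" rule is an assumption about how MSIE strings are normally formed.

```python
import re

MSIE_PREFIX = "Mozilla/4.0 (compatible; MSIE"


def ua_anomalies(ua: str) -> list[str]:
    """Return heuristic red flags for a User-Agent string (not a full parser)."""
    flags = []
    depth = max_depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    if max_depth > 1:
        flags.append("nested comment")
    if depth != 0:
        flags.append("unbalanced parentheses")
    # A ';' right after a ')' means a separator follows a comment, not a token.
    if re.search(r"\)\s*;", ua):
        flags.append("semicolon after comment")
    # Assumption: inside MSIE comments, each ';' is normally followed by a space.
    if re.search(r";(?!\s)", ua):
        flags.append("no space after semicolon")
    if ua.count(MSIE_PREFIX) > 1:
        flags.append("duplicate MSIE compatibility string")
    return flags


print(ua_anomalies("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"))
# ['no space after semicolon']
```

Run against blend27's new UA, it reports the nested comment, the ") ;" sequence and the duplicated MSIE prefix, while his original UA produces no flags. The ";1813" prefetch UA is flagged only for the missing space, which is the oddity Umbra first noticed.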

Key_Master




msg:3647327
 3:07 pm on May 11, 2008 (gmt 0)

It looks like AVG may be trying to use the browser's UA string, but clearly, their UA string parser and reconstructor is buggy.

Maybe, or the error could happen in 1 out of 1000 installations and have more to do with the computer it's installed on.

jdMorgan




msg:3647334
 3:18 pm on May 11, 2008 (gmt 0)

Yes, it would be interesting to see the contents of the registry entry
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\User Agent\Post Platform
both before and after an AVG toolbar install.

Due to the number of these requests I'm seeing, I'd guess that their UA-builder code is buggy. Based only on "gut feel," I'd say this toolbar is enormously popular -- at least among my sites' visitors.

Jim

blend27




msg:3647421
 5:13 pm on May 11, 2008 (gmt 0)

I modified some code to see what headers they are sending, and then searched Google for my favorite keyword phrase, where we rank #1.

These are the headers that were sent by this UA:

Cache-Control: no-cache
Host: www.mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
request_method: GET
server_protocol: HTTP/1.1

Now we know that that UA definitely belongs to AVG.

The interesting part is that it (the bot) only visits the pages that are listed on the search results page, so once you get to the site, AVG is not scanning the links on that page. So what's the point? And do they collect that data? If they do, who is buying it, and for what purpose?

And no, IT IS NOT OK FOR AVG (Grisoft) to use my server resources and bandwidth, especially when the user did not even get to my site/page.

403

Jim,

I am not sure what the registry said before, but for now it is blank.

Blend27

--added:

Key_Master,

I am not sure whether it would trip mod_security, since that's not my environment, but as of a few minutes ago it does get caught in my code.
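The header dump in blend27's post suggests a second signal besides the user-agent. Assume the list is complete; blend27's code may not have logged every header. Under that assumption, the prefetch request carried no Accept, Accept-Language or Accept-Encoding header, all of which a real browser would normally send. A minimal Python sketch of a check built on that assumption follows. The helper names and the choice of headers are hypothetical.

```python
# Headers a real browser would normally send (an assumption for this sketch).
BROWSER_HEADERS = ("Accept", "Accept-Language", "Accept-Encoding")


def missing_browser_headers(headers: dict[str, str]) -> list[str]:
    """Return the expected browser headers absent from a request (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in BROWSER_HEADERS if h.lower() not in present]


def is_probable_prefetch(headers: dict[str, str]) -> bool:
    """Flag the ';1813' user-agent only when it also lacks every expected browser header."""
    ua = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    return ua.endswith(";1813)") and missing_browser_headers(headers) == list(BROWSER_HEADERS)


# The request headers from blend27's post (request_method and server_protocol
# are server variables rather than headers, so they are left out).
observed = {
    "Cache-Control": "no-cache",
    "Host": "www.mysite.com",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)",
}
print(is_probable_prefetch(observed))  # True
```

Requiring both signals keeps a normal IE visit from being caught just because a proxy stripped a header.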

blend27




msg:3647450
 6:02 pm on May 11, 2008 (gmt 0)

In fact, now you don't need any special knowledge of coding!

Wanna do a minor DDoS on your competitor? Install AVG, go to Google, and set your prefs to show 100 results per page.

Then do site:your-competitor.com, click on page 2, wait 3 seconds, and keep going: click Next!

blend27




msg:3647458
 6:27 pm on May 11, 2008 (gmt 0)

Here is what else is cool!

Using CF:

<cfif Find('NT 5.1;1813', cgi.HTTP_USER_AGENT)>
  <html><head><title>Plain Page</title></head><body>YourDomain.com<br />Long Text</body></html>
  <cfabort>
</cfif>

This makes the AVG feature completely USELESS.

This should be on CNET's front page.

smallcompany




msg:3647478
 7:05 pm on May 11, 2008 (gmt 0)

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

And those are the only two user agents that, in some cases (not all, according to the logs), are causing 404s on my AdWords ads, as they convert special characters like “?” and “=”.

The interesting thing in the logs is that the IP address of those user agents is neither preceded nor followed by the same IP. It usually appears as a single entry with no referrer. In some cases I found the same IP as a single entry 2-3 minutes apart, again with no referrer. Finally, there are some entries with a bunch of GET requests, just like a normal browser would make.

The single GET entries appear with both 200 and 404 responses.

Today alone, I had over 50 404s from these UAs. That is roughly 50 x $1, as they all come from Google AdWords.

Time to install AVG and see what happens when I click onto my ads…

P.S.
I sent them an email a few days ago, still no reply.

jdMorgan




msg:3647491
 7:21 pm on May 11, 2008 (gmt 0)

I tested on a machine with only AVG 8.0 installed -- The AVG Security Toolbar was NOT installed.

However, I still see the requests with ";1813" added to the user-agent string in my server log when I visit my site with this machine. So, this is not the AVG Security Toolbar per se, but rather the "Linkscanner" component of the AVG 8.0 AV program.

The interesting thing in the logs is that the IP address of those user agents is neither preceded nor followed by the same IP. It usually appears as a single entry with no referrer. In some cases I found the same IP as a single entry 2-3 minutes apart, again with no referrer. Finally, there are some entries with a bunch of GET requests, just like a normal browser would make.

Remember, this is the user's AVG Link Scanner prefetching the links on a Google, Yahoo, etc. search results page. If the user does not visit your site, all you'll see is a single request each time the search results page is loaded or reloaded. If the user *does* visit your site, then you'll see the normal page-fetching from that IP address.

Jim
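jdMorgan's explanation gives a simple way to read the logs smallcompany describes. An IP that only ever sends the ";1813" request saw your listing but did not come in. An IP that also sends ordinary requests probably did visit. The Python sketch below applies that rule to Apache "combined" format log lines. It assumes the scanner UA always ends in ";1813)" and treats an IP as a stand-in for one user, which shared or NATed addresses can break. The function names and the sample lines (documentation-range IPs) are hypothetical.

```python
import re
from collections import defaultdict

# Apache "combined" log format: host ident user [time] "request" status bytes "referer" "UA"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def is_linkscanner(ua: str) -> bool:
    """Heuristic from this thread: the prefetch UA ends in ';1813)'."""
    return ua.endswith(";1813)")


def classify_ips(lines):
    """Return {ip: 'prefetch-only' | 'visited'} for every IP that sent a ';1813' request.

    'visited' only means the same IP also made requests with another UA.
    """
    counts = defaultdict(lambda: {"scanner": 0, "other": 0})
    for line in lines:
        m = COMBINED.match(line.rstrip("\n"))
        if m is None:
            continue  # skip lines that are not in combined format
        kind = "scanner" if is_linkscanner(m["ua"]) else "other"
        counts[m["ip"]][kind] += 1
    return {
        ip: ("visited" if c["other"] else "prefetch-only")
        for ip, c in counts.items()
        if c["scanner"]
    }


sample = [
    '192.0.2.10 - - [11/May/2008:19:00:00 +0000] "GET /widgets.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"',
    '192.0.2.20 - - [11/May/2008:19:01:00 +0000] "GET /widgets.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"',
    '192.0.2.20 - - [11/May/2008:19:01:04 +0000] "GET /widgets.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=widgets" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
]
print(classify_ips(sample))  # {'192.0.2.10': 'prefetch-only', '192.0.2.20': 'visited'}
```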

incrediBILL




msg:3647494
 7:34 pm on May 11, 2008 (gmt 0)

That is roughly 50x$1 as they all come from Google AdWords.

Are you sure you're being charged for those clicks?

There has been speculation that the AVG Security Toolbar is pre-fetching pages, and it appears they have a feature called "AVG Search-Shield" that claims "It checks the SEARCHED (using Yahoo or Google services right now) web pages content."

From that language, I'm assuming it checks the links on the SERPs, and perhaps the top AdWords ad above the SERPs is included too, or maybe all of the AdWords ads on the page are; it's hard to speculate.

If that's the case, and AdWords advertisers are getting charged when AVG's toolbar pre-fetches their landing pages, this is truly a huge problem.

Can someone confirm whether or not this toolbar pre-fetch is actually resulting in AdWords charges?

g1smd




msg:3647497
 8:06 pm on May 11, 2008 (gmt 0)

Is there any word from Grisoft about this?

Is there any discussion of this on their site?

incrediBILL




msg:3647499
 8:13 pm on May 11, 2008 (gmt 0)

UPDATE: (thanks Andy)

There's another thread about this in the AdWords forum and the consensus was it's probably not charging the advertiser.
[webmasterworld.com...]

Rehan




msg:3647500
 8:13 pm on May 11, 2008 (gmt 0)

Can someone confirm whether or not this toolbar pre-fetch is actually resulting in AdWords charges?

I commented on that point in this thread [webmasterworld.com]. When I monitored the traffic through Ethereal, I saw the AVG toolbar access my landing page directly without going through the AdWords URL. The landing page download by the toolbar did not result in an impression for the keyword or any charges to my AdWords account.

Samizdata




msg:3647545
 10:47 pm on May 11, 2008 (gmt 0)

I still see the requests with ";1813" added to the user-agent string

I just downloaded a fresh copy of AVG 8.0 and installed it on another Windows XP SP3 box.

As before, the user-agent for search result pre-fetches was the famous:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)

The user-agent for normal visits in IE7 was unchanged, no 1813 added.

This is with the "insecurity toolbar" installed.

Samizdata




msg:3647547
 10:50 pm on May 11, 2008 (gmt 0)

403

Imagine for one moment that you are my competitor.

Your site ranks number 1 for the primary keywords on all major search engines but you block the wonderful AVG toolbar user-agent and your listing is marked "not confirmed safe", so few AVG users are willing to click your link.

My site ranks number 2 for the primary keywords on all major search engines (grrr!) but I allow the wonderful AVG toolbar user-agent and my listing gets the Grisoft seal of approval, so all AVG users are willing to click my link.

All your SERPS are belong to us.

incrediBILL




msg:3647552
 10:55 pm on May 11, 2008 (gmt 0)

Just to show how long this thing has been around, in testing or otherwise, I went to my bot blocker archives to see what kind of activity there's been, and here are the results.

First sighting: 12/13/2006

2006 - 4 hits
2007 - 358 hits

2008
Jan - 32 hits
Feb - 18 hits
Mar - 243 hits
Apr - 1656 hits
May - 5533 hits (to date 5/11/08)

So you can see that there's been a pretty good adoption rate since its launch on Apr 24 '08.
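Raw monthly totals understate the jump, because the May figure covers only part of the month. The Python snippet below converts incrediBILL's 2008 counts into hits per day. It assumes the May count covers May 1 through May 11 (11 days) and every other month is complete. On that basis May runs at 503 hits per day against about 55 per day in April, roughly nine times the rate.

```python
import calendar

# incrediBILL's 2008 counts of ';1813' hits, by month number.
hits_2008 = {1: 32, 2: 18, 3: 243, 4: 1656, 5: 5533}

# Days covered per month; May is assumed to cover only May 1-11.
days_covered = {m: calendar.monthrange(2008, m)[1] for m in hits_2008}
days_covered[5] = 11


def hits_per_day(month: int) -> float:
    """Average ';1813' hits per day over the days covered in that month."""
    return hits_2008[month] / days_covered[month]


for m in hits_2008:
    print(calendar.month_abbr[m], round(hits_per_day(m), 1))
```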

smallcompany




msg:3647553
 10:58 pm on May 11, 2008 (gmt 0)

Should Apache block it if I do this, among my other blocks:

edit-begin
RewriteCond %{HTTP_USER_AGENT} 1813 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC]
RewriteRule ^(.*)$ - [F,L]
edit-end

The reason I ask is that I just did this, but my site in Google results still gets the green passing checkmark.

I wonder what I’m missing here…

incrediBILL




msg:3647555
 11:10 pm on May 11, 2008 (gmt 0)

I hope you're only doing this to test the toolbar and not really intent on permanently blocking it since it would turn away visitors.

However, if you must block it, I would use the prefix ";" as well and check for ";1813" just so you don't get any false positives in those big long .NET data strings you often see in MSIE user agents.
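The difference between matching "1813" anywhere and matching ";1813" is easy to check outside Apache. The Python sketch below compiles the three patterns that come up in this thread: the unprefixed one from the .htaccess above, incrediBILL's ";1813", and the end-anchored ";1813\)$" form. It tests each against the AVG string and a second UA. That second UA is hypothetical; its .NET version token was made up to contain the digits 1813. Python's `re` is close enough to mod_rewrite's PCRE for patterns this simple.

```python
import re

LOOSE = re.compile(r"1813", re.IGNORECASE)  # like: RewriteCond %{HTTP_USER_AGENT} 1813 [NC]
PREFIXED = re.compile(r";1813")             # incrediBILL's suggestion: include the ';'
ANCHORED = re.compile(r";1813\)$")          # ';1813' must close the comment at the end


def matches(pattern: re.Pattern, ua: str) -> bool:
    """True if the pattern occurs anywhere in the UA, as RewriteCond does."""
    return pattern.search(ua) is not None


AVG_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
# Hypothetical UA whose .NET version token happens to contain the digits 1813.
OTHER_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 3.0.41813.1)"

for name, pat in (("loose", LOOSE), ("prefixed", PREFIXED), ("anchored", ANCHORED)):
    print(name, matches(pat, AVG_UA), matches(pat, OTHER_UA))
```

All three catch the AVG string, but only the unprefixed pattern also catches the innocent UA.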

blend27




msg:3647559
 11:36 pm on May 11, 2008 (gmt 0)

-- Imagine for one moment --

Actually, a 403 puts a BIG green check mark to the right of the listing in the SERP!

[edited by: blend27 at 11:39 pm (utc) on May 11, 2008]

jdMorgan




msg:3647560
 11:37 pm on May 11, 2008 (gmt 0)

As suggested in several posts earlier in this thread, returning a very small, valid HTML page to the AVG Linkscanner client is a much safer way to conserve bandwidth without risking traffic or revenue loss. In .htaccess:

RewriteCond %{HTTP_USER_AGENT} ;1813\)$
RewriteRule !^a-very-small-page\.html$ /a-very-small-page.html [L]

Jim
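jdMorgan's .htaccess rule assumes an Apache host. For a site served by a Python WSGI application, the same idea can be sketched as middleware. This is an analogue under that assumption, not a translation of mod_rewrite: instead of internally rewriting to a file, it returns a tiny built-in page with a 200 status whenever the user-agent ends in ";1813)". All other requests pass through to the wrapped app. The names `small_page_for_scanner` and `SMALL_PAGE`, and the page content, are hypothetical placeholders.

```python
import re

SCANNER_UA = re.compile(r";1813\)$")  # same pattern as the RewriteCond above

# Hypothetical placeholder page: tiny, but valid HTML.
SMALL_PAGE = (
    b"<!DOCTYPE html><html><head><title>Welcome</title></head>"
    b"<body><p>Welcome.</p></body></html>"
)


def small_page_for_scanner(app):
    """WSGI middleware: answer ';1813' requests with SMALL_PAGE, pass others to app."""
    def middleware(environ, start_response):
        if SCANNER_UA.search(environ.get("HTTP_USER_AGENT", "")):
            start_response("200 OK", [
                ("Content-Type", "text/html"),
                ("Content-Length", str(len(SMALL_PAGE))),
            ])
            return [SMALL_PAGE]
        return app(environ, start_response)
    return middleware
```

Wrap an existing application once, e.g. `application = small_page_for_scanner(application)`. Returning a valid 200 page rather than a 403 follows jdMorgan's reasoning that a small page is the lower-risk way to save bandwidth.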


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved