Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 173 message thread spans 6 pages: 1 [2] 3 4 5 6
AVG Toolbar Glitch May Be Causing Visitor Loss
User Agent Flaw Suspected
Umbra




msg:3615362
 2:36 pm on Mar 31, 2008 (gmt 0)

Seeing a rash of hits with an oddly formed user agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
No referer

mod_security always throws an error for this one. Hits come from various IPs with no consistent pattern, seem to be residential IPs. Any idea what it is?

 

Key_Master




msg:3646382
 3:36 am on May 10, 2008 (gmt 0)

Just because they use a component lib of IE, if it's using IE at all, doesn't automatically mean Microsoft approves of it.

[grisoft.com...]

Take a look at any of the Browser Capabilities (browscap) files and you'll note that all legit variations have a space before the next token. I run a site that gets over 600K visitors a month and have never logged a valid UA from a MSIE browser that didn't conform to that simple format.

I don't believe that for a second. Malformed agents from legit visitors are common. Looking through my logs, it didn't take me long to find a corrupted user-agent from a valid visitor. Here is a classic example of a Windows corrupted user-agent. In this case it appears the corruption was caused by software updates. Would your traps ban it?

"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322; Zango 10.0.370.0; Parker Online; Parker Online; MSN 9.0;MSN 9.1; MSNbVZ02; MSNmen-us; MSNcOTH)"

Sorry to disagree but a toolbar performing pre-fetch is by no stretch a real human, it's an automated tool data mining in anticipation of a need by a real human, and not doing a very good job at it since it tripped everyone's bot traps and regex filters that were programmed using real world data.

It didn't trip mine. Certainly the user-agent can be useful in detecting unwelcome behavior, but it is only one element balanced among others in my security arrangement.

wilderness, I've gone way beyond that script. :)

[edited by: Key_Master at 3:39 am (utc) on May 10, 2008]

Ocean10000




msg:3646389
 3:58 am on May 10, 2008 (gmt 0)

"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322; Zango 10.0.370.0; Parker Online; Parker Online; MSN 9.0;MSN 9.1; MSNbVZ02; MSNmen-us; MSNcOTH)"

Would pass the normal tests that I do at the User-Agent level, since "Windows NT 5.1; FunWebProducts;" has "; " after the OS information. Unless the quotes are part of the User-Agent too, which would be invalid.

And for "MSN 9.0;MSN 9.1;" I am guessing this is stored in a registry entry as a whole, not two separate entries to cause the behavior in this User-Agent.
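The separator test described above can be sketched roughly as follows. This is only an illustration, assuming the rule is "the Windows platform token must be followed by `; ` or by the closing parenthesis"; the function name and the regular expression are hypothetical and not taken from anyone's actual filter.

```python
import re

# Assumption: the OS token always looks like "Windows NT <major>.<minor>".
_OS_TOKEN = re.compile(r"Windows NT \d+\.\d+")


def os_token_well_separated(user_agent: str) -> bool:
    """Return True if the OS token is followed by '; ' or by the closing ')'."""
    match = _OS_TOKEN.search(user_agent)
    if match is None:
        return False  # no recognizable OS token at all
    rest = user_agent[match.end():]
    return rest.startswith("; ") or rest.startswith(")")
```

Under this rule the ";1813" agent from the first post fails, because the semicolon after the OS token has no space after it, while the long FunWebProducts agent passes.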

wilderness




msg:3646393
 4:12 am on May 10, 2008 (gmt 0)

MSNbVZ02; MSNmen-us; MSNcOTH

I just went through fits (as Jim may testify) to allow (with exceptions) this MSN UA and four others from longtime visitors that access the internet utilizing the MSN Online access browser, rather than their local machine browser.

This portion is a longtime glitch and requires exceptions, which most are aware of:
MSN 9.0;MSN 9.1

There's an explanation of the cause somewhere in the archives.

wilderness




msg:3646399
 4:16 am on May 10, 2008 (gmt 0)

wilderness, I've gone way beyond that script. :)

As you should!
It's only six years old ;)

Key_Master




msg:3646401
 4:26 am on May 10, 2008 (gmt 0)

Would pass the normal tests that I do at the User-Agent level, since "Windows NT 5.1; FunWebProducts;" has "; " after the OS information.

Already changing the rules, eh? :) Actually, the corruption could happen anywhere in the user-agent.

Here's another unusual but legit user-agent I see a lot of:

"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; MSN Optimized;US; MSN Optimized;US)"

Here's a corrupted agent that most of us have seen some variation of. Notice that it is two merged user-agents. I used to ban these but lifted the ban after some browser testing showed them to be legit:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322)"

I also have seen many examples of third party toolbar installations corrupting user-agents.
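For anyone who wants to log merged agents like the one above rather than ban them, a minimal sketch might look like this. It assumes a merged agent can be recognized simply by a second "Mozilla/" product token inside the string; the function name is made up for illustration.

```python
def looks_merged(user_agent: str) -> bool:
    """Return True if the string appears to contain more than one user-agent.

    Assumption: a second "Mozilla/" product token means two agents were merged.
    """
    return user_agent.count("Mozilla/") > 1
```

A check like this only flags the string for review; as noted above, such agents can come from legitimate browsers.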

wilderness




msg:3646404
 4:45 am on May 10, 2008 (gmt 0)

MSN Optimized;US; MSN Optimized;US

This is one of the four other exceptions I previously mentioned.
There are also variations of this with a different country code.

Key_Master




msg:3646411
 5:00 am on May 10, 2008 (gmt 0)

This is one of the four other exceptions I previously mentioned.
There are also variations of this with a different country code.

So we are in agreement, MSIE user-agents can differ from the norm and still be legit.

That's the only point I'm trying to make. I don't believe that LinkScanner uses a spoofed or invalid user-agent and I think it does more harm than good to ban it. It's only going to become more popular with time with the support of M$ behind it.

And trust me, I have half the world banned. I wouldn't shy away from banning anything if I thought it was in the best interest of my sites.

incrediBILL




msg:3646417
 5:25 am on May 10, 2008 (gmt 0)

"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; MSN Optimized;US; MSN Optimized;US)"

Actually, that would be a corrupt UA in anything less than MSIE 6, so if it shows up claiming to be MSIE 5 it would get punted. The validation rules have to know what's acceptable for each platform token, as the supported features vary when parsing. When the MSN Optimized token, which varies per country such as "MSN Optimized;TR", "MSN Optimized;US", "MSN Optimized;GB", etc., turned up as one of the new features in MSIE 6 & 7, it was added to my parsing rules for those versions.

Just a matter of staying on top of the situation. ;)
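A version-aware rule of the kind described here could be sketched like this. It is a guess at the general shape only, assuming the rule is "an MSN Optimized;XX token is acceptable only when the agent claims MSIE 6 or later"; the names and patterns are hypothetical, not the actual parsing rules.

```python
import re

_MSIE_VERSION = re.compile(r"MSIE (\d+)\.\d+")
# Assumption: the country part is always two uppercase letters.
_MSN_OPTIMIZED = re.compile(r"MSN Optimized;[A-Z]{2}")


def msn_optimized_allowed(user_agent: str) -> bool:
    """Return True unless an MSN Optimized token appears with MSIE below 6."""
    if not _MSN_OPTIMIZED.search(user_agent):
        return True  # nothing to validate for this token
    version = _MSIE_VERSION.search(user_agent)
    return version is not None and int(version.group(1)) >= 6
```

The same agent string with "MSIE 5.0" substituted for "MSIE 7.0" would be rejected by this check, which matches the "punted" case above.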

Anyway, we're getting way off path...

Regardless, the AVG toolbar UA appears to be failing common web security checks, which is resulting in visitors not coming to sites. Not to mention, even if the UA wasn't an issue, the rest of the HTTP header doesn't appear to match standard MSIE HTTP headers, which trips another security check. Like I said earlier, the bad HTTP headers could be scrapers mimicking their toolbar as well; hard to tell until I get more data.

Additionally, the fact that this UA is very identifiable to their toolbar is a vulnerability, all of which need to be addressed IMO.

[edited by: incrediBILL at 5:26 am (utc) on May 10, 2008]

Key_Master




msg:3646444
 6:32 am on May 10, 2008 (gmt 0)

I've written a few toolbar detectors so I can agree it could be considered a vulnerability, albeit small, on AVG's end, but not for the site the toolbar is filtering. However, let's assume future versions of the toolbar use the web browser's full HTTP headers. Wouldn't you be more inclined to ban a visitor that seemed to make two identical, simultaneous hits on each page it visited? At least by being identifiable you can make an educated decision on whether to ban it or not based on its behavior and the information gathered from the HTTP headers.

Standards compliant scrapers would be very easy to build and I'm sure they're readily available on the Web. I don't really see why a scraper would need to mimic an AVG toolbar when any number of common, generic, run of the mill user-agents would be more useful and less noticeable. Will it happen on occasion? Probably. Just as every other popular user-agent has been exploited in the same fashion over the years.

Well until tomorrow, good night all! :)

incrediBILL




msg:3646455
 7:03 am on May 10, 2008 (gmt 0)

it could be considered a vulnerability, albeit small, on AVG's end

Knowing the visitor is using this specific toolbar is like putting a welcome mat at the door.

Now that the toolbar UA is known it's trivial to cloak a perfectly clean page to the toolbar and then display some phishing page or something loaded with malware to the visitor.

The point again being that the easily identifiable toolbar is completely vulnerable to the very types of things it's trying to protect surfers from encountering.

wruppert




msg:3646646
 1:09 pm on May 10, 2008 (gmt 0)

About 2% of my visitors had this string yesterday. They looked like normal visitors, no scraping nor exploit attempts.

Umbra




msg:3646722
 2:49 pm on May 10, 2008 (gmt 0)

Wouldn't you be more inclined to ban a visitor that seemed to make two identical, simultaneous hits on each page it visited?

Does the first hit include prefetch headers? Ideally, the AVG toolbar should send a prefetch header on the first request. This would be the civil thing to do, avoiding the controversy of Google Web Accelerator et al.
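For reference, Firefox's link prefetching marks its requests with an "X-moz: prefetch" header, so a server-side check for a declared prefetch can be sketched as below. Whether the AVG toolbar sends this or any similar header is not established in this thread; the function name is hypothetical.

```python
def is_declared_prefetch(headers: dict) -> bool:
    """Return True if the request announces itself as a prefetch.

    Only the "X-moz: prefetch" convention is checked here.
    """
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    return normalized.get("x-moz") == "prefetch"
```

A site could then decide to refuse or deprioritize requests that declare themselves this way.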

ColinG




msg:3646724
 2:55 pm on May 10, 2008 (gmt 0)

I checked my logs for yesterday grepping out this code and observed several other points not discussed yet:

1. Consistently a page would be requested and the home page also immediately requested within a second or so.

2. No referring pages were provided for the page or home page requests

3. The requests did not support gzipping and all of our pages are gzipped when the browser header has deflate or gzip requested.

4. There were multiple requests for the same page in a short time (12 seconds) and they did not send any header data so we could provide a proper 304 response.

5. For only one IP out of perhaps 80 were any images requested. On that request the requested page was requested 3 times, no user agent and no 304 to the additional requests. It also looks like multiple requests for the images, css and js files.
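The observations in the list above can be combined into a rough per-request check. This is a sketch under the assumption that the toolbar's requests end the user-agent with ";1813)" and lack Referer, compression and conditional headers, as reported here; the function and signal names are invented for illustration.

```python
def linkscanner_signals(headers: dict) -> list:
    """Return the names of the signals from the list above that a request shows."""
    h = {name.lower(): value for name, value in headers.items()}
    signals = []
    if h.get("user-agent", "").endswith(";1813)"):
        signals.append("ua-suffix")  # the oddly formed agent from the first post
    if not h.get("referer"):
        signals.append("no-referer")
    encoding = h.get("accept-encoding", "").lower()
    if "gzip" not in encoding and "deflate" not in encoding:
        signals.append("no-compression")
    if "if-modified-since" not in h and "if-none-match" not in h:
        signals.append("no-conditional")  # server cannot answer with a 304
    return signals
```

No single signal is conclusive, since ordinary first visits also lack a referrer and conditional headers; the combination with the user-agent suffix is what stands out.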

ColinG




msg:3646731
 3:23 pm on May 10, 2008 (gmt 0)

I went further and used my traveling portable which has AVG 8.0 installed. I did a search on Google for a web page I have and AVG checked all of the pages displayed in the Google search (except Google's page) to be sure there was no phishing or other problems. A green star with a checkmark was shown next to the link.

In my logs I showed entries with the same user agent listed above with 4 requests - one for the page listed and 3 times for the home page listed in the Google results.

Again, no referrer, no gzipping.

When I clicked on the page in the Google search results, the expected log entries were shown, including gzip and proper referrers.

This will have the impact of making all server log analysis programs worthless and making the javascript driven web analytics the only usable tool. Perhaps if you grepped out these agent entries before processing the files you could recover and get proper data.
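Stripping these entries before running a log analyzer could look roughly like this. The sketch assumes the combined log format, where the user-agent is the last quoted field, and assumes the ";1813)" ending is a reliable marker; both are assumptions, and the function name is hypothetical.

```python
import re

# The last double-quoted field on the line (the user-agent in combined format).
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def drop_toolbar_hits(lines, marker=";1813)"):
    """Return only the log lines whose user-agent does not end with the marker."""
    kept = []
    for line in lines:
        match = _LAST_QUOTED.search(line)
        user_agent = match.group(1) if match else ""
        if not user_agent.endswith(marker):
            kept.append(line)
    return kept
```

The filtered lines can then be written to a new file and fed to the analysis program in place of the raw log.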

This explains why our visitor count using an older analysis program is up 15% month to month and the Google Analytics visitor counts are up just a few percent.

I used the back button to return to the Google results page, the green checked icon moved and the browser again requested the main page and the home page 3 times.

My guess is that the browser is requesting the target page and the home page indented in the results and then requesting the home page for the target page and the home page a second time.

This has major bandwidth considerations for both the user and the server. While the browser is downloading the page to be verified by the toolbar (using my local IP#), the page is not actually prefetched and is being requested a second time.

Apparently AVG is not seeing the bandwidth, as it is borne by the user, and AVG is only being notified of problem pages.

Interesting......

Key_Master




msg:3646736
 3:31 pm on May 10, 2008 (gmt 0)

Has anybody who has this toolbar installed tried to cloak safe content for this user-agent to see if the toolbar can be fooled? This might be a simple solution to the problem.

davelms




msg:3646747
 3:59 pm on May 10, 2008 (gmt 0)

Has anybody who has this toolbar installed tried to cloak safe content for this user-agent to see if the toolbar can be fooled? This might be a simple solution to the problem.

Given it was sucking images, css and javascript files too, I had a go, sending a very simple & small page with just a link to my homepage on it and no images, css or other content for it to download... and no problems so far. And the green tick continues in Google, etc, to be shown despite it getting cloaked content. Logs seem OK. Will monitor it.

davelms




msg:3646752
 4:12 pm on May 10, 2008 (gmt 0)

Oh... and I have also disabled the extension installed into Firefox, because Firefox has crashed twice while in Google search since the installation... and I reckon this extension is the cause ;-)

Key_Master




msg:3646767
 4:23 pm on May 10, 2008 (gmt 0)

Thanks for checking this out davelms.

Try banning the agent and see if the grey question mark shows up. This is a control test to make sure the pages you are testing aren't already marked safe by the toolbar. If that works, try cloaking for the user-agent with a blank page to see if it gets the green light.

If the cloaking works, we'll have a nice, easy to implement solution to the problems brought up in this discussion.

davelms




msg:3646775
 4:35 pm on May 10, 2008 (gmt 0)

Send a 403 = grey question mark.
Restart browser.
Cloak a blank page = grey question mark.
Restart browser.
Back to cloaking a small page with one link, no more than that = green tick.
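The variant that got the green tick, a small page with one link, can be sketched as a WSGI wrapper. This is an illustration only, assuming the toolbar is identified by a user-agent ending in ";1813)"; the stub markup and the names `make_app` and `STUB_PAGE` are made up, and nothing here says the toolbar will keep accepting such a page.

```python
# A tiny page with a single link, roughly like the one described in the test above.
STUB_PAGE = b'<html><head><title>Stub</title></head><body><a href="/">Home</a></body></html>'


def make_app(full_app, marker=";1813)"):
    """Wrap a WSGI app so requests from the marked user-agent get the stub page."""
    def app(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if user_agent.endswith(marker):
            start_response("200 OK", [
                ("Content-Type", "text/html"),
                ("Content-Length", str(len(STUB_PAGE))),
            ])
            return [STUB_PAGE]
        return full_app(environ, start_response)  # everyone else gets the real site
    return app
```

The point of the stub is to cut the bandwidth spent on the toolbar's extra requests while ordinary visitors are unaffected.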

Key_Master




msg:3646780
 4:46 pm on May 10, 2008 (gmt 0)

Interesting that a blank page would trigger the grey question mark. I guess it needs some content to compare to the search listing. Does the link have to match the listing from the serps?

Excellent work davelms. Thanks for that. What do you think of this solution, incrediBILL and wilderness?

davelms




msg:3646786
 4:51 pm on May 10, 2008 (gmt 0)

I guess it needs some content to compare to the search listing.

Not had much chance to play, but the page I show has a different title (and obviously different text) to that displayed in the search listing. So I would guess that whatever check AVG is doing, it's not verifying the page it gets sent fully matches the bits of information it can derive from the search listing (otherwise my cloaked page with its dummy title and lack of content would have fallen at the first hurdle, I guess).

wilderness




msg:3646912
 7:34 pm on May 10, 2008 (gmt 0)

If the cloaking works, we'll have a nice, easy to implement solution to the problems brought up in this discussion.

Key_Master,
How will what you're attempting change what took place here?

[webmasterworld.com...]

Samizdata




msg:3646924
 7:42 pm on May 10, 2008 (gmt 0)

Now that the whole world knows how to fool Grisoft's fantastic new "security toolbar", how long should it be before they change the user-agent to something less conspicuous?

By default AVG Free updates itself every 24 hours (and can be set to 4 hours) so the roll-out would seem easy enough to accomplish.

Would Monday morning seem reasonable?

jdMorgan




msg:3646931
 7:48 pm on May 10, 2008 (gmt 0)

That rather depends on whether anyone has found a way to contact them... :(

Jim

Receptional Andy




msg:3646933
 7:54 pm on May 10, 2008 (gmt 0)

change the user-agent to something less conspicuous

I'm hoping they change the UA to something more conspicuous, as I feel they should let site owners decide whether to let this bot access their sites or not.

whatever check AVG is doing, it's not verifying the page it gets sent fully matches the bits of information it can derive from the search listing

I don't see that it could check something like that reliably, since search listings are not always constructed from the content of the page, and may be out of date in any case.

I confess I find this toolbar behaviour inexplicable. Why don't they just check the page while it's loading and block anything malicious then?

incrediBILL




msg:3646937
 8:14 pm on May 10, 2008 (gmt 0)

how long should it be before they change the user-agent to something less conspicuous?

It shouldn't be less conspicuous, it should be completely inconspicuous!

MS even shows the basics of how they build the UA with tokens from the registry so it's not that complicated:
[msdn.microsoft.com...]

Of course other toolbars are easy to detect as well, but knowing that a toolbar is present isn't as bad as knowing which toolbar is present...
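The token scheme referred to here can be illustrated with a short sketch. It assumes the general MSIE layout seen throughout this thread, where the fixed tokens and any extra tokens stored in the registry are joined with "; "; the function name and parameters are hypothetical, and this is not Microsoft's code.

```python
def build_msie_user_agent(version: str, platform: str, post_platform: list) -> str:
    """Assemble an MSIE-style user-agent from its tokens, joined with '; '."""
    tokens = ["compatible", f"MSIE {version}", platform, *post_platform]
    return "Mozilla/4.0 (" + "; ".join(tokens) + ")"
```

If a single stored value itself contains a semicolon, such as "MSN 9.0;MSN 9.1", it comes out without a space after that semicolon, which is consistent with the earlier guess that this pair is stored as one entry.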

Key_Master




msg:3646939
 8:21 pm on May 10, 2008 (gmt 0)

Key_Master,
How will what you're attempting change what took place here?

I only see one hit from the user-agent in question. The toolbar uses the same IP as the browser, so without a followup hit, I'd say the rest of the IPs you have listed are unrelated.

Now that the whole world knows how to fool Grisoft's fantastic new "security toolbar", how long should it be before they change the user-agent to something less conspicuous?

Who knows, it might actually be a useful tool if it works similarly to the "This site may harm your computer" warning Google puts on some sites. There are a lot of people out there who aren't computer savvy and need whatever help they can get.

Changing the user-agent will not make much of a difference mainly because it uses the same IP as the web browser. If the user-agent was changed to mirror the browser user-agent it probably would be viewed with more suspicion. That looks more spammy than what is being discussed in this thread.

I do agree with Receptional Andy that keeping the user-agent conspicuous is a good thing.

incrediBILL




msg:3646944
 8:29 pm on May 10, 2008 (gmt 0)

I do agree with Receptional Andy that keeping the user-agent conspicuous is a good thing.

How can keeping it conspicuous be a good thing?

Maybe I'm the only one here that deals with the hackers and other malicious threats on a daily basis. I'm positive that if it's conspicuous, the malicious sites will cloak a squeaky clean page to thwart the toolbar and those "people out there who aren't computer savvy and need whatever help they can get" will get just the opposite.

Probably wouldn't take more than about 10 minutes to mock up a site with a virus embedded in a web page that cloaks to get a "GOOD SITE" signal from that toolbar.

It needs to be less conspicuous, not more.

[edited by: incrediBILL at 8:31 pm (utc) on May 10, 2008]

wilderness




msg:3646946
 8:35 pm on May 10, 2008 (gmt 0)

Key_Master,
How will what you're attempting change what took place here?

I only see one hit from the user-agent in question. The toolbar uses the same IP as the browser, so without a followup hit, I'd say the rest of the IPs you have listed are unrelated.

Those were all related.

There are many more lines (omitted in my submission) and most all (with the exception of the last IP) were in succession across two different websites.

Each IP making the SAME IDENTICAL requests.

Key_Master




msg:3646947
 8:39 pm on May 10, 2008 (gmt 0)

Maybe so wilderness, but that doesn't make it related to the toolbar. User-agents can and will be spoofed.

Receptional Andy




msg:3646950
 8:46 pm on May 10, 2008 (gmt 0)

I'm positive that if it's conspicuous, the malicious sites will cloak a squeaky clean page to thwart the toolbar and those "people out there who aren't computer savvy and need whatever help they can get" will get just the opposite

From my standpoint, this tool is essentially a commercial bot. They should declare themselves just like everybody else. Whether this causes problems with their (IMO misguided) security implementation for this toolbar is their own problem. If the program can detect malware based on scanning HTML, then it will be able to block it should a user click through. There's no increased danger for the user.

Of course, it's highly likely that they will go for an inconspicuous UA. But then, I think this feature is about noise rather than security. I'd like them to give me a mechanism to stop them automatically requesting pages from sites I operate.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved