Trend Micro AV May Be Causing Excess Traffic

         

blend27

1:09 pm on May 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




System: The following message was cut out of thread at: http://www.webmasterworld.com/search_engine_spiders/3615360.htm [webmasterworld.com] by incredibill - 9:34 pm on May 15, 2008 (PST -8)


--- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ---

I've seen workstations with "Trend Micro" AV produce those. One of my friends had the thingy installed and, while visiting one of my sites, got banned for requesting too many pages (the same pages at once) with that UA.

But then again, this is the most popular UA among scrapers out there after Java, Nutch and libwww-perl...

I will install the trial version and see if it is similar.

Blend27

incrediBILL

10:26 pm on May 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, so am I correct that the consensus is that "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" isn't related to AVG, since Blend27 mentioned "Trend Micro" AV?

The user agents with "SV1" and ";1813" are together requesting roughly 3K pages a day at the moment, which is outrageous IMO.
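For anyone who wants to put a number on this from their own logs, here is a minimal Python sketch. It assumes Apache "combined" log format; the regex and the function name `count_suspect_requests` are illustrative, not something posted in this thread.

```python
import re
from collections import Counter

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

# The two exact user-agent strings under discussion.
SV1_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
AVG_1813_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
SUSPECT_UAS = {SV1_UA, AVG_1813_UA}


def count_suspect_requests(lines):
    """Count requests per suspect UA; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("ua") in SUSPECT_UAS:
            counts[match.group("ua")] += 1
    return counts
```

Matching the whole string (rather than searching for "SV1") keeps longer UAs that merely contain the SV1 token out of the count.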

[edited by: incrediBILL at 5:39 am (utc) on May 16, 2008]

Samizdata

11:53 pm on May 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My data consistently says 1813 is AVG LinkScanner and SV1 is not related to AVG.

It does, however, seem to be something similar - and I assume that if one anti-virus vendor does search result pre-fetching then all others will probably follow (though perhaps not as ineptly).

One difference is that while 1813 is always (for me) part of the user-agent that started this thread, SV1 often appears in longer strings than the one you cite, and on these occasions seems to be from actual human visits.

For example, I had this (and others) from human visitors today:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)

And later on this beauty turned up:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; SpamBlockerUtility 4.8.4)

So I Googled "SpamBlockerUtility" (on a Mac) and didn't like what I saw.

It seemed to be trying to download something automatically...

[edited by: incrediBILL at 5:42 am (utc) on May 16, 2008]
[edit reason] splicing new thread [/edit]

jdMorgan

3:21 am on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note the double Mozilla-compatible in that UA string. And it says it's both MSIE 7.0 and MSIE 6.0. I sure would like to know what is up with that!

[added] In my logs, I see that "double UA string" associated with "SV1" but not with AVG or SpamBlocker. I posted this despite the possibility that it might be off-topic, because, well, I'm not yet sure if it is off-topic... [/added]

[added more]
OK, now "SV1" gets *really* interesting:
66.249.84.** - - [15/May/2008:20:08:44 -0700] "GET / HTTP/1.1" 200 31354 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
That looks like someone at the 'plex to me...
[/added]
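A quick way to pick these "double" strings out of a log is to look for a second "Mozilla/" and for more than one MSIE version. The Python below is only a sketch; the function name `ua_oddities` is made up for illustration.

```python
import re


def ua_oddities(ua):
    """Report nested 'Mozilla/' prefixes and conflicting MSIE versions in a UA."""
    msie_versions = sorted(set(re.findall(r"MSIE (\d+\.\d+)", ua)))
    return {
        "nested": ua.count("Mozilla/") > 1,
        "msie_versions": msie_versions,
        "conflicting_msie": len(msie_versions) > 1,
    }
```

Run against the SpamBlockerUtility string quoted above, it reports a nested UA claiming both 6.0 and 7.0; it says nothing about why the string ended up that way.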

Jim

[edited by: jdMorgan at 4:03 am (utc) on May 16, 2008]

[edited by: incrediBILL at 5:46 am (utc) on May 16, 2008]
[edit reason] splicing new thread [/edit]

smallcompany

4:18 am on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One difference is that while 1813 is always (for me) part of the user-agent that started this thread, SV1 often appears in longer strings than the one you cite, and on these occasions seems to be from actual human visits.

I am talking specifically about

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

and nothing else. SV1 indicates enhanced IE security and may be part of a regular (human) visitor's UA.

But this one behaves the same as 1813. Only these two have trouble with special characters in my AdWords links; no other UA does.

Finally, I’ll use Trend Micro Internet Security Pro and see if my IP shows up with that UA in the logs. Pro comes with that “extra” anti-phishing protection.

[added]

OK, now "SV1" gets *really* interesting

Regarding SV1 in general, apart from this particular UA that behaves like 1813:

From MSDN:

SV1 - Internet Explorer 6 with enhanced security features (Windows XP SP2 and Windows Server 2003 only).

Samizdata

12:27 pm on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I do see indications that Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) may be pre-fetching search results, but unless it is causing a problem (e.g. excessive bandwidth) I would be cautious when dealing with it for the same reasons as AVG LinkScanner.

My nightmare is that pre-scanning search results with dummy UAs is the new norm on Windows, and that as webmasters we now have to learn about how each anti-virus vendor does it - failing which we risk being greylisted or otherwise flagged as potentially unsafe.

wilderness

2:54 pm on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My nightmare is that pre-scanning search results with dummy UAs is the new norm on Windows, and that as webmasters we now have to learn about how each anti-virus vendor does it - failing which we risk being greylisted or otherwise flagged as potentially unsafe.

Perhaps this may be a worry for a NEW website!
For established websites, with visitors aware of their content and established procedures, these things present nothing more than a UA similar to harvesters.

We few here discussing rogue bots are hardly gathered together within a "concept" of policy that is acceptable to "everybody", much less seeing our collective action presenting an effective action which would assure uniformity in UA's and bot procedures.

incrediBILL

5:31 pm on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don, I don't think it's an issue of site security policy.

IMO it's 2 issues:

1. Exposing their customers to cloaked malicious sites now that we know who they are and,

2. The practice of pre-screening and pre-fetching pages is abusive and borders on a DDoS as the number of products with this feature increases.

Both issues need to be addressed with the people writing the software causing those problems.

wilderness

5:48 pm on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bill?

We few here discussing rogue bots are hardly gathered together within a "concept" of policy that is acceptable to "everybody",

Rest my case!

From my own perspective!
I see no difference in the lack of accepted protocol between these UAs and harvesters.

Course I'm a radical ;)

Don

Samizdata

6:30 pm on May 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Perhaps this may be a worry for a NEW website

My oldest website predates Google, but still gets new visitors all the time.

It gets far fewer of them when the users' anti-virus program doesn't declare it clean.

[edited by: Samizdata at 6:37 pm (utc) on May 16, 2008]

blend27

12:49 am on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Don-- Course I'm a radical ;) --

No more radical than me; I was mentioned as an MFA (yeah, what happened to those?!) Slayer here a few times ;)

incrediBILL

7:58 am on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Project Honey Pot claims this UA is used by 7.7% of all spam harvesting bots.

Is it possible that's what we're seeing here?

[edited by: incrediBILL at 5:38 pm (utc) on May 17, 2008]

dstiles

5:41 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IncrediBill - with the accept missing it's only a single or occasionally a double page access (at least, with that UA). There isn't the multiple-page access I would expect from a scraper.

On the other hand yes, that UA is otherwise a significant scraper and I would love to be able to trap it without trapping legit customers. I don't think the absence of other UA extensions can be taken as an indicator.

incrediBILL

6:22 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There isn't the multiple-page access I would expect from a scraper.

Distributed scrapers only do one or a couple of pages per access. Just like the vulnerability probes, I tend to get single hits per IPs, but a lot of IPs are involved.

I'm just speculating here, but since botnets send spam, it would make sense for them to harvest email addresses as well. That could account for the larger number of IPs involved, spread out so it wouldn't get stopped by your typical bot blocker.
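One way to check the "single hits, lots of IPs" pattern in your own data is sketched below in Python. It assumes you have already reduced the log to (ip, user_agent) pairs; the function name `ip_spread` and the summary fields are illustrative choices.

```python
from collections import Counter


def ip_spread(records, target_ua):
    """Summarize how requests with target_ua are spread across client IPs.

    records is an iterable of (ip, user_agent) pairs.
    """
    hits = Counter(ip for ip, ua in records if ua == target_ua)
    return {
        "requests": sum(hits.values()),
        "distinct_ips": len(hits),
        "single_hit_ips": sum(1 for n in hits.values() if n == 1),
    }
```

A high share of single-hit IPs is consistent with a distributed scraper, but it is equally consistent with many different humans each triggering one pre-fetch, so treat it as a hint only.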

Samizdata

6:40 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Project Honey Pot claims this UA is used by 7.7% of all spam harvesting bots

They also claim it is used by 30% of comment spammers (and who am I to argue?).

I would love to be able to trap it without trapping legit customers

Whatever it is, cloaking low-bandwidth content would seem to be safest.

I noticed one hit from this today that was immediately followed by an apparently human visitor from the same IP whose referrer entry was a Google search on my primary keywords - and a Google Desktop entry was duly added before they left.

It seems to walk like a duck round these parts.

Samizdata

8:26 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Searching for evidence, I just installed the free TrendProtect plugin for IE, touted thus:

"To protect users, TrendProtect tags pages that have not been tested by Trend Micro, including pages that may be safe, as Suspicious."

They are not joking - I Googled two of my sites and both were prominently marked "Suspicious".

There was, however, NO HIT from any user-agent, so I looked at Trend FAQ:

"TrendProtect obtains rating information from rating servers."

[trendsecure.com]

So which branch of the secret web police runs the "rating servers" I wonder?

This is a Trend that I for one find alarming.

Staffa

9:05 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



TrendProtect tags pages that have not been tested by Trend Micro, including pages that may be safe, as Suspicious.

So, found guilty until proven innocent?
On whose authority?

I have had Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) banned for a while now, since it seems to be a favorite of the kind of scrapers that use a different IP and a different UA for each page fetched. That looks like human traffic, but the timestamps say otherwise, and no CSS or images are fetched.
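The "pages but no CSS or images" signal can be roughed out in a few lines of Python. This sketch assumes (ip, path) pairs pulled from a log; the extension list and the name `ips_without_assets` are assumptions, so adjust them to whatever your pages actually reference.

```python
import posixpath

# Hypothetical list of static asset extensions; extend it for your own site.
ASSET_EXTENSIONS = {".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico"}


def ips_without_assets(requests):
    """Return IPs that fetched at least one page but never a static asset.

    requests is an iterable of (ip, path) pairs.
    """
    page_ips, asset_ips = set(), set()
    for ip, path in requests:
        # Drop any query string before looking at the file extension.
        extension = posixpath.splitext(path.split("?", 1)[0])[1].lower()
        if extension in ASSET_EXTENSIONS:
            asset_ips.add(ip)
        else:
            page_ips.add(ip)
    return page_ips - asset_ips
```

Visitors with a warm browser cache, text browsers and search pre-fetchers will also show up here, so this narrows the list rather than proving anything.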

incrediBILL

9:39 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would love to be able to trap it without trapping legit customers

It's easy to detect as it has invalid headers when it makes the request.

Legit browsers use proper headers.
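The post does not say which headers are invalid. Going only by dstiles' remark about "the accept missing", a header check might look like the Python sketch below; the function name and the choice of the Accept header are assumptions, not a description of anyone's actual filter.

```python
def looks_like_bare_request(headers):
    """True if a request claims to be MSIE but carries no Accept header.

    headers is a mapping of header name to value; names are compared
    case-insensitively.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    claims_msie = "MSIE" in normalized.get("user-agent", "")
    has_accept = bool(normalized.get("accept", "").strip())
    return claims_msie and not has_accept
```

Standard access logs do not record request headers, so a check like this has to run in the application or in a custom log format.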

Samizdata

10:48 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Legit browsers use proper headers

So what do "rating servers" use?

And who can afford to block them?

As with AVG LinkScanner, the "SV1" user-agent appears to me to be pre-fetching results for searches conducted by real humans (at least in some cases), and if you block it that is naturally the last of your bandwidth they will use - because your site may well be flagged in their SERPs as "Suspicious".

As for associating it with Trend Micro, I would say the jury is still out - I have never blocked "SV1", but TrendProtect still has my sites blacklisted, and I don't know why or how to change it.

All my 403s will need to be re-examined under this "New Order".

[edited by: Samizdata at 10:49 pm (utc) on May 17, 2008]

incrediBILL

2:14 am on May 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And who can afford to block them?

You don't have to block them; you can just cloak to them using rewrite rules and serve a much smaller page, like the one jdMorgan showed in the AVG thread. It is minimized to conserve bandwidth and will always give a "clean" page, so there are no false positives.
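jdMorgan's example in the AVG thread uses rewrite rules and is not reproduced here. For sites that sit behind a Python application instead, the same idea could be sketched as WSGI middleware. Everything below (the names `cloak_prefetchers` and `MINIMAL_PAGE`, and the decision to match only the exact 1813 string that Samizdata attributes to AVG LinkScanner) is an assumption for illustration.

```python
# Tiny stand-in page served to the pre-fetcher instead of the full content.
MINIMAL_PAGE = b"<html><head><title>OK</title></head><body></body></html>"

# Exact-match only, so longer UAs sent by real browsers are left alone.
LINKSCANNER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"


def cloak_prefetchers(full_app):
    """Wrap a WSGI app so the LinkScanner UA receives MINIMAL_PAGE instead."""
    def app(environ, start_response):
        if environ.get("HTTP_USER_AGENT", "") == LINKSCANNER_UA:
            start_response("200 OK", [
                ("Content-Type", "text/html"),
                ("Content-Length", str(len(MINIMAL_PAGE))),
            ])
            return [MINIMAL_PAGE]
        # Everyone else gets the normal site.
        return full_app(environ, start_response)
    return app
```

The wrapped app still answers with status 200, which is the point: the scanner sees a small, harmless page rather than a 403.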

Samizdata

2:22 am on May 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can only cloak them if they identify themselves.

My tests with TrendProtect eventually produced a visit from this user-agent:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

It took my index page (status 200) but no CSS, javascript or images.

It came from an IP registered to Trend Micro.

They still libel the site as "suspicious" though.

incrediBILL

5:53 am on May 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks to info from Samizdata it looks like the Trend Micro IP range is:

OrgName: TREND MICRO INCORPORATED
NetRange: 66.180.80.0 - 66.180.95.255

If you are blocking this range, you could be sending visitors away.
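To check log IPs against that NetRange, Python's standard `ipaddress` module can turn the start and end addresses into CIDR form. This is a small sketch; the helper name `in_trend_micro_range` is made up, and the range is simply the one quoted above as of this post.

```python
import ipaddress

# NetRange as quoted above: 66.180.80.0 - 66.180.95.255
FIRST = ipaddress.ip_address("66.180.80.0")
LAST = ipaddress.ip_address("66.180.95.255")

# summarize_address_range yields the CIDR block(s) covering the range.
TREND_MICRO_NETWORKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_trend_micro_range(ip):
    """True if the given IPv4 address string falls inside the quoted range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in TREND_MICRO_NETWORKS)
```

The range collapses to a single block, 66.180.80.0/20, which is the form most allow/deny lists expect.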

jecasc

11:30 am on May 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of my websites is being hit pretty badly by these two user agents:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

4,935,377 hits in May alone causing several GB of traffic.

Most of the requests are for two nonexistent JavaScript files. The URL that is requested is really weird. Those two alone have been requested 2,489,626 times, over and over again from different IP addresses, causing 2 1/2 million 404 errors.

How does this AVG toolbar work anyway? I have blocked the User Agent for the time being. The IP changes every two or three hours. Do the requests come from the toolbars or from an AVG server? I checked the IPs: there were some German Telekom IPs and some from Austria.
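For finding out which missing URLs account for the bulk of the 404s, a tally like the following Python sketch may help. It assumes (path, status) pairs already extracted from the log; the name `top_missing_paths` is illustrative.

```python
from collections import Counter


def top_missing_paths(requests, limit=10):
    """Return the most frequently requested paths that ended in a 404.

    requests is an iterable of (path, status) pairs with integer status codes.
    """
    missing = Counter(path for path, status in requests if status == 404)
    return missing.most_common(limit)
```

If two paths dominate the output the way the two JavaScript files do here, it is worth checking whether some page or old template still references them.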

Samizdata

12:41 pm on May 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How does this AVG toolbar work anyway?

To clarify, the UA Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813) is AVG LinkScanner (not the optional toolbar), and it checks results of any searches done on Google/Yahoo/MSN by pre-fetching the listed page and (in some cases) any external JavaScript files.

I have blocked the User Agent for the time being.

I would say that is unwise, as your listing in the SERPs will be flagged in such a way that users will be discouraged from clicking it - better to cloak minimal content as in the example given in the AVG thread at [webmasterworld.com]

Do the requests come from the toolbars or from an AVG server?

The requests come via the browser of real humans who are searching on your keywords.

The UA Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) remains unidentified but as noted elsewhere acts in exactly the same way as AVG and is likely to be some other anti-virus package exceeding its capabilities.

It does not appear to be related to Trend Micro.

jecasc

1:10 pm on May 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The requests come via the browser of real humans who are searching on your keywords.

This seems very unlikely in my case. 4 million hits from some antivirus checker, be it Trend Micro or AVG? In two weeks? Requesting the same two nonexistent JavaScript files over several days? This looks more like some spider caught in a strange loop.

Samizdata

2:15 pm on May 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This seems very unlikely in my case

The behaviour is easily replicated, but you must draw your own conclusions.

Download and install AVG Free 8.0 (with or without the toolbar), restart and use Google/Yahoo/MSN to search on keywords your site ranks for - but do not click the link in the SERPs.

There should be at least one entry from "1813" in your logs from your IP address.

This is easier to test on a site with little or no traffic (which you may not have now but which you might end up with if you block these useless excuses for tools).
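To check the result of that test without scrolling through the log by hand, something like this Python sketch would do. It assumes Apache "combined" log lines and splits on double quotes rather than using a full parser; the function name `my_linkscanner_hits` is hypothetical.

```python
def my_linkscanner_hits(lines, my_ip):
    """Return the request lines logged from my_ip with the ";1813)" UA."""
    hits = []
    for line in lines:
        fields = line.split('"')
        # In combined format, fields[1] is the request line and fields[5]
        # is the user-agent; shorter lines are not in that format.
        if len(fields) < 7:
            continue
        client_ip = line.split(" ", 1)[0]
        if client_ip == my_ip and fields[5].endswith(";1813)"):
            hits.append(fields[1])
    return hits
```

An empty list after a test search means either the pre-fetch did not happen or the log is not in the assumed format, so check one line by eye before drawing conclusions.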

smallcompany

7:25 pm on May 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I was troubleshooting my Perl script that was causing 404s and found that my own UA right now is:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

I have no AV installed at all (my PC is so happy ;)).

OS is XP Pro/SP3 with IE 6.0, Office 2003, all latest updates applied.

wilderness

1:47 pm on May 31, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First the AV cache and then the AOL cache.
Amusing ;)

209.239.21.zz - - [31/May/2008:02:33:13 +0100] "GET /MyFolder/MyPage.html HTTP/1.1" 301 242 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
64.12.117.205 - - [31/May/2008:02:33:17 +0100] "GET /SameFolder/SamePage.html HTTP/1.1" 200 64870 "MyWebsite" "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.0; Windows NT 5.1; FunWebProducts)"

blend27

10:59 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



--@smallcompany

-- OS is XP Pro/SP3 with IE 6.0, Office 2003, all latest updates applied. --

None of .NET Frameworks Installed?

--end @smallcompany

Let's think about it for a moment here.

When an average user runs Windows Update (or Automatic Updates is enabled), doesn't Microsoft automatically send the .NET Frameworks to be installed on client machines along with the updates? In my experience that is what goes on. I might be totally off on this one, but:

1. for this UA, either the system cannot get the latest updates (illegal copy of XP)
2. user has no Automatic Updates or it is disabled
3. user has decided not to install .NET Framework
4. UA is "Genetically Altered" by user or software installed on the user's machine

The IE 7 browser that I am accessing WW with sends:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)

Here, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ;" was injected into the UA after installing Trend Micro Internet Security 2008. I uninstalled it last night, but the UA remained the same :(

superclown2

7:40 pm on Jun 2, 2008 (gmt 0)



"The UA Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) remains unidentified but as noted elsewhere acts in exactly the same way as AVG and is likely to be some other anti-virus package exceeding its capabilities."

I installed the latest AVG free trial on one of my computers and searched for some terms that my site ranks highly for, without clicking on any of the search results. Each time, the pages that appeared in the SERPs came up in my logs, and each time the UA was Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1). Therefore, yes, at least some of these spurious log entries we are getting are down to AVG.
