
Search Engine Spider and User Agent Identification Forum

    
150.70. revisited
lucy24




msg:4363888
 11:30 pm on Sep 17, 2011 (gmt 0)

The last discussion I found about these guys, either by IP or by name (TrendMicro/trendnet), was a year ago. Has anything changed?

So far I've seen about half a dozen specific IPs. They're all in 150.70.n.n --and nobody from 150.70 doesn't fit this pattern-- so let's leave it generic. UA is always

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Apart from the routine roboticisms like picking up the same page twice, a minute or two apart, and a morbid fascination with the favicon-- sometimes without any accompanying page request-- there were a few that particularly caught my notice. Most of this is after the fact. My bad.

12:05:02 /{directory}/{file}.js
12:05:02 /favicon.ico
12:05:02 /{directory}/{file}.html
12:05:35 /{directory}/{file}.js


Better check that javascript file again; there's something hinky about it. Right.

11:04:32 /{directory}/{subdirectory}/{file}.css

I point this out because in one of those earlier threads, the poster said that their particular culprit didn't bother about css. This is the only time I found them picking up a css-- but to make up for it, they didn't pick up anything else in the area. Far as I know, they've never even been in that particular subdirectory.

20:25:22 /{directory2}/{subdirectory}/{file}.html
20:28:53 /favicon.ico
20:29:39 /favicon.ico


See above about morbid fascination. But fix that {directory2} in your minds.

19:41:10 /{directory2}/{subdirectory2}/{file}.html
19:41:56 /favicon.ico
19:42:06 /favicon.ico


These two visits (on different days) have one thing in common: not only is directory2 roboted out, they have no way of knowing that its subdirectories even exist. There's no index, either explicit (index.html) or automated. In each case, I had recently posted the link in a special-interest forum.

After verifying that the forum is roboted-out (yes, some visits to robots.txt are from humans :)) I went and had a chat with the site administrator. He did some snooping of his own and found that our friends at 150.70. have also been doing assorted "cold" downloads of files that you're not even supposed to know about unless you're logged in. (You can do the download if you paste in the address directly-- but only if you know the address. It's not something you could randomly make up without gleaning an enormous lot of 404s.)

I've saved the most interesting for last. I recently (re)installed piwik. This soon led to:

12:40:24 /piwik/piwik.php?action_name={buncha stuff edited out here, including reference to google.fr search}
12:40:27 /piwik/piwik.js
12:40:34 /favicon.ico
12:40:35 /{directory}/{file}.html


Note the time. After poring over* the whole day's logs I found a normal human visit from a completely different IP and UA which included:

11:21:35 /piwik/piwik.php?action_name={letter for letter the EXACT SAME CONTENT as above}

Note the time. Aren't these guys supposed to be checking for viruses? "Dear user: that site you visited an hour and a half ago was infected. Take immediate action or it may be too late."


* Brazen lie. I just used the text editor's Find function with appropriate RegEx.
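
The cross-check described in this post (the same piwik.php query string requested first by a human, then again later from 150.70) can be sketched in Python. This is only an illustration under assumptions: the (time, ip, url) tuples, the 203.0.113.7 visitor address, the shortened query string, and the `find_replays` helper are all hypothetical; 150.70.172.107 is an address that appears in a log line elsewhere in this thread.

```python
from datetime import datetime, timedelta

def find_replays(entries, bot_prefix="150.70."):
    """Return (url, human_ip, bot_ip, delay) for URLs the bot re-requested.

    entries: iterable of ("HH:MM:SS", ip, url) tuples from one day's log.
    """
    parsed = sorted(
        (datetime.strptime(t, "%H:%M:%S"), ip, url) for t, ip, url in entries
    )
    first_seen = {}  # url -> (time, ip) of the earliest non-bot request
    replays = []
    for when, ip, url in parsed:
        if ip.startswith(bot_prefix):
            if url in first_seen:
                human_time, human_ip = first_seen[url]
                replays.append((url, human_ip, ip, when - human_time))
        else:
            first_seen.setdefault(url, (when, ip))
    return replays

# Hypothetical sample data; the query string is a stand-in.
log = [
    ("11:21:35", "203.0.113.7", "/piwik/piwik.php?action_name=example"),
    ("12:40:24", "150.70.172.107", "/piwik/piwik.php?action_name=example"),
    ("12:40:27", "150.70.172.107", "/piwik/piwik.js"),
]

for url, human_ip, bot_ip, delay in find_replays(log):
    print(f"{bot_ip} repeated {human_ip}'s request for {url} after {delay}")
```

Matching on the full URL, query string included, is what makes the link between the two visits convincing; a match on the path alone would prove much less.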

 

SteveWh




msg:4363926
 4:09 am on Sep 18, 2011 (gmt 0)

I used Trend Micro Internet Security for about 5 years.

It seemed that the software would send a URL that I typed into my browser address bar back to Trend even before my browser could fetch the page. I used to see requests from the Trend IPs (150.70. is one set of them) showing up in my logs even before my own request (if I recall correctly), and definitely at other times just a second or two afterwards, even for files that were secret, only on my server for a few seconds, and that I only requested once and then deleted.

The software also maintained a list on my hard drive of all the URLs I visited (whether by typing into the address bar or by clicking links). That list was sent to Trend with each update. In that case, they'd crawl those URLs sometime later in the day.

The Trend bots, being part of an antivirus program's machinery -- rather than a web crawler -- are not going to read robots.txt or obey its instructions. Any URL that their user is able to visit is fair game for them to fetch and analyze for malware. On the other hand, they're not crawling it for the purpose of putting it into a public web index.

The moral of the story for a webmaster is that if you have pages or scripts that you want to be truly secret and inaccessible, you must make them physically impossible to access. Either put them in a password protected directory or apply an IP test so that the request is denied for all IPs other than yours.

wilderness




msg:4363930
 4:47 am on Sep 18, 2011 (gmt 0)

The Trend bots, being part of an antivirus program's machinery -- rather than a web crawler -- are not going to read robots.txt or obey its instructions. Any URL that their user is able to visit is fair game for them to fetch and analyze for malware. On the other hand, they're not crawling it for the purpose of putting it into a public web index.

The moral of the story for a webmaster is that if you have pages or scripts that you want to be truly secret and inaccessible, you must make them physically impossible to access. Either put them in a password protected directory or apply an IP test so that the request is denied for all IPs other than yours.


The real "moral of the story" is that webmasters must choose very carefully which third-party IP ranges, bots, and software they allow access.

Third-party visitors (to be cordial) are not restricted to AV software alone; they also include filtering software, translators, universities, and a variety of other services, all of which feel that any website is fair game.

lucy24




msg:4363938
 5:44 am on Sep 18, 2011 (gmt 0)

It seemed that the software would send a URL that I typed into my browser address bar back to Trend even before my browser could fetch the page. I used to see requests from the Trend IPs (150.70. is one set of them) showing up in my logs even before my own request (if I recall correctly), and definitely at other times just a second or two afterwards, even for files that were secret, only on my server for a few seconds, and that I only requested once and then deleted.

Some of it has to be attributed to the logging software. Sometimes my server seems to be confused about what exact second something happened, and will hiccup back and forth like a time machine overdue for maintenance.

I found an even more illuminating one on a different date. The human visitor reloaded the page after 10 seconds or so-- and these page loads must have been separately reported, because an hour or so later 150.70 also visited the page twice, each time handing over every last letter of the query string. The only difference is that the human request for piwik.php came with a referer, while the robotic followup didn't.

Further scrutiny* reveals that the isolated request for a stylesheet was preceded by the real page-- several hours earlier. This in turn came only a few seconds after the triggering human visit-- unusually soon for them. Did some uber-robot look over the trip logs, jump up and scream at an underling to go get that stylesheet? (If so, the underling may have been fired shortly afterward. There are actually two stylesheets associated with this file. They only picked up one.**)

But what kind of security are you providing when you don't even look at a page until more than an hour after the human visit that triggered your inspection? Anything can happen in an hour.


* The Regular Expression Is Your Friend.

** ... and that one was only in use for a few weeks. Which means, I suppose, that it will take up space in my htaccess forever so robots can be redirected to the consolidated version. The one 150.70 didn't get, then or at any other time. Sigh.

SteveWh




msg:4363939
 5:49 am on Sep 18, 2011 (gmt 0)

Ok...

"They are going to consider any URL that their user is able to visit to be fair game for them to fetch and analyze for malware."

That's what I meant.

SteveWh




msg:4363943
 6:18 am on Sep 18, 2011 (gmt 0)

I only remember them fetching main files like .htm, .html, .zip, and not .css, .js, but if they are fetching those others now, it would be a reasonable escalation because malware is being stored in those nowadays, too.

The Regular Expression Is Your Friend

Absolutely!

But what kind of security are you providing when you don't even look at a page until more than an hour after the human visit that triggered your inspection? Anything can happen in an hour.


Like any AV, it inspects the file after the user's browser has received it and written it to hard disk in the browser cache. That's normal AV activity.

This pre-fetch behavior is more pro-active than that if it really did intercept the browser request and fetch the page *before* the user does. In other words, it's like saying, "Wait a minute, I'm not going to let you get that page until I've checked it out FIRST." Then the Trend bot fetches the page, examines it, and if it's ok, sends a message back to the user's AV, which allows their browser request to proceed.

I'm not sure it was doing that but did suspect it.

Also, Trend, like other AV companies, maintains its own list of unsafe websites. If you try to go to one, it blocks your request even before you can get a Google/Firefox Safe Browsing message or Internet Explorer warning. Some of this crawling could be in support of keeping their list updated.

lucy24




msg:4363956
 8:05 am on Sep 18, 2011 (gmt 0)

I only remember them fetching main files like .htm, .html, .zip, and not .css, .js, but if they are fetching those others now, it would be a reasonable escalation because malware is being stored in those nowadays, too.

I'm surprised they're not picking up the jpgs. Apparently you can stash anything in those. Had one recently that was fatally damaged, but the desktop preview was perfect-- because the preview data was intact. (This was, of course, a complete jpg, not a "web-ready" one.)

Like any AV, it inspects the file after the user's browser has received it and written it to hard disk in the browser cache. That's normal AV activity.

Well, "normal" is a statistical word. If you've already examined the cached copy, what else do you need? Heck, their own user could have planted malware on me in the course of their visit. And then the cached copy would be clean and the hours-later version would be dirty-- but I'm not their paying customer.

:: vague train of thought involving Public Health notifications ::

If you try to go to one, it blocks your request even before you can get a Google/Firefox Safe Browsing message or Internet Explorer warning.

That would be like, uhm, the caution my browser used to put up every time I logged on to Bing Webmaster Tools? (They must have changed something, because the browser has stopped worrying.)

Come to think of it, I might be able to identify the specific humans involved in a couple of these visits. Would be interesting to know how it works from the user's side. Though I doubt they'd be able to explain the lack of a transparent UA name like "AntiVirusBot" so you know what you're dealing with.

SteveWh




msg:4363969
 8:59 am on Sep 18, 2011 (gmt 0)

To clarify, the sequence of events that I hypothesized was:

The AV detects, on the PC, that an outgoing http request is about to be sent out (a page request). It blocks that request temporarily while the AV does 2 things:

1) While the request is blocked, it compares the URL against the list of dangerous sites. If it is known-dangerous, it leaves the request blocked and puts up a warning page in the browser instead, "This is a dangerous site." Your browser is never allowed to send the outgoing http request to the dangerous site.

2) If the site is not already in the dangerous list (and while your http request is still blocked), the URL is sent to the Trend server for a second check. The Trend bot fetches the URL and scans the result for malware. If the data is clean, Trend sends an "Ok" message back to the user's AV program. And the AV allows the browser to proceed with sending out its http request for the URL.

Then, when the page is received and stored on HD in the browser cache, it's scanned again (!) for viruses by the AV scanner.
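
The sequence hypothesized above can be written out as a short Python sketch. It models the guess in this post, not Trend Micro's actual implementation; `handle_outgoing_request` and the four helpers passed into it are hypothetical stand-ins.

```python
def handle_outgoing_request(url, known_dangerous, remote_scan, fetch, local_scan):
    """Walk one outgoing page request through the hypothesized checks.

    known_dangerous: set of URLs already on the local blocklist.
    remote_scan(url) -> bool: True if the vendor's bot found the URL clean.
    fetch(url) -> bytes: performs the browser's real request.
    local_scan(body) -> bool: True if the cached copy is clean.
    """
    # Step 1: the request is held while the URL is checked against the list.
    if url in known_dangerous:
        return "blocked: known dangerous site"

    # Step 2: still held; the vendor's bot fetches and scans the URL.
    if not remote_scan(url):
        return "blocked: remote scan found malware"

    # Only now is the browser's own request released.
    body = fetch(url)

    # Last step: the cached copy is scanned again on the user's machine.
    if not local_scan(body):
        return "quarantined: local scan found malware"
    return "delivered"


# Usage with trivial stand-ins for the helpers.
result = handle_outgoing_request(
    "http://example.com/page.html",
    known_dangerous={"http://bad.example/"},
    remote_scan=lambda url: True,
    fetch=lambda url: b"<html></html>",
    local_scan=lambda body: True,
)
print(result)
```

If this model were right, the server log would show the bot's fetch just before the human's; a bot visit minutes or hours afterward would have to come from some other mechanism.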

Though I doubt they'd be able to explain the lack of a transparent UA name like "AntiVirusBot" so you know what you're dealing with.

I think they were just being sneaky, especially so that sites that distribute malware on purpose wouldn't know who their bots really are and block them. Revealing that "this is an AV scan" in the user-agent string would be an invitation to be blocked.

There are lots of these AV bots around. I don't think I've seen any of them identify themselves in the UA string, and for many, the IP doesn't even trace to the name of the company that is doing the scan.

lucy24




msg:4363982
 11:22 am on Sep 18, 2011 (gmt 0)

If the site is not already in the dangerous list (and while your http request is still blocked), the URL is sent to the Trend server for a second check. The Trend bot fetches the URL and scans the result for malware. If the data is clean, Trend sends an "Ok" message back to the user's AV program. And the AV allows the browser to proceed with sending out its http request for the URL.

Yes, that makes perfect sense. It's the behavior you would expect of a virus-sniffer. And, given the hiccupiness of the logs, arriving a second before or after the human visitor isn't significant.

It's when their only visit is anywhere from two minutes to an hour and a half after the human visit* that I'm scratching my head. Wouldn't they have to live in your router to do all that pre-testing while being perfectly invisible in the logs?


* The pages that caught my attention were temporary, private pages. You could literally count their human visitors on the fingers of one hand, so there's no doubt about which specific human triggered the virus snooping.

dstiles




msg:4364046
 8:19 pm on Sep 18, 2011 (gmt 0)

I completely missed that IP range as being Trend. According to the checks I made locally all of 150.11 - 150.100 is APNIC. But then, I was querying via ARIN, which is rubbish nowadays. :(

Checking on the number of "offensive" hits from this IP range I found about 20 recent IPs, all blocked for bad UA etc, several of them several times.

MSIE 6.0 is deprecated by MS so any serious use of it should, I think, be blocked. I log a security warning if it's used and kill any IP using it that looks iffy.

Checking for viruses after the event seems to be quite common. To be effective, viruses should be checked for on the "user's" computer at download time, not a few minutes or even hours later. Intercepting a "communication" is in any case illegal but probably supportable IF virus implantation can be prevented thereby. Talktalk is another offender in this and their checker is also blocked here (it's thought by some to be a precursor of a future advert pusher).

In general I block TrendMicro, although I do have one range listed as a static business one.

lucy24




msg:4364049
 9:01 pm on Sep 18, 2011 (gmt 0)

Intercepting a "communication" is in any case illegal but probably supportable IF virus implantation can be prevented thereby.

Heh. Interesting point. Here it's probably analogous to giving someone permission to open your mail: make it explicit, unambiguous and in writing so the client can't later say "Oi! Nobody told me they were going to snoop into every site I visit." Or is it analogous to simply looking at the sender's name on the envelope without opening it? Hmm.

SteveWh




msg:4364135
 5:51 am on Sep 19, 2011 (gmt 0)

I forgot a step in the outline above. The program also checks the URL against its list of website categories for parental filtering:

1.5) While the http request is still blocked, check the URL against the Parental Filter categories list. If the website belongs to a category blocked by the parental filter, put a warning message in the browser window instead of the requested page, and don't allow the http request to go out to that website.


It's when their only visit is anywhere from two minutes to an hour and a half after the human visit* that I'm scratching my head.


That could happen when the URL has been logged into the user's "URL History Log", a list that Trend doesn't receive until the next time the user's software auto-updates.

Wouldn't they have to live in your router to do all that pre-testing while being perfectly invisible in the logs?


It must have a means to detect and inspect the content of outgoing http requests before the operating system actually sends them out, which probably means installing a "hook" at the operating system level, and which many such AV programs probably do. It's the same sort of thing that a software firewall (as opposed to a hardware router) would need to be able to do.

Do you mean your router logs? I was using the Trend software firewall, which of course allows outgoing connections to the Trend server. If you have a hardware router that logs activity, that would certainly give info about how much (if any) of this behind-the-scenes traffic is really going on.

In your website log, you should see a request from the Trend bot and also a request from the person who's using a Trend AV product.

MSIE 6.0 is deprecated by MS so any serious use of it should, I think, be blocked.


IE6 is an old and vulnerable program. Malware embedded in websites often lies in wait for visitors who are using it, and attacks them (not attacking visitors who are using other browsers). When an AV scanner sends an MSIE 6.0 UA string, it is not using IE6. It is pretending to be a lame old weak browser. Provoking an attack by this method is a way to get shy/careful malware to reveal its presence -- when it attacks.

To be effective, viruses should be checked for on the "user's" computer at download time


As I said, it does that, too. The various AV companies have other strategies that they use in addition to that, in the attempt to gain a competitive advantage.

It's probably analogous to giving someone permission to open your mail


In years past, when visited URLs first began being uploaded to the company server, there was no warning in the EULA about it, but nowadays there is.

----

If you block AV bots, it seems like that could prevent the AV companies from determining that your website is safe (but it seems very unlikely they would decide it's dangerous just because they're blocked), and it could prevent them from categorizing the site for parental filtering. That could potentially mean your site is sometimes blocked from access by users when, if the site were properly categorized, it wouldn't have been.

lucy24




msg:4364143
 6:47 am on Sep 19, 2011 (gmt 0)

In your website log, you should see a request from the Trend bot and also a request from the person who's using a Trend AV product.

That was my point. First comes the human. Then, anywhere from a few seconds to an hour and a half later, comes Trendmicro. With other nastiness-checkers I see them in tandem.

And I still haven't figured out the favicon. Why do they need two of them? Why do they need it at all? Surely a quick look at the HEAD would tell them that it hasn't changed since yesterday, and that it's the canonical 318 bytes rather than something suspiciously larger.

:: vague mental picture of entrance to robot headquarters lined with matched pairs of favicons ::

Staffa




msg:4364155
 7:28 am on Sep 19, 2011 (gmt 0)

I first saw 150.70. several months ago when it started "stalking" visitors to my sites.
Checked out the range and found it was from TrendMicro from a Japanese IP range and that was cause enough for me to block that range ;o)

dstiles




msg:4364515
 9:27 pm on Sep 19, 2011 (gmt 0)

SteveWh - my MSIE 6 comment was more to the point that the UA should in any case be blocked, rather than that it was an actual browser. Most things hitting web sites with an MSIE 6.0 UA are probably bots nowadays. Otherwise, you have a valid point. :)

I think, based on that, any AV bot that was blocked from access should expect to be blocked. The bot is using a very dodgy UA coupled with non-browser header fields. Obviously it's carrying its own shroud with it. :)

Staffa - I'm no longer sure about it being Japanese IP range. The 150 range is an "early assign" block of IPs with assignments to all the then major districts (RIPE, ARIN etc).

Oddly enough, of the tools and WHOIS services I use, only robtex and RIPE give a correct assignment to TrendMicro. All others say it's part of the large Japanese range, which is why I had it down as such in my security database.

Mokita




msg:4365094
 3:21 am on Sep 21, 2011 (gmt 0)

dstiles wrote:
Staffa - I'm no longer sure about it being Japanese IP range. The 150 range is an "early assign" block of IPs with assignments to all the then major districts (RIPE, ARIN etc).


It is definitely in the Japanese range:

inetnum: 150.26.0.0 - 150.100.255.255
netname: JAPAN150
country: JP
descr: Japan Network Information Center
admin-c: JNIC1-AP
tech-c: JNIC1-AP
status: ALLOCATED PORTABLE
notify: hostmaster@nic.ad.jp
mnt-by: MAINT-JPNIC
changed: hm-changed@apnic.net 20070824
source: APNIC

What makes you think just because it is TrendMicro, that they are not operating out of Japan?
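
Whether a given address falls inside the inetnum range quoted above can be checked with Python's standard `ipaddress` module. A minimal sketch: the `in_japan150` helper name is made up for illustration, and the 150.70.172.107 test address is taken from a log line elsewhere in this thread. Note that this only confirms the registry allocation; it says nothing about where a machine physically sits.

```python
import ipaddress

first = ipaddress.ip_address("150.26.0.0")
last = ipaddress.ip_address("150.100.255.255")

# Break the inetnum range into the CIDR blocks that exactly cover it.
blocks = list(ipaddress.summarize_address_range(first, last))

def in_japan150(ip):
    """True if ip falls inside the JAPAN150 inetnum range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in blocks)

print(in_japan150("150.70.172.107"))  # inside the range
print(in_japan150("150.101.0.1"))     # just past the end of the range
```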

dstiles




msg:4365520
 7:00 pm on Sep 21, 2011 (gmt 0)

The IP range is quoted as belonging to and advertised by Trend in the USA, but looking further you seem to be correct. My observation was based on the fact that it's a US company and that the 150.70 range is given by IANA as being split into several "districts" including ARIN.

Whether or not it's being "hosted" in Japan, it's operated, I think, from USA.

Mokita




msg:4365598
 9:37 pm on Sep 21, 2011 (gmt 0)

it's operated, I think, from USA.


Ah, I think you might be slightly in error about that.

The first sentence on their Wikipedia page says

Trend Micro Inc. (TYO: 4704) is a computer security company. It is headquartered in Tokyo, Japan ...

Staffa




msg:4365644
 11:07 pm on Sep 21, 2011 (gmt 0)

I had to do some digging and I think you are actually both right

DNS for a range of 150.70. numbers :
sjdc-wtp-gs-maya4.sdi.trendnet.org
sjdc-wtp-gs-maya2.sdi.trendnet.org
sjdc-wtp-gs-maya6.sdi.trendnet.org
sjdc-wtp-g3-maya9.sdi.trendnet.org
sjdc-wtp-gb-maya4.sdi.trendnet.org
iad1-wtp-gd-maya6.sdi.trendnet.org
sjdc-wtp-gb-maya6.sdi.trendnet.org
sjdc-wtp-g3-maya12.sdi.trendnet.org
sjdc-wtp-gb-maya3.sdi.trendnet.org
iad1-wtp-gd-maya9.sdi.trendnet.org
sjdc-wtp-gs-maya1.sdi.trendnet.org
sjdc-wtp-g3-maya4.sdi.trendnet.org
sjdc-wtp-g3-maya6.sdi.trendnet.org

150.70. is a Japanese range but
trendnet.org is registered by :
Domain Name:TRENDNET.ORG
Created On:22-Feb-2002 19:55:52 UTC
Registrant Organization:Trend Micro, Inc.
Registrant City:Cupertino, CA, USA
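
A quick way to see the pattern in the hostnames listed above is to group them by their leading label. This is a rough Python sketch; the `site_label` helper is made up for illustration, and reading "sjdc" and "iad1" as site or data-center codes is a guess that the thread does not confirm.

```python
from collections import Counter

hostnames = """
sjdc-wtp-gs-maya4.sdi.trendnet.org
sjdc-wtp-gs-maya2.sdi.trendnet.org
sjdc-wtp-gs-maya6.sdi.trendnet.org
sjdc-wtp-g3-maya9.sdi.trendnet.org
sjdc-wtp-gb-maya4.sdi.trendnet.org
iad1-wtp-gd-maya6.sdi.trendnet.org
sjdc-wtp-gb-maya6.sdi.trendnet.org
sjdc-wtp-g3-maya12.sdi.trendnet.org
sjdc-wtp-gb-maya3.sdi.trendnet.org
iad1-wtp-gd-maya9.sdi.trendnet.org
sjdc-wtp-gs-maya1.sdi.trendnet.org
sjdc-wtp-g3-maya4.sdi.trendnet.org
sjdc-wtp-g3-maya6.sdi.trendnet.org
""".split()

def site_label(hostname):
    """Return the part of the hostname before the first hyphen."""
    return hostname.split("-", 1)[0]

# Count only names under the trendnet.org subdomain seen in the list.
counts = Counter(
    site_label(h) for h in hostnames if h.endswith(".sdi.trendnet.org")
)
print(counts)
```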

Mokita




msg:4365654
 11:35 pm on Sep 21, 2011 (gmt 0)

I do hope this isn't seen as a "right/wrong" issue. I thought it was more about sharing information, or explaining how one came to a conclusion.

I didn't imply that TrendMicro don't have a strong presence in USA. They do indeed. In fact it was formed in Los Angeles, then moved its HQ to Taipei, then Tokyo, whilst retaining its US operations.

I was actually puzzled as to why dstiles apparently dismissed the idea of 150.70. as being physically located in Japan.

If you look at the link below, the top two NS for trendnet.org (in the 150.70 range) are identified as Japan.

[robtex.com...]

Trendnet USA operates from 69.89.66.224/28
and TrendMicro from 174.143.62.200/29 and 66.180.80.0/20

lucy24




msg:4365674
 12:42 am on Sep 22, 2011 (gmt 0)

Could be worse. Somewhere along the line I met a robot that apparently couldn't make up its mind whether it lived in Singapore or Texas.

dstiles




msg:4366023
 8:20 pm on Sep 22, 2011 (gmt 0)

Mokita - thanks for the clarification. I didn't find the wiki reference but compared it with a previously noted US IP range on 216.104.0.0/19. I sit corrected. :)

Thanks also for the extra IP ranges, which I hadn't before logged.

Mokita




msg:4366049
 9:06 pm on Sep 22, 2011 (gmt 0)

You are most welcome.

Wiki entry here:
[en.wikipedia.org...]

dstiles




msg:4366056
 9:15 pm on Sep 22, 2011 (gmt 0)

Thanks.

I have to agree with the header, that it looks like a news release written by an interested party. :)

Pfui




msg:4367591
 2:53 am on Sep 27, 2011 (gmt 0)

Speaking of TrendMicro (which I've never paid much, if any, attention to before), this just in --

216.104.15.130 [projecthoneypot.org...]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

robots.txt? NO

- Made a single-hit beeline to a .zip file that's only available to same-site referrers; didn't follow the redirect to the A-OK starting point.

- That Project Honey Pot link is interesting given the number of hits, and spam, and the same old-same old UA.

- For those of you keeping score at home (smiles), more numbers:

Trend Micro Inc., San Francisco/Cupertino
Range: 216.104.0.0 - 216.104.31.255
CIDR: 216.104.0.0/19

lucy24




msg:4367606
 3:31 am on Sep 27, 2011 (gmt 0)

Eeuw. All I'd personally met was 216.104.15.130-142 which rounds off to 128-143, so I compromised by blocking all of 15.
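
The rounding described here (130-142 becoming 128-143) is the smallest single CIDR block that covers both ends, and it can be checked with Python's standard `ipaddress` module. A minimal sketch; the `smallest_covering_network` helper is made up for illustration, and the /19 is the range posted just above.

```python
import ipaddress

def smallest_covering_network(first, last):
    """Smallest single CIDR block containing both addresses."""
    net = ipaddress.ip_network(f"{first}/32")
    last_addr = ipaddress.ip_address(last)
    while last_addr not in net:
        net = net.supernet()  # widen the block by one prefix bit
    return net

tight = smallest_covering_network("216.104.15.130", "216.104.15.142")
whole_15 = ipaddress.ip_network("216.104.15.0/24")  # "all of 15"
trend_19 = ipaddress.ip_network("216.104.0.0/19")   # the posted range

print(tight, tight[0], tight[-1])
print(tight.subnet_of(whole_15))
print(whole_15.subnet_of(trend_19))
```

Blocking the /24 is broader than the observed addresses need, but it still sits entirely inside the range registered to Trend Micro.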

I particularly dislike this address because 216.102.nnn is my local library while 216.108.nnn is my favorite Third World government, so it's like having a crack house in the middle of the block.

btherl




msg:4380440
 10:49 pm on Oct 27, 2011 (gmt 0)

The traffic I got under 216.104.15 was from only 4 IPs:
216.104.15.130
216.104.15.134
216.104.15.138
216.104.15.142

Don't ask me why they use every 4th address only.

And I have blocked 150.70 altogether.

wilderness




msg:4416991
 2:56 pm on Feb 13, 2012 (gmt 0)

How innovative!
They're using the same UA as was used 30 months ago:

216.104.15.138 - - [05/Sep/2009:02:05:36 +0100] "GET /MyFolder/MyPage.html HTTP/1.0" 200 20904 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

150.70.172.107
iad1-wtp-gd-maya7.sdi.trendnet.org - - [13/Feb/2012:13:57:09 +0000] "GET /MyFolder/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

Not sure why I had half a dozen CIDRs in place (that's what happens when you don't include remarks).
(150.0.0.1 to 150.101.255.255)

150. (all) will suffice.
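
Log lines like the two quoted above can be screened automatically. The Python sketch below parses a combined-format line and flags it when the UA matches and the client is either in a watched range or resolves to trendnet.org. The regex, the `looks_like_trend` helper, and the choice of 150.70.0.0/16 and 216.104.0.0/19 as watched ranges are assumptions drawn from this thread, not a definitive list of Trend Micro addresses.

```python
import ipaddress
import re

# Combined log format: host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

BAIT_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
WATCHED = [ipaddress.ip_network(n) for n in ("150.70.0.0/16", "216.104.0.0/19")]

def looks_like_trend(line):
    """True if the line has the bait UA and a watched address or hostname."""
    m = LOG_LINE.match(line)
    if not m or m["agent"] != BAIT_UA:
        return False
    host = m["host"]
    if host.endswith(".trendnet.org"):  # log written with hostname lookups on
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:  # some other hostname
        return False
    return any(addr in net for net in WATCHED)

sample_lines = [
    '216.104.15.138 - - [05/Sep/2009:02:05:36 +0100] '
    '"GET /MyFolder/MyPage.html HTTP/1.0" 200 20904 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    'iad1-wtp-gd-maya7.sdi.trendnet.org - - [13/Feb/2012:13:57:09 +0000] '
    '"GET /MyFolder/MyPage.html HTTP/1.0" 403 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for line in sample_lines:
    print(looks_like_trend(line))
```

Requiring both the UA and the address keeps a real visitor on an old browser from being flagged by the UA alone.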

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved