12:05:02 /{directory}/{file}.js
12:05:02 /favicon.ico
12:05:02 /{directory}/{file}.html
12:05:35 /{directory}/{file}.js

11:04:32 /{directory}/{subdirectory}/{file}.css

20:25:22 /{directory2}/{subdirectory}/{file}.html
20:28:53 /favicon.ico
20:29:39 /favicon.ico

19:41:10 /{directory2}/{subdirectory2}/{file}.html
19:41:56 /favicon.ico
19:42:06 /favicon.ico

12:40:24 /piwik/piwik.php?action_name={buncha stuff edited out here, including reference to google.fr search}
12:40:27 /piwik/piwik.js
12:40:34 /favicon.ico
12:40:35 /{directory}/{file}.html

11:21:35 /piwik/piwik.php?action_name={letter for letter the EXACT SAME CONTENT as above}

The Trend bots, being part of an antivirus program's machinery rather than a web crawler, are not going to read robots.txt or obey its instructions. Any URL that their user is able to visit is fair game for them to fetch and analyze for malware. On the other hand, they're not crawling it for the purpose of putting it into a public web index.
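For contrast, this is what obeying robots.txt looks like: the crawler itself parses the file and asks before each fetch, so compliance is entirely voluntary and nothing on the server stops a bot that skips the step. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the crawler name `ExampleCrawler` and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that tries to keep everyone out of /secret/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching; the server does not enforce the answer.
print(rp.can_fetch("ExampleCrawler", "http://www.example.com/secret/page.html"))
print(rp.can_fetch("ExampleCrawler", "http://www.example.com/public/page.html"))
```

An AV bot that simply never calls anything like `can_fetch` fetches `/secret/page.html` anyway.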
The moral of the story for a webmaster: if you have pages or scripts that you want to be truly secret and inaccessible, you must make them impossible for anyone else to fetch. Either put them in a password-protected directory or apply an IP test so that the request is denied for every IP other than yours.
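The IP test can be sketched as follows. In practice this is normally done in the web server's own access-control configuration rather than in application code; this Python version only shows the logic. `ALLOWED_NETWORKS`, `is_request_allowed` and `status_for` are hypothetical names, and 203.0.113.7 is a documentation address standing in for your own IP.

```python
import ipaddress

# Hypothetical allowlist: replace 203.0.113.7 (a documentation address)
# with your own IP or network.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.7/32")]


def is_request_allowed(client_ip: str) -> bool:
    """Return True only if client_ip falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed address: deny rather than guess.
        return False
    # IPv6 addresses never match an IPv4 network, so they are denied too.
    return any(addr in net for net in ALLOWED_NETWORKS)


def status_for(client_ip: str) -> int:
    """HTTP status a secret page would return: 200 for you, 403 for everyone else."""
    return 200 if is_request_allowed(client_ip) else 403


if __name__ == "__main__":
    print(status_for("203.0.113.7"))
    print(status_for("150.70.1.2"))
```

Denying by default (everything not on the list gets a 403) is what makes the page secret: a bot that learns the URL still cannot fetch it.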
It seemed that the software would send a URL that I typed into my browser's address bar back to Trend even before my browser could fetch the page. I used to see requests from the Trend IPs (150.70. is one set of them) showing up in my logs, sometimes (if I recall correctly) even before my own request, and definitely at other times just a second or two afterwards. This happened even for secret files that were on my server for only a few seconds, and that I requested only once and then deleted.
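Picking those requests out of a raw access log takes one regular expression plus an address-range test. The sketch below assumes Apache's common log format and treats 150.70.0.0/16 as the Trend block, based only on the "150.70." prefix mentioned above; the sample log lines, the path and the helper name `trend_requests` are made up for illustration.

```python
import ipaddress
import re

# Assumption: the "150.70." prefix above corresponds to 150.70.0.0/16.
TREND_NET = ipaddress.ip_network("150.70.0.0/16")

# Apache common log format: host ident user [time] "METHOD path PROTO" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def trend_requests(lines):
    """Yield (ip, timestamp, path) for each log line whose client IP is in TREND_NET."""
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # not a common-log-format line
        ip, ts, path = m.group(1, 2, 4)
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # host field holds a hostname, not an IP
        if addr in TREND_NET:
            yield ip, ts, path


# Made-up sample: a human visit followed two seconds later by a 150.70.x.x fetch.
SAMPLE = [
    '203.0.113.7 - - [10/Mar/2010:12:40:24 +0000] "GET /secret/test.html HTTP/1.1" 200 512',
    '150.70.1.2 - - [10/Mar/2010:12:40:26 +0000] "GET /secret/test.html HTTP/1.0" 200 512',
]

if __name__ == "__main__":
    for hit in trend_requests(SAMPLE):
        print(hit)
```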
But what kind of security are you providing when you don't even look at a page until more than an hour after the human visit that triggered your inspection? Anything can happen in an hour.
I only remember them fetching main files such as .htm, .html and .zip, not .css or .js. If they are fetching those other types now, it would be a reasonable escalation, because malware is being stored in those nowadays too.
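Checking whether they now fetch .css and .js comes down to tallying the extensions of the paths requested from the Trend IPs. A minimal sketch; `extension_counts` is a hypothetical helper, and the path list is made up, standing in for paths you would pull from your own log.

```python
from collections import Counter
from posixpath import splitext
from urllib.parse import urlsplit


def extension_counts(paths):
    """Count request paths by lower-cased file extension, ignoring query strings."""
    counts = Counter()
    for p in paths:
        ext = splitext(urlsplit(p).path)[1].lower()
        counts[ext or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Made-up paths standing in for requests from the Trend IP range.
    sample = [
        "/dir/page.html",
        "/dir/sub/style.css",
        "/dir/app.js",
        "/dir/archive.ZIP",
        "/piwik/piwik.php?action_name=x",
    ]
    for ext, n in sorted(extension_counts(sample).items()):
        print(ext, n)
```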
Like any AV, it inspects the file after the user's browser has received it and written it to hard disk in the browser cache. That's normal AV activity.
If you try to go to a site on its dangerous list, it blocks your request even before you can get a Google/Firefox Safe Browsing message or an Internet Explorer warning.
Though I doubt they'd be able to explain the lack of a transparent UA name like "AntiVirusBot" so you know what you're dealing with.
If the site is not already on the dangerous list (and while your HTTP request is still blocked), the URL is sent to the Trend server for a second check. The Trend bot fetches the URL and scans the result for malware. If the data is clean, Trend sends an "Ok" message back to the user's AV program, and the AV then allows the browser to proceed with sending out its HTTP request for the URL.
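The flow described above can be modelled in a few lines. This is a hypothetical sketch of the sequence as described in this post, not Trend's actual code: `may_fetch` is an invented name, `dangerous` stands for the locally known bad-URL list, and `remote_scan` stands in for the round trip in which the vendor's bot fetches and scans the page.

```python
from typing import Callable, Set


def may_fetch(url: str, dangerous: Set[str], remote_scan: Callable[[str], bool]) -> bool:
    """Decide whether the browser may send its own request for url.

    Hypothetical model: remote_scan(url) returns True when the vendor's
    bot fetched the URL and found it clean.
    """
    # 1. Already on the dangerous list: block before anything leaves the PC.
    if url in dangerous:
        return False
    # 2. Unknown URL: the browser's request stays on hold while the
    #    vendor's bot fetches and scans the page.
    if remote_scan(url):
        return True  # "Ok" received: the browser may now request the URL itself
    # 3. Malware found: remember the URL and keep blocking.
    dangerous.add(url)
    return False


if __name__ == "__main__":
    bot_fetches = []  # URLs the simulated vendor bot requested

    def fake_scan(url: str) -> bool:
        bot_fetches.append(url)        # this is the bot hit that shows up in your log
        return "malware" not in url    # toy rule standing in for a real scanner

    bad = {"http://bad.example/"}
    print(may_fetch("http://bad.example/", bad, fake_scan))
    print(may_fetch("http://www.example.com/page.html", bad, fake_scan))
    print(bot_fetches)
```

Under this model the bot's fetch always reaches your server before the visitor's own request, because the browser waits for the "Ok" before sending anything.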
Intercepting a "communication" is in any case illegal but probably supportable IF virus implantation can be prevented thereby.
It's when their only visit is anywhere from two minutes to an hour and a half after the human visit* that I'm scratching my head.
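To put a number on that gap, compare the timestamp of the human request with the bot's request for the same URL. A minimal sketch assuming Apache common-log-format timestamps; `lag_seconds` is a hypothetical helper, and the sample times are invented to illustrate the hour-and-a-half case.

```python
from datetime import datetime

# Timestamp layout used by Apache's common log format, e.g. 10/Mar/2010:12:40:24 +0000
CLF_TIME = "%d/%b/%Y:%H:%M:%S %z"


def lag_seconds(human_ts: str, bot_ts: str) -> float:
    """Seconds from the human visit to the bot's fetch (negative if the bot came first)."""
    human = datetime.strptime(human_ts, CLF_TIME)
    bot = datetime.strptime(bot_ts, CLF_TIME)
    return (bot - human).total_seconds()


if __name__ == "__main__":
    # Invented timestamps: bot fetch 90 minutes after the human visit.
    lag = lag_seconds("10/Mar/2010:12:40:24 +0000", "10/Mar/2010:14:10:24 +0000")
    print(lag / 60, "minutes")
```

A lag of a second or two (or a negative one) fits a real-time pre-check; a lag measured in minutes or hours does not.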
Wouldn't they have to live in your router to do all that pre-testing while being perfectly invisible in the logs?
MSIE 6.0 is deprecated by MS, so any serious use of it should, I think, be blocked.
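A blocking rule along those lines only needs to spot the "MSIE 6.0" token in the User-Agent header. A minimal Python sketch; `should_block` is a hypothetical name, and since the User-Agent is supplied by the client, this only catches browsers (or bots) that report themselves honestly.

```python
import re

# Matches the "MSIE 6.0" token Internet Explorer 6 puts in its User-Agent,
# e.g. "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)".
MSIE6 = re.compile(r"\bMSIE 6\.0\b")


def should_block(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be MSIE 6.0."""
    # A missing header (None or "") is not treated as MSIE 6.0.
    return bool(MSIE6.search(user_agent or ""))


if __name__ == "__main__":
    print(should_block("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
    print(should_block("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))
```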
To be effective, viruses should be checked for on the "user's" computer at download time.
It's probably analogous to giving someone permission to open your mail.
In your website log, you should see a request from the Trend bot and also a request from the person who's using a Trend AV product.
Staffa - I'm no longer sure about it being a Japanese IP range. The 150 range is an "early assign" block of IPs, with assignments spread across all the then-major regional registries (RIPE, ARIN, etc.).