Judging by the request headers, it is a fairly standard robot: it only ever seems to request the root document and then leaves.
One thing all the requests I've seen have in common is that they carry a "Via:" header, which (according to the HTTP documentation) is added when a request passes through a proxy.
At this point in time I'm wondering what the application actually is and who makes it...
It might be a proxy (if so, not an open one; I checked the common proxy ports 80, 3128 and 8080), but if that were the case why would it only ask for one page?
It might be a script-controlled version of the browser, but if that were the case why would so many standard request headers that IE5.5 sends by default be missing?
It might be some sort of heart-beat check, but if that were the case why would it be so geographically distributed? For example, take a random handful of the requests so far:
209.139.184.90 = Verio, Inc. (US)
209.7.199.68 = Illinois State Board of Education (US)
208.13.156.12 = Poe and Brown Benefits (US)
216.102.208.237 = City of Redwood (US)
193.133.109.17 = Frontline Distribution Limited (UK)
151.200.174.166 = Bell Atlantic (US)
67.98.187.23 = Novoste (US)
67.89.244.125 = Internet Allegiance, Inc. (US)
66.113.23.2 = Vanion, Inc. (US)
65.201.211.175 = InterOne Marketing (US)
63.236.133.235 = Qwest Communications (US)
63.144.41.70 = Nucor Building Sys (US)
Puzzled... Help - I'm out of ideas and would appreciate any other feedback any of the members are willing to offer!
Tony
P.S. Here are some of the results I was seeing from the capture exercise:
Test example from IE6:
---
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, */*
Accept-Language: en-gb,en-us;q=0.7,en;q=0.3
Connection: Keep-Alive
Host: mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)
Accept-Encoding: gzip, deflate
---
Example request headers from the Fetch API:
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
---
Connection: Keep-Alive
Host: mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
Via: 1.0 A-U4355WHP55XO9
---
Connection: Keep-Alive
Host: mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
Via: 1.0 TAURUS
---
Connection: Keep-Alive
Host: mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
Via: 1.0 PROXY
---
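For anyone wanting to repeat the comparison, here is a minimal Python sketch that diffs the IE6 capture against one of the Fetch API captures above. The function name and the two constants are made up for illustration; the header text is copied from the captures. It only shows which header names are missing or extra and splits the Via value into its protocol version and proxy name.

```python
def parse_headers(block):
    """Turn 'Name: value' lines into a dict keyed by header name."""
    headers = {}
    for line in block.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


# Captures copied from the post above.
IE6_CAPTURE = """\
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, */*
Accept-Language: en-gb,en-us;q=0.7,en;q=0.3
Connection: Keep-Alive
Host: mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)
Accept-Encoding: gzip, deflate
"""

FETCH_CAPTURE = """\
Connection: Keep-Alive
Host: mysite.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
Via: 1.0 TAURUS
"""

ie6 = parse_headers(IE6_CAPTURE)
bot = parse_headers(FETCH_CAPTURE)

# Headers a real IE sends that the bot leaves out, and vice versa.
missing = sorted(set(ie6) - set(bot))
extra = sorted(set(bot) - set(ie6))

# A Via value is "<protocol version> <proxy name or pseudonym>".
via_version, via_host = bot["Via"].split(None, 1)

print("missing from bot:", missing)
print("only sent by bot:", extra)
print("via:", via_version, via_host)
```

The missing Accept* headers are what makes the "script-controlled IE" theory look shaky, and the Via names (TAURUS, PROXY) look like machine names rather than public hostnames.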
I've already had a conversation with Andreas (that link is his page) after finding the same page with Google yesterday, before asking for help here.
My conversation with him amounted to "we both know very little about this bot". There was some talk of it belonging to jaring.my, but that was discounted after mooching through some logs and finding it was geographically distributed well beyond Malaysia.
The links, although interesting, don't really tell you enough to form a solid opinion (someone put it in robots.txt & an article on what an API tool is).
So tempting just to 403 the bot...
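If anyone does decide to go that way, this is roughly what it amounts to. A minimal sketch only, written as Python WSGI middleware, which is an assumption on my part since nobody in this thread has said what their server runs. The names block_fetch_api and hello_app are hypothetical; the marker string is the tail of the User-Agent from the captures.

```python
def block_fetch_api(app, marker="Fetch API Request"):
    """Wrap a WSGI app and answer 403 when the User-Agent contains marker."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if marker in agent:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else goes through to the real application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_fetch_api(hello_app)
```

Matching on the User-Agent alone is easy to get around, of course, and it tells you nothing more about what the thing is.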
Last night I went through a search at Google without much success.
This search at Google Groups suggests Fetch API is "the Cache refreshing itself":
[groups.google.com...]
If the above is true, then the Fetch API is not malicious at all, unless you don't want your pages cached by servers.
What we know now is that the term "Fetch API" (without the "Request" part) is *also* used in the context of Microsoft ISA proxy servers. But this doesn't explain the access patterns I'm seeing at all.
In my logs, there are two distinct patterns:
One is a single IP in China trying to download ALL my files.
The other is seemingly arbitrary IPs all around the world, each fetching ONE file at intervals of a few days (always the same file, with one curious exception).
Neither behaviour is consistent with the typical tasks of a caching proxy server.
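To separate the two patterns in a log, something along these lines would do. It is a rough sketch: the (ip, path) pairs are invented sample data using documentation-only addresses, not real log entries, and the threshold of three distinct files is an arbitrary assumption.

```python
from collections import defaultdict


def classify_clients(hits, bulk_threshold=3):
    """Label each IP by how many distinct files it asked for.

    hits is an iterable of (ip, path) pairs; bulk_threshold is a guess.
    """
    files_by_ip = defaultdict(set)
    for ip, path in hits:
        files_by_ip[ip].add(path)
    return {
        ip: "bulk" if len(paths) >= bulk_threshold else "occasional"
        for ip, paths in files_by_ip.items()
    }


# Invented sample data (RFC 5737 documentation addresses).
SAMPLE_HITS = [
    ("192.0.2.10", "/"),
    ("192.0.2.10", "/a.html"),
    ("192.0.2.10", "/b.html"),
    ("198.51.100.7", "/"),
    ("198.51.100.7", "/"),   # same file again a few days later
    ("203.0.113.44", "/"),
]

patterns = classify_clients(SAMPLE_HITS)
for ip, label in patterns.items():
    print(ip, label)
```

Counting distinct files rather than raw hits keeps a client that re-fetches the same file every few days in the "occasional" group.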
Hey bird,
Perhaps this explains why I'm not seeing it?
I have all of 203. denied, as well as portions of 202 and 210. Before doing so, because I have both Aussie and NZ visitors who are part of the APNIC block, I went through the eastern countries one by one, eliminating blocks I didn't want traffic from. An occasional visitor still slips through: last week somebody here enlightened me about a 50's IP, and I also recently had a far east spider from a 61. block that didn't identify itself, grabbing quite a few pages before I cut them off.
These steps are NOT for everybody. I have no market in the far east, nor should the content of my site be of any benefit to them.
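For checking a log entry against a deny list like that, here is a small Python sketch using the standard ipaddress module. Only the 203. block is included, written as 203.0.0.0/8; the portions of 202 and 210 are left out because the exact ranges aren't given above. The names DENIED and is_denied are made up for the example.

```python
import ipaddress

# "All of 203." expressed as a /8; other ranges would be appended here.
DENIED = [ipaddress.ip_network("203.0.0.0/8")]


def is_denied(addr, networks=DENIED):
    """Return True if addr falls inside any of the denied networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


print(is_denied("203.0.113.5"))      # inside the 203. block
print(is_denied("193.133.109.17"))   # one of the UK addresses listed earlier
```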
Back to the drawing board. :-(