| 7:47 am on Jan 4, 2013 (gmt 0)|
For some reason I want to say that the instances I saw in the past appeared to come from cell phones, but looking at my most recent log, they seem to be coming mainly from Ukraine, Turkey, Poland and France.
A few appear to be from the US with fake referrers like yahoo and bing, but the pages they snagged are completely unrelated to each other and were fetched too quickly.
I say still block them.
| 8:53 pm on Jan 15, 2013 (gmt 0)|
A significant percentage of NEW IP ranges I've recently blocked have been blocked on this UA. Although I've flagged a few as being "servers", that was often a subjective call and may have been erroneous (they came from "bad" neighbourhoods, so some prejudice was involved). The majority are from broadband IPs.
I'm not sure if the hits are from compromised machines (ie part of a botnet) or are part of some browser add-on/plug-in. If the latter then the result for the user is abysmal - at least from my sites. I cannot say I've seen any real evidence of compromised machines.
Since it appears to be apache-based and relatively few users have apache installed on "home" machines, and few linux machines are susceptible to compromises anyway, I'm still inclined to think this is a deliberate use of a versatile tool, possibly (probably?) with the aim of site-scraping.
From a limited number of checks in actual site logs it seems that each hit is unique - there are no other hits for the offending IP and no further attempt to hit the site from that IP - at least, not on the same day.
| 9:55 pm on Jan 15, 2013 (gmt 0)|
It's not Apache Synapse.
|Ararat Synapse - Pascal TCP/IP Library for Delphi, C++Builder, Kylix and FreePascal. |
|The SYNAPSE library aims to create a complete library of classes and functions that would markedly simplify application programming of network communication using Winsock. |
FUserAgent := 'Mozilla/4.0 (compatible; Synapse)';
| 1:13 pm on Jan 16, 2013 (gmt 0)|
Over the past week, I've seen specific and repeated "attacks"/"scrapes" (whatever they are) by this UA. It works like this:
1) IP comes to site and hits the same page 10 times in < 5 seconds
2) A few hours later, a different IP in a different part of the world comes to the site and hits a different page 10 times in < 5 seconds
This happens again the next day with yet another set of IPs/pages. It's been repeated for a week at least recently.
IIRC, I had the same thing happening from April - May last year and then it just stopped.
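For anyone who wants to spot this pattern automatically, here is a minimal sketch in Python. It assumes the log has already been reduced to (timestamp, ip, path) tuples; that layout, the function name `find_bursts` and the thresholds are my own illustration, not a real log format or tool.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_bursts(hits, min_hits=10, window=timedelta(seconds=5)):
    """Return (ip, path) pairs with at least min_hits requests inside window.

    hits is an iterable of (timestamp, ip, path) tuples. This tuple layout
    is an assumption for illustration only.
    """
    by_key = defaultdict(list)
    for ts, ip, path in hits:
        by_key[(ip, path)].append(ts)

    bursts = []
    for key, times in by_key.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Slide the left edge forward until the span fits in the window.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_hits:
                bursts.append(key)
                break
    return bursts
```

The defaults mirror the "10 times in < 5 seconds" behaviour described above; loosen them if the bursts you see are smaller.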
| 8:12 pm on Jan 16, 2013 (gmt 0)|
theTrasher - Looking up the UA I've quoted leads via user-agents dot org to wiki.apache dot org which defines the initial proposal and adds a link to synapse.apache dot org which gives the definition in my OP.
It is quite possible that the Pascal/etc library is a wrapper around the Apache code or a fork of it - this would make sense. Both versions support both linux and windows ON APACHE. If it's the same then the feature "How to connect using GET to get page or send data to a server" is very disturbing!
Synapse (both) seems to use Java, which is currently an exploitable app that people are being recommended to disable. Hmmm.
There is an associated User-Agent of "Axis2" in the original synapse.apache documentation but this seems to be for the SOAP client.
bigtoga - I have not seen that scenario. It is quite feasible for different operators to run different operations, though. Do you allow the UA to read your pages or do you return (eg) 403?
| 8:19 pm on Jan 16, 2013 (gmt 0)|
"bigtoga - I have not seen that scenario. It is quite feasible for different operators to run different operations, though. Do you allow the UA to read your pages or do you return (eg) 403? "
I redirect them to a different place on my sites...
| 7:32 pm on Jan 17, 2013 (gmt 0)|
General feeling hereabouts is that "bad" hits are thrown a 403 to send them away again. Sending them to another page without a "go away" code only encourages them. :)
| 5:24 pm on Mar 3, 2013 (gmt 0)|
Further to the synapse UA:
Half way through the third day of the month. I have, for the previous two and a half days, had synapse be responsible for 51 out of 135 new IP bans. This is a very high proportion.
Of IPs already banned, either by IP (possibly but not necessarily through one of the above hits) or by range, there have been a further 105 synapse hits.
Most of these, as noted above, were from broadband ranges, although a few were from servers and some were indeterminate (possibly static IP ranges).
A high percentage of the hits were to one (genealogy) site, aiming for files ending in .ged rather than real pages. Since I've seen references that say this is an XML tool this is feasibly a legitimate access (though mistaken - these are genealogy gedcoms). Against legitimacy is the fact that few of the gedcoms (for example) would be of interest to residents of most of the source countries - these are of most interest to UK/AU/CA/US residents, not Eastern Europe and Asia. I suspect the high number of .ged hits was due to file-extension "scraping" from SEs - I can think of no other way of getting such accurate hits, although I may be missing something. If this is true then synapse is certainly being used for scraping attempts against a perceived useful (though actually useless) file.
The few hits not to gedcoms were to health and local history sites, to .ASP pages generally with querystring definers (eg health?pid=medname). I can't see why this should be regarded as a possible XML file but a thought occurs that if an ordinary bot were rejected then synapse may be employed instead.
Yet another possibility is that the tool is fairly legitimate and part of either some web browser or browser plug-in, and as such the user is not really aware of it.
However it works out, synapse is causing me more work in determining whether each rejected UA should be permanently blocked (server farm) or whether to give it benefit of doubt as a legit broadband range. Not a LOT of work, granted, but enough to be annoying.
| 3:10 pm on Mar 12, 2013 (gmt 0)|
I've had hits from over 40 IP addresses in the past 30 days all using this same User Agent.
Looking at the Apache project, it doesn't really make sense that it would be hitting the site so I started down the same path as thetrasher, looking for source code. Indeed, the Apache project DOES NOT use that user agent string while the Ararat Synapse library does.
Since it is a library I suspect many projects use it and are effectively hiding behind it. Indeed, this search on Ohloh shows exactly that: [goo.gl...]
At this point I don't think it will be effective to block the IP Addresses. The hits in my logs all come in blocks ranging from 6-30 hits and then don't appear to come back. (They might though, I'm only analyzing the past 30 days.) I am considering issuing a 403 for that user agent.
| 4:33 pm on Mar 12, 2013 (gmt 0)|
|I am considering issuing a 403 for that user agent. |
As well you should.
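For what it's worth, here is a minimal sketch of the 403 idea in Python. It assumes a WSGI application, which is purely my choice for illustration (the sites discussed here run on Apache and IIS, where you would use the server's own rules instead). The name `block_user_agent` is made up; the UA string is the one quoted from the Ararat Synapse source earlier in the thread.

```python
BLOCKED_UA = "Mozilla/4.0 (compatible; Synapse)"


def block_user_agent(app, blocked_ua=BLOCKED_UA):
    """Wrap a WSGI app so requests sent with blocked_ua receive a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if user_agent == blocked_ua:
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

An exact comparison is used here so that longer UA strings that merely contain the word "Synapse" are not caught by accident.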
In addition you should also deny the following, which has long been used as a UA for providers that cache your files.
("Begins with" and "ends with")
any other slight variation of same.
There's not much difference between these types of UA's and blank UA's.
| 8:40 pm on Mar 12, 2013 (gmt 0)|
Like Hellfire, I have been debating whether to not block IPs for synapse but to just issue a 403. I have seen a few multiple-access cases and a handful of IPs appear to be servers, though I would not swear to that. I think it probable that I will stop blocking IPs for synapse in the near future.
As to whether or not it's the apache project: I don't know, but as noted above the UA does seem to concentrate (though not exclusively) on .GED files which it may think are XML compatible. If it is not a genuine tool but a bad bot then it may not even be annotated in normal online sources.
Arts & Letters (Graphics)
Family Historian (GEDCOM File) (which mine are)
Game Editor interactive multimedia tool (likely target, I would think)
GoldED / DOS Compiled Configuration File (another likely target?)
Micrografx Simply 3D Geometry
| 9:35 pm on Mar 12, 2013 (gmt 0)|
While I was writing that last message my server had two more synapse hits a few seconds apart to exactly the same product and page of one web site. The hits were from NL and GB dynamic IP ranges.
This type of behaviour strongly suggests either a botnet or a proxy net. I probed both IPs for open ports - this, for botnet or proxy, normally shows positive. In these two cases (and in several previous attempts) there were no open ports. It is possible that both users closed down their machines in the intervening 45 or so minutes but this is unexpected if so.
The appearance is still of a botnet but one that has no open ports - at least, not on the common range. Which suggests it is driven via port 80 or 21/22 (web browser and FTP fetch), more likely the former. Time to log the headers in detail.
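The kind of probe I mean can be sketched in a few lines of Python using plain TCP connects. This is only an illustration of the idea, not the tool I actually used; the function name `open_ports` and the one-second timeout are assumptions.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the subset of ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Refused, filtered or timed out: all treated as "not open" here.
            pass
    return found
```

Note that a timeout cannot distinguish a firewalled port from a machine that is simply switched off, so an empty result proves very little on its own.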
| 10:20 pm on Mar 13, 2013 (gmt 0)|
I collected 46 sets of headers from synapse UA accesses during the past 24 hours (from 12/Mar/2013 23:50:04 to 13/Mar/2013 21:46:40 GMT).
There were a few instances of multiple IP accesses, from 1 to 8 (I may have miscounted in the list below)...
193.203.48.nn (4 hits) - UA - server - open ports
89.73.233.nnn (1 hit) - PL - dynamic - stealth
85.97.73.nn (1 hit) - TR - dynamic - no open ports
91.217.90.nnn (3 hits) - UA - server - open ports
114.48.35.nnn (1 hit) - JP - mobile access - no open ports
219.85.0.nn (1 hit) - TW - dynamic - no open ports
62.218.160.nnn (1 hit) - AT - dynamic - no open ports
83.26.141.nnn (1 hit) - PL - dynamic - no open ports
89.70.113.nnn (2 hits) - PL - dynamic - no open ports
85.176.24.nnn (1 hit) - DE - dynamic - stealth
96.20.61.nn (6 hits) - CA - dynamic - stealth
109.236.84.nnn (2 hits) - NL - server - open ports
85.122.54.nn (1 hit) - RO - dynamic - no open ports
84.0.39.nn (1 hit) - HU - dynamic - no open ports
119.242.193.nnn (1 hit) - JP - dynamic - stealth
94.112.29.nnn (1 hit) - CZ - dynamic - no open ports
89.69.81.nn (1 hit) - PL - dynamic - stealth
87.58.114.nnn (1 hit) - DK - dynamic - stealth
217.132.64.nnn (1 hit) - IL - dynamic - no open ports
95.77.126.nnn (8 hits) - RO - dynamic - no open ports
176.40.150.nn (1 hit) - TR - dynamic - no open ports
79.163.157.nnn (1 hit) - PL - dynamic - no open ports
77.78.39.nn (1 hit) - BG - dynamic - no open ports
83.85.150.nnn (1 hit) - NL - dynamic - no open ports
89.157.214.nnn (2 hits) - FR - dynamic - stealth
83.60.82.nnn (1 hit) - ES - dynamic - open ports
"No open ports" could simply mean the computer was turned off during my port-checking access attempt.
Stealth ports - nominally closed (probably behind a firewall) but one or two may be open or closed-but-visible for specific purposes (eg FTP).
Servers are expected to have open ports, dynamic lines should have no open ports (with possibly a few "static" exceptions used for "office" connection or if the machine has a virus).
Multiple hits were usually but not always consecutive.
With one exception (the last IP) the combination of IP type and port mode is expected. With no obvious open ports on broadband lines it seems as if synapse is part of a normal tool, albeit an odd one. However...
1. All accesses were to pages that included querystrings (eg www.example.com/page.asp?pid=product). I have so far not noticed any access by synapse to a simple page URL.
2. I do not know what would happen if the site contained only https URLs. I do not have that kind of site.
3. SERVER_PROTOCOL: HTTP/1.0 is always the case, so not a proper browser, which would be HTTP/1.1.
4. All logged accesses included the headers...
HTTP_ACCEPT_CHARSET: iso-8859-1, utf-8, utf-16, *;q=0.1
5. The ACCEPT inclusion of xml suggests an earlier supposition may be correct or, possibly, an inclusive gathering mode to grab anything. One new hypothesis is: this is an RSS feed aggregator or scraper, based on the inclusion of xml; if so, why are only querystring pages accessed (point 1)?
6. Although the identity specifier is new to me it is valid (indicates "do not encode").
7. Only 15 accesses included HTTP_COOKIE (my server sets temporary cookies only). Of these cookies, some were repeated but on different IPs, which indicates a shared user agent environment (eg a botnet or an advertising tracer). A typical cookie seems to be:
HTTP_COOKIE: ASPSESSIONIDAAAQACQB=BGNCCFADJJGMCHDCMKKGEEJ; ASP.NET_SessionId=-1%27; NID=; SID=; CID=;
(remember this is an IIS/ASP server).
If anyone can throw light on this, please do!
| 8:59 am on Mar 15, 2013 (gmt 0)|
In the last 14 weeks, since setting an alert to this UA, I haven't seen it at the sites I manage. Not sure if I've ever seen it actually. Seems they have you on their list. Sorry I can't offer any helpful info.
| 10:46 am on Mar 15, 2013 (gmt 0)|
|HTTP/1.0 is always the case, so not a proper browser, which would be HTTP/1.1 |
I hope you are not representing that as 100% beyond question == all the time. 95-99% depending on circumstances. Quick detour to raw logs with search for sequence HTTP/1.0" 200 turned up back-to-back humans from Spain and South Africa, of all places. (Surprised me too. I associate 1.0 with satellite internet in the extreme north of Canada.) Proportions are higher if I search in certain pre-screened areas.
Are they applying real parameters or making things up?
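The raw-log search described above can be sketched in Python like this. It assumes the Apache "combined" log format; the regular expression and the name `http10_successes` are my own and will need adjusting for other log layouts.

```python
import re

# One entry in Apache "combined" format (assumed layout).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def http10_successes(lines):
    """Yield parsed entries for HTTP/1.0 requests that were answered with 200."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["protocol"] == "HTTP/1.0" and match["status"] == "200":
            yield match.groupdict()
```

Grouping the results by the `agent` field is a quick way to see how many of the 1.0 requests come with ordinary browser UAs.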
| 8:44 pm on Mar 15, 2013 (gmt 0)|
As far as I am aware the only "browsers" that run HTTP/1.0 are things like Lynx, which are not GUI browsers and hence very seldom used except for site inspection and scraping. Are you certain these are humans? Or could they be robots dressed as humans - ie androids. :)
If you track back on the IP for those browsers do they show open ports or are they completely closed? Are there any header anomalies? Are the browsers up-to-date or pre-19th century?
I'm not saying you are wrong, just that I haven't seen such things so question their existence. I'm quite prepared to be proven wrong. :)
A quick detour to find the differences between 1.0 and 1.1...
From a 1996 Apache description on apacheweek.com:
"HTTP/1.1 contains a lot of new facilities, the main ones are: hostname identification, content negotiation, persistent connections, chunked transfers, byte ranges and support for proxies and caches."
In the period since posting the above analysis I notice some variations.
My note that all accesses were to pages with querystrings must be modified: some hits are now occurring to non-querystring URLs - not many but a few, pages such as publicity.asp, join.asp and index.asp (I could accept the first two as hack-points but the home page is a bit strange).
I'm seeing a few querystring instances of the form i=-1%27 where i may be other letters or even "words". As far as I know I do not have any querystring arguments of -1%27 so I am a bit at a loss on that one (%27 is a single-quote; all my querystrings are ASCII characters only). Sometimes that is the whole qs and sometimes it appears in the middle of other parameters. This is true of some of the early ones reported above and of the later ones since then.
It is possible that I'm seeing querystring accesses only because there are querystrings, although many pages of each site do not use them.
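To pick such requests out of the logs, a small Python sketch follows. It decodes the querystring and flags any parameter whose value contains a single quote, which is what -1%27 decodes to. A stray quote in a parameter is often associated with SQL-injection probing, but that reading is my assumption and nothing in the logs confirms it; the function name `suspicious_params` is invented for the example.

```python
from urllib.parse import parse_qsl


def suspicious_params(query_string):
    """Return (name, decoded_value) pairs whose decoded value contains a single quote."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return [(name, value) for name, value in pairs if "'" in value]
```

For a querystring such as pid=medname&i=-1%27 only the i parameter is returned, since pid decodes to a plain word.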
| 11:59 pm on Mar 15, 2013 (gmt 0)|
|Are the browsers up-to-date or pre-19th century? |
I really want to see a 1798-vintage browser :) Funny you should mention Lynx, because I did find a few specimens of myself checking things. And of course there were plenty of robots, even when I constrained the search to "200" meaning robots I don't already know.
Unfortunately I've only recently added header checking to all pages, so I can't generally go by those. But the current month coughs up three unrelated occurrences of page request coming in from search-- with appropriate search string from google dot country-matching-IP --followed by all subsidiary files including piwik and favicon:
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22
Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0
Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0
There's also one iPad with refererless request for a particular image, repeated 20+ times at intervals of 0-3 seconds. (If you didn't know, you would assume this is a robot. But I rewrite this category of requests to an administrative gif; they're from the Google app. So the re-loads are what you would get if a human really, really wanted to see the picture.)
:: detour to check headers ::
Oh, now that's interesting. All three of the most recent ones came in with X-Forwarded-For headers-- one belonging to the same country and the other two from Private Registration ranges-- and a "Via" line in the form
1\.1 \S+ \(squid/\d\.\d\.STABLE\d+\)
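In Python that pattern can be applied to a Via header roughly as below. This is only a sketch; the helper name `via_is_squid` is made up, and the proxy hostname in the usage note is hypothetical.

```python
import re

# The pattern from the post, compiled unchanged.
SQUID_VIA = re.compile(r"1\.1 \S+ \(squid/\d\.\d\.STABLE\d+\)")


def via_is_squid(via_header):
    """Return True when the Via header value matches the squid STABLE pattern."""
    return SQUID_VIA.search(via_header) is not None
```

A value such as "1.1 proxy.example.net:3128 (squid/2.7.STABLE9)" matches, while a Via line naming some other proxy product does not.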
My main 1.0 association is with URLs in neighborhoods like 64.26. or 66.185., though further investigation shows that they really vary all over the map. For a given definition of "map", sure, but not absolutely restricted to the wilds of northern Manitoba as I'd thought.
|If you track back on the IP for those browsers do they show open ports or are they completely closed? |
The foregoing was so much Hungarian to me. Sorry ;)
| 9:39 pm on Mar 16, 2013 (gmt 0)|
The three UAs are valid but it depends on the context - they may still cover bot activity. iPads are a nuisance - I block a few due to faulty headers: one day I'm going to have to work through the rejects and see what's wrong. :(
Squid is a common linux proxy. I usually find that either the Forwarded-For is the same as the IP (valid); or it is a local IP (eg 10.n.n.n - often but not always valid); or it is a server farm trying to get past a block (NOT valid); or it's something like G or Y trying to forward for a so-called customer (sometimes valid but they often get blocked due to a previous customer not being valid).
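The first two cases can be told apart mechanically; a minimal Python sketch is below. The function name and the category labels are my own, and anything that lands in "other" still needs the manual judgement described above (server farm versus a legitimate forwarder).

```python
import ipaddress


def classify_forwarded_for(remote_addr, forwarded_for):
    """Classify the first X-Forwarded-For entry relative to the connecting IP."""
    first = forwarded_for.split(",")[0].strip()
    try:
        forwarded_ip = ipaddress.ip_address(first)
    except ValueError:
        # Values such as "unknown" or junk text are not IP addresses.
        return "malformed"
    if forwarded_ip == ipaddress.ip_address(remote_addr):
        return "same-as-remote"
    if forwarded_ip.is_private:
        return "private"
    return "other"
```

Be aware that Python's `is_private` covers more than the 10.n.n.n case: it also includes loopback and other reserved ranges.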
22.214.171.124/18 is Hostway - see comment above re: server farm proxies. Ditto I have two /19s blocked as servers in the 66.185/16 range. I would guess a lot of those you are seeing are from server ranges.
Do you run a Linux machine? If so try Umit. It allows you to send out requests for port information on remote computers - slightly unethical except when THEY hit you first! :) There used to be a Windows-resident Sam Spade but last time I used it, many years ago, it had begun failing IP resolution: too old and not updated. There are probably other windows tools.
Thought: if you run windows desktop but a linux web server you may be able to install Umit or run a similar service via a PHP library.
| 10:26 pm on Mar 16, 2013 (gmt 0)|
But 126.96.36.199/18 isn't ;) And, oops, 66.185 was a typo for 66.165. To be exact 188.8.131.52/19 They're combined ISP, servers and colo. Often a danger sign, but in my case an automatic-pass range. (Keewaytinook Okimakanak, Chiefs Council, somewhere at the other end of Ontario. Didn't have to look it up, either.)
I hardly ever check headers but it was good to discover that Google Preview --and other auxiliaries like Translate and Wireless Transcoder-- includes the X-Forwarded-For header. So I can see who's behind the 74.125 or whatever it is.
| 9:01 pm on Mar 17, 2013 (gmt 0)|
My point was: some of the ranges ARE servers and WOULD present fake credentials including the X-Forwarded-For. I break the ranges down according to content, not whether some sub-ranges are dynamic. There are a lot of sub-ranges that are killers, whether dynamic or server.
Google Preview - and a lot more of google - is an automatic block here. I don't believe it makes a difference and the primary IP is in any case only blocked if it fails on other criteria.
| 12:32 am on Mar 20, 2013 (gmt 0)|
184.108.40.206 - - [19/Mar/2013:04:10:59 -0700] "GET /ebooks/paston/paston2.html HTTP/1.0" 200 945169 "-" "Mozilla/4.0 (compatible; Synapse)"
With that IP, "Ukrainian robot" was a safe guess and was in fact correct.
A just-to-be-sure followup search turns up the occasional "Synapse" in logs. Some are red herrings in the form "SynapseWorkstation.3.2.1" as part of the MSIE bigger-is-better UA string, but there was a sprinkling of basic Synapse from ranges that looked Ukrainian though it wasn't worth checking them out. Especially if they came asking for things like "wp-admin" that are automatic 403s anyway.
SynapseWorkstation, yuk, that means I have to express the UA block as Synapse\) to avoid false positives.
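A quick way to check that the pattern behaves as intended is sketched below in Python. The helper name is made up, and the long MSIE string in the usage note is a hypothetical example built around the SynapseWorkstation token, not a line copied from my logs.

```python
import re

# "Synapse" immediately followed by the closing parenthesis of the UA string.
SYNAPSE_LIBRARY = re.compile(r"Synapse\)")


def is_synapse_library(user_agent):
    """Return True for the bare library UA, False for SynapseWorkstation variants."""
    return SYNAPSE_LIBRARY.search(user_agent) is not None
```

"Mozilla/4.0 (compatible; Synapse)" matches, whereas an MSIE string containing "SynapseWorkstation.3.2.1" does not, because "Synapse" is followed there by "Workstation" rather than by the closing parenthesis.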
| 1:20 am on Mar 20, 2013 (gmt 0)|
Personally, I wouldn't allow SynapseWorkstation or its variations either, no matter where they come from. Installation only seems needed to access certain libraries on school or company servers. Odds seem pretty slim that any significant number of normal users would have this agent installed.
|The Synapse Workstation Client is specific to Windows Internet Explorer ONLY. It will not function with any other browser. |
Windows 7 (64 bit) is NOT currently supported and the Synapse workstation code will NOT install on a 64 bit machine.
MAC OS systems are NOT supported.
| 9:31 pm on Mar 20, 2013 (gmt 0)|
Lucy - how were you able to determine the access was a Ukrainian robot? Apart from the IP being in a long-blocked MHost server range, that is.
I get synapse from a wide range of countries and the accesses look very much like a low-aggression botnet, which means the "owners" could be anywhere in the world.
| 4:50 pm on Apr 12, 2013 (gmt 0)|
Data for your consideration:
During this past week, in a time window of 48 seconds, on a single IIS/ASP web site normally only very sparsely trafficked:
1. 55 GETs of which
- 6 were for a never existent file/directory
- 1 was for a landing page
- 9 were to a page link followed from the landing page
- 11 were Transfer'red to an error screening page
- 29 were Redirect'ed to an error reporting page
2. Consistent use of HTTP/1.0
3. 2 Different User-Agent strings used
4. 11 Different IP addresses (rDNS looked-up ISO country code and Organization names follow):
- IR - Iran Cell Service and Communication Company
- SA - Etihad Etisalat Company (Mobily)
- RU - CJSC Comstar-Regions; Dynamic Service
- PL - SMSNET Sosnowiec
- TR - Turk Telekom
- HR - T-Mobile Croatia
- LU - root SA
- CL - Telmex Chile S.A HFC
- VE - CANTV Servicios
- SV - CTE S.A. de C.V.
- CO - Colombia Telecomunicaciones S.A. Esp
5. In 2 similar sets of log entries from October 2012, the User-Agents and GET requests were nearly identical, excepting the source IP address. Previously, all GETs were issued using a single IP address (rDNS looked-up as (country code, organization): LU - root SA)
The rotations/combinations of the above observations in the sequence of individual GETs strongly imply a single actor and a deliberate effort to a) masquerade, and b) not render the downloaded HTML as in a browser.
Your mileage may vary.