Even though the user agent strings look like normal web browsers of all flavors, it appears that they have an abnormal propensity to screw up web addresses assembled by JavaScript. Almost all of the 404 errors caused by their users are the result of JavaScript URLs not being put together and requested properly.
It absolutely amazes me that the same guys who put satellites into space can't get their systems to parse JavaScripts properly. Maybe the problem is that JavaScript isn't rocket science.
The problems all have a common cause: satellite providers proxy not only the client connection, but the client itself. This is an effort to compensate for the 'long fat pipe' -- or more accurately, the very, very long and only somewhat fat pipe.
Since geosynchronous satellites orbit high (22,300 miles) above the earth, signals to/from a satellite have a significant travel time. Let's say you're a satellite user, and you just ping an IP address that is actually located at the ISP's satellite-receiving facility. Your ping request has to travel up to the satellite, then down to the earth station, and then the ping response has to travel back up to the satellite, and then back down to your satellite antenna. The total "air-time" of this signal is approximately 480 milliseconds, or almost one-half second (derived from the total 89,200 miles divided by the speed of light, 186,000 miles per second). For comparison, I can ping almost any domestic server in less than one-twelfth of that time...
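The arithmetic behind that 480 ms figure can be sketched in a few lines. This is only an illustration using the rounded numbers from the post; the function name `satellitePingMs` is made up here, and real-world pings add processing and queuing delay on top of the pure signal travel time.

```javascript
// Rounded figures from the post: orbit altitude and the speed of light
// (about 186,000 miles per second).
const ALTITUDE_MILES = 22300;
const LIGHT_MILES_PER_SECOND = 186000;

// Hypothetical helper: a ping crosses the ground-to-satellite gap four times
// (up and down for the request, up and down for the response).
function satellitePingMs(crossings = 4) {
  const totalMiles = ALTITUDE_MILES * crossings; // 89,200 miles for a ping
  return (totalMiles / LIGHT_MILES_PER_SECOND) * 1000;
}

console.log(Math.round(satellitePingMs())); // roughly 480
```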
To compensate for this, and to limit the number of client requests that travel through the satellite itself, satellite ISPs actually proxy the client's functions: when the client requests a Web page, the ISP captures and stores it in a server, analyzes it, and looks for all of the objects that it includes, scanning the HTML for <img> and <link rel="xyz"> tags, etc. Then they issue requests 'to the Web' for all of those objects, collect the responses, bundle up the page and all of its included objects, and send the whole mess all at once back to the client on the other end of the satellite link.
On the client end, they have a little proxy host running. This host accepts the 'bundle' of page-plus-objects from the ISP's satellite proxy-client, and passes the originally-requested HTML page on to the 'real' client -- the user's browser. Then as soon as that browser parses the HTML and starts requesting images, stylesheets, and all the other included objects, the local proxy host simply hands back all the objects that the proxy client has already prefetched.
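As a rough illustration of the scanning step described above, here is a minimal sketch of how a prefetching proxy might pull object URLs out of static HTML. This is an assumption about the general approach, not the ISP's actual code; `findPrefetchTargets` and the sample page are invented for the example, and a real proxy would presumably use a proper HTML parser rather than a regular expression.

```javascript
// Hypothetical helper: collect the object URLs that can be found by scanning
// static HTML for <img src="..."> and <link href="..."> attributes.
function findPrefetchTargets(html, pageUrl) {
  const targets = [];
  const pattern =
    /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']|<link\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const raw = match[1] || match[2];
    // Resolve relative references against the page's own URL.
    targets.push(new URL(raw, pageUrl).href);
  }
  return targets;
}

const samplePage =
  '<link rel="stylesheet" href="/site.css"><img src="logo.gif">';
console.log(findPrefetchTargets(samplePage, 'http://example.com/foobar/page.html'));
```

Anything the page builds at run time (for example, markup emitted by a script) never appears in this scan, which is where the trouble starts.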
A bit of thinking about this will reveal some of the reasons that this kind of system has trouble with immediately following redirects and with properly handling client-side scripting -- I'd imagine that sites using JS-heavy AJAX with lots of 'events' are probably really difficult to handle in such a system...
Anyway, I had satellite internet service for several years before any alternatives became available in my area, and in debugging various problems (both for myself and for other Webmasters) I learned a bit about how these systems work (and why), so I hope that's useful.
I think it probably *is* easier to design and build the satellite and to launch it and control it, than to try to reliably emulate every possible kind of HTTP and client-side scripting function using a reasonably-small client-side and ISP-side software package...
Jim
function Murl(c){
var a='http://example.com';
var b='/foobar/';
return '<img src="'+a+b+c+'">';
}
document.write(Murl('some.gif'));
Like I said, even spammers have figured out how to do rudimentary parsing of JavaScript to rebuild proper URLs. It would certainly improve their user experience.
Heck I bet they could license something that would do this off of Google or someone.
If they didn't try to parse the JavaScript files at all, or if they parsed them correctly, the 404 issue would go away. If they are going to try to read URLs from JS files, then they need to pseudo-execute the JS file enough to actually know which URLs need to be pulled and which shouldn't be pulled unless executed by the end user. As it is, they are wasting a lot of their server resources and ours on stupid stuff.
Since nearly all images are served through file.php, anyone using Hughes will cause numerous requests for files that do not exist. Instead of requesting file.php?id=#*$!X, I get requests for image.jpg.
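To make the difference concrete, here is a hedged sketch contrasting two guesses at how a proxy could read the Murl script from earlier in the thread. Both functions are hypothetical: `scrapeImageNames` is an assumption about what a naive scanner might do, and `runAndCollectImages` is one crude way to "pseudo-execute" a script against a stub `document`. Running untrusted script this way is not safe outside a demo.

```javascript
// The script from the thread, held as a string the way a proxy would see it.
const scriptSource = `
function Murl(c){
var a='http://example.com';
var b='/foobar/';
return '<img src="'+a+b+c+'">';
}
document.write(Murl('some.gif'));
`;

// Guess 1 (assumed behaviour): grab anything that looks like an image file name.
function scrapeImageNames(source) {
  return source.match(/[\w.-]+\.(?:gif|jpe?g|png)/gi) || [];
}

// Guess 2: run the script against a stub document and read what it writes.
function runAndCollectImages(source) {
  let written = '';
  const stubDocument = { write: (markup) => { written += markup; } };
  new Function('document', source)(stubDocument);
  const urls = [];
  const pattern = /<img\b[^>]*?\bsrc="([^"]+)"/gi;
  let match;
  while ((match = pattern.exec(written)) !== null) urls.push(match[1]);
  return urls;
}

console.log(scrapeImageNames(scriptSource));    // only the bare name 'some.gif'
console.log(runAndCollectImages(scriptSource)); // the full http://example.com/foobar/some.gif
```

A proxy that requests the bare name relative to the current page would hit a path that does not exist, which matches the 404 pattern described in this thread.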
What needs to be understood is that the latency on a satellite connection is already tremendous (e.g. >400ms) due to the distances signals get sent. What Hughes is attempting to do is improve response times by prefetching objects like images, javascripts, stylesheets, etc. while the parent HTML file is being transmitted to the user. By doing this they can eliminate the additional latency of requesting and transmitting the supporting objects between the actual web server and Hughes' network. The end result is faster loading pages and a better user experience.
One way we can really help our users on ISPs like Hughes, or in remote corners of the world, is to make sure we are making proper use of the Expires and Cache-Control headers. This will allow the users' browsers and their ISPs' caching servers to properly cache and prefetch supporting objects.
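A minimal sketch of what those headers could look like for a static object, assuming a one-day lifetime is acceptable for the content. The helper name `cacheHeaders` and the lifetime are illustrative choices, not a recommendation from this thread.

```javascript
// Hypothetical helper: build caching headers for a static object such as an
// image or stylesheet. maxAgeSeconds is how long caches may reuse the object.
function cacheHeaders(maxAgeSeconds, now = Date.now()) {
  return {
    // "public" lets shared caches (such as an ISP proxy) store the response.
    'Cache-Control': `public, max-age=${maxAgeSeconds}`,
    // Expires carries the same lifetime as an absolute HTTP date.
    'Expires': new Date(now + maxAgeSeconds * 1000).toUTCString(),
  };
}

console.log(cacheHeaders(86400)); // one day
```

In a Node.js server, for instance, the result could be passed straight to `response.writeHead(200, cacheHeaders(86400))`; on Apache or another server the same two headers would be set through its own configuration.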
The faster our pages render in the end user's browser, the better their experience will be on our websites so it is really in our best interest to take advantage of the efforts ISPs like Hughes put into prefetching and caching objects.