Forum Moderators: Robert Charlton & goodroi
System Requirements
Operating System: Win XP or Win 2000 SP3+
Browser: IE 5.5+ or Firefox 1.0+
Availability: For users in North America and Europe (during beta testing phase)
Press Release:
Google Web Accelerator significantly reduces the time that it takes broadband users to download and view web pages. The Google Web Accelerator appears as a small speedometer in the browser chrome with a cumulative "Time saved" indicator.
Here's how it works. A user downloads and installs the client and begins browsing the web as she normally would. In the background, the Google Web Accelerator employs a number of techniques to speed up the delivery of content to users.
Looks like some of the Mozilla hires are paying dividends.
This line (below) I personally use exactly "as is":
-------------------------------
# google proxy: 72.14.192.0 - 72.14.207.255
RewriteCond %{REMOTE_ADDR} ^72\.14\.(19[2-9]|20[0-7]) [OR]
-------------------------------
However, it seems that the Web Accelerator has so far only been seen from the lower IP numbers, i.e. 72.14.192.---. So, if you want to block less than the full range, you can just use this line instead:
-------------------------------
RewriteCond %{REMOTE_ADDR} ^72\.14\.192 [OR]
-------------------------------
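Both RewriteCond lines above end in [OR], so they are meant to sit inside a longer list of conditions and do nothing on their own. As a rough sketch of a complete, standalone rule set (assuming mod_rewrite is enabled and that a plain 403 is the response you want), it could look something like the block below. The third-octet pattern is written as 19[2-9]|20[0-7] with a solid pipe so that it covers 192 through 207, matching the range in the comment.

```apache
RewriteEngine On
# google proxy: 72.14.192.0 - 72.14.207.255
# Trailing \. anchors the third octet so only 192-207 can match
RewriteCond %{REMOTE_ADDR} ^72\.14\.(19[2-9]|20[0-7])\.
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule .* - [F]
```

If this is the last (or only) condition before the RewriteRule, it must not carry [OR].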
- We do not change the user agent.
- We do add an "X-moz: prefetch" header to all prefetch requests. That way, webmasters can choose to just ignore prefetch requests if they so choose.
- We only prefetch URLs which according to the HTTP 1.1 spec should not have any side-effects. These are basically GET requests without a "?".
- We also include an X-Forwarded-For header which provides the original IP address in case webmasters want to perform granular geotargeting.
I think using X-Forwarded-For is the usual way that proxies like squid pass on a user's original IP address? So the accelerator is like most caching proxies in that sense. I'll be heading to bed in a little bit, but hopefully I'll have more time tomorrow to post. But I'm happy to relay questions if people want to find out more about how it works.
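For anyone who wants to see these headers for themselves, one option is a dedicated log that records them. This is only a sketch: the format nickname wa_debug and the log file name are made up for illustration, and LogFormat/CustomLog are only allowed in the main server or virtual host config, not in .htaccess, so it will not help on a shared host where you cannot edit that file.

```apache
# Main server or <VirtualHost> config only, not .htaccess
# Logs: client IP, X-Forwarded-For header, X-moz header, time, request line, status, bytes
LogFormat "%h \"%{X-Forwarded-For}i\" \"%{X-moz}i\" %t \"%r\" %>s %b" wa_debug
# wa_debug.log is a hypothetical file name
CustomLog logs/wa_debug.log wa_debug
```

A request without one of these headers shows "-" in that column, which makes it easy to tell prefetches from ordinary proxied clicks, assuming the header really is being sent.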
We only prefetch URLs which according to the HTTP 1.1 spec should not have any side-effects. These are basically GET requests without a "?".
I have to say this sounds a bit erroneous. The querystring is not the only way to pass data. Cookies can be used to pass data as well.
If Google are hell-bent on introducing this without allowing webmasters any control over it, then, in order to prevent side-effects, the application should not fetch URLs when a cookie has been set.
I really don't understand why Google are so adamant that webmasters should have no control over the app. The prefetching mechanism is, in effect, a semi-automated web crawler and therefore it should obey robots.txt
Well, at least we know how to block it now. We have to append a "?" to each of the URLs we don't want to be prefetched.
GoogleGuy, do you know if doing this will have adverse side-effects with googlebot?
- We do add an "X-moz: prefetch" header to all prefetch requests. That way, webmasters can choose to just ignore prefetch requests if they so choose.
What's the best server response to a request containing the X-moz: prefetch header if you choose to "just ignore" it?
I just checked the mod_rewrite docs and couldn't see a rule flag that means "drop silently"; so you have to return _something_. What's the best _something_ to return in order to avoid additional side effects?
The particular side-effect that I'm worrying about is the WA assuming that my attempt to ignore the request is the actual response and returning that to the end user....
Justin
Added: Of course this may *not* help for outbound links.
Causes
1. My DSL is too slow for this to work and it actually has a negative impact. When I signed up for DSL it was very fast. I was near the source. Then I moved to a location further away from the source but still had time on the contract so I couldn't switch to cable, which is much faster at my location. I am probably about 50% faster than a 56k.
2. Since the main links on the page are to other image laden pages, the browser is fetching other images while trying to download and render the ones on the current page. However, I am noticing a general 'slowerness' on other pages I visit, but there are no easy measures of speed like the buttons in my galleries.
That's strange, Powdork. It goes slower when you try it out?
That's hardly strange.
If the web server is closer than Google's proxy in terms of network topology and if the output of the server uses HTTP compression then Google's response will likely be slower than the original web server's response would have been.
Google's proxy isn't like an ISP's proxy that is guaranteed to be "closer". If it's farther away and it can't compress the data considerably then Google's response will be slower.
I know these Google programmers think they're smart, but if they think they can break the laws of physics then they really need to cut down on the amount of caffeine they're consuming! ;-)
You should be able to return anything you like
... but then how does the WA know that that is not the expected response and serve it up in response to the "real" click a few moments later?
I must be missing something here. Nothing I send out in the response, that can be interpreted by a computer program, can imply that the page is any different because of the prefetch header included in the request.
It would need a response header that implies "Prefetch-Denied", otherwise you have to figure out how to drop the connection and break the protocol...
403 Access Denied
....and then that gets returned to the user in response to their subsequent genuine click!
Surely you're just being hopeful that the developer of an intermediate proxy that is inserting prefetch headers (the WA in this case) has asserted in their design that if they receive a 403 Access Denied then they should re-request the URL later, when the genuine click comes through.
I am not happy making that assumption; because I did not design the WA, and I don't know the person who did.
Regardless, the only thing that could be safely relied upon not to break functionality in response to a request containing a prefetch header is something that specifically references that prefetch request, such as a new response code or prefetch-denied header.
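There is no such prefetch-denied response in HTTP as far as I know, but one partial workaround would be to mark whatever you return for a prefetch request as not cacheable, in the hope that the proxy will not reuse it for the real click. The sketch below assumes mod_setenvif and mod_headers are available and that .htaccess overrides allow them; IS_PREFETCH is just a made-up variable name. Whether the WA actually honours Cache-Control here is an assumption, not something confirmed in this thread.

```apache
# Flag requests that carry "X-moz: prefetch" (IS_PREFETCH is a hypothetical name)
SetEnvIf X-moz "prefetch" IS_PREFETCH
<IfModule mod_headers.c>
    # "always" also applies the header to error responses such as a 403
    Header always set Cache-Control "no-store" env=IS_PREFETCH
</IfModule>
```

This does not deny anything by itself; it only labels the response, so it would be used alongside whatever rule refuses the prefetch.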
What you see in a browser:
www.yoursite.com
What your server sees in *every* request that is made:
GET [yoursite.com...] HTTP/1.1 (or some other method EG POST, HEAD, etc. and version of HTTP/1.0)
So, since G is sending the 'prefetch' header only with requests it is prefetching and *not* with requests for links that are being clicked by the user, you can effectively block the prefetch request at the server level with this or something similar:
RewriteEngine On
# Match only requests that carry the "X-moz: prefetch" header
RewriteCond %{HTTP:X-moz} ^prefetch [NC]
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule .* - [F,L]
This will *only* impact a request that contains X-moz: prefetch. It will *not* impact a simple GET or POST request that does not contain the prefetch request in the header. EG A user clicking on a link.
Hope this helps.
Justin
Added: >> and then that gets returned to the user in response to their subsequent genuine click!---This is incorrect.
Edited for clarity.
I have checked and rechecked my Analog stats for yesterday, and I must admit that I do not see any prefetch requests from the Google Accelerator IP anywhere.
Perhaps Google is going to implement that option later (since the WA is only in BETA mode). Currently, only the IP tells me when the Google WA is at work.
It is interesting to see that WA also prefetches files within .htpasswd protected directories. WA prefetches anything your browser (IE or FF) has access to.
Another interesting pattern: WA downloads all files requested by a visitor, which means double bandwidth usage. It is difficult to see whether the downloaded prefetched files are kept within the WA buffer cache on the visitors PC or kept on a Google server - or both.
Anyways, I'm done fiddling with WA and have decided to delete it from my PC.
What I find confusing is the behaviour after I blocked WA in the server .htaccess file, while still using the WA for browsing. Some of the pages turn up anyway, even though I never clicked them using the WA. Whether the page comes from the ordinary browser cache or the Google server prefetch cache is impossible for me to decide.
Some clicks did return a 403 error page, but the URI remained intact in the address bar. When refreshed the page turned up. The only .htaccess code that definitely shut down the WA prefetches was this:
<Limit GET POST>
order allow,deny
allow from all
deny from 72.14.192.6
</Limit>
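The block above only denies the single address 72.14.192.6. If you would rather cover the whole proxy range quoted earlier in the thread (72.14.192.0 - 72.14.207.255), the same allow/deny syntax accepts CIDR notation. A sketch, assuming that range is still accurate:

```apache
<Limit GET POST>
order allow,deny
allow from all
# 72.14.192.0/20 covers 72.14.192.0 through 72.14.207.255
deny from 72.14.192.0/20
</Limit>
```

Keep in mind this blocks real clicks that arrive through the proxy as well as prefetches, since both come from the same addresses.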
This will *only* impact a request that contains X-moz: prefetch. It will *not* impact a simple GET or POST request that does not contain the prefetch request in the header. EG A user clicking on a link.
This statement doesn't seem to appreciate what an accelerating proxy server actually does. The whole point of the prefetch is that the users "real" click never makes it to your server.
Anyway I agree that we shouldn't bog this thread down with the techie details so I've tried to clarify my concerns in a separate thread...
[webmasterworld.com...]
My understanding is: The prefetching is caching a 'live' page from your server on-the-fly (EG loading the page into a cache at G) - If you deny the request for the prefetch, the page will not be cached. If the page is not cached all link(s) will continue to have their normal function (IOW the links will not be broken, nor will the pages cease to function, because you denied the precaching... Just like any other link on the page that is not cached.) So, by denying the proxy request for the pre-caching, you are effectively turning off the pre-caching engine, and forcing the requests to be processed as normal by your server.
Justin
BTW I believe since the 'prefetch' header sent is that of a proxy function it will not show in your logs as a normal GET/POST header would... It should only show that the page was loaded by the G proxy IP address.
Anyway, hope this helps someone...
With WA, and the abiding faith that any WA user would necessarily have in Google as God, the mess-ups from Google will get blamed on us webmasters.
True, and Google cares not.
from mrMister...
I really don't understand why Google are so adamant that webmasters should have no control over the app.
I remember a few years ago, when content building webmasters would sing in unison about their partnership with Google. It has to be getting harder and harder to pretend.
(and that's not directed at anyone personally... I just jumped off your quote.)
I asked before, but you missed it...
I'm on a shared server so will it stop prefetching of my account. (I don't control the server htaccess).
Bill asked:
"Why block WA?
Redirect it to a page that describes why it's bad technology burning bandwidth needlessly and should be uninstalled - educate those masses in their hypnotic trance that drool and chant "Goooooogle" all day on the net."
Okay Bill, please post the relevant compact htaccess code to redirect safely, together with your sample 'Google WA tutorial for the duped', and we will use it, until Google give us back our liberty.
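For what it's worth, a redirect along those lines might be sketched as below. This is not Bill's code; it assumes mod_rewrite, keys on the lower proxy block mentioned earlier in the thread, and /wa-notice.html is a hypothetical page you would have to write yourself.

```apache
RewriteEngine On
# Requests arriving from the lower Google proxy block (72.14.192.x)
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
# Do not redirect the notice page itself, or the rule would loop
RewriteCond %{REQUEST_URI} !^/wa-notice\.html$
# /wa-notice.html is a made-up name for the explanatory page
RewriteRule .* /wa-notice.html [R=302,L]
```

A temporary (302) redirect is used so nothing treats the notice page as the permanent home of your content.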
<Okay Bill, please post the relevant compact htaccess code to redirect safely, together with your sample 'Google WA tutorial for the duped', and we will use it, until Google give us back our liberty.>
Liberty isn't something somebody gives you, it's something you fight for and earn! That's also true in Google's case. It's publishers who should fight back for their liberty.
History repeats itself. The present situation with Google's WA reminds me of a situation in 2001 where the great legend, Jim Wilson, gathered hundreds of decent webmasters and led the fight against scumware.
[scumware.com...]
Who knows who will be the next Jim to lead us all as publishers in our fight for privacy and control of our own content.