Forum Moderators: Robert Charlton & goodroi
System Requirements
Operating System: Win XP or Win 2000 SP3+
Browser: IE 5.5+ or Firefox 1.0+
Availability: For users in North America and Europe (during beta testing phase)
Press Release:
Google Web
Accelerator significantly reduces the time that it takes broadband users to
download and view web pages. The Google Web Accelerator appears as a small
speedometer in the browser chrome with a cumulative "Time saved" indicator.

Here's how it works. A user downloads and installs the client and begins
browsing the web as she normally would. In the background, the Google Web
Accelerator employs a number of techniques to speed up the delivery of
content to users.
Looks like some of the Mozilla hires are paying dividends.
- We do add an "X-moz: prefetch" header to all prefetch requests. That way, webmasters can choose to just ignore prefetch requests if they so choose.
What's the best server response to a request containing the X-moz: prefetch header if you choose to "just ignore" it?
I just checked the mod_rewrite docs and couldn't see a rule flag that means "drop silently"; so you have to return _something_. What's the best _something_ to return in order to avoid additional side effects?
The particular side-effect I'm worrying about is the WA assuming that my attempt to ignore the request is the actual response, and returning that to the end user....
Justin
Added: Of course this may *not* help for outbound links.
Causes
1. My DSL is too slow for this to work and it actually has a negative impact. When I signed up for DSL it was very fast; I was near the source. Then I moved to a location further away from the source, but I still had time on the contract, so I couldn't switch to cable, which is much faster at my location. I am probably about 50% faster than a 56k.
2. Since the main links on the page are to other image-laden pages, the browser is fetching other images while trying to download and render the ones on the current page. However, I am noticing a general slowness on other pages I visit, but there are no easy measures of speed like the buttons in my galleries.
That's strange, Powdork. It goes slower when you try it out?
That's hardly strange.
If the web server is closer than Google's proxy in terms of network topology and if the output of the server uses HTTP compression then Google's response will likely be slower than the original web server's response would have been.
Google's proxy isn't like an ISP's proxy that is guaranteed to be "closer". If it's farther away and it can't compress the data considerably then Google's response will be slower.
I know these Google programmers think they're smart, but if they think they can break the laws of physics then they really need to cut down on the amount of caffeine they're consuming! ;-)
You should be able to return anything you like
... but then how does the WA know that that is not the expected response, rather than serving it up in response to the "real" click a few moments later?
I must be missing something here. Nothing I send out in the response can be interpreted by a computer program as meaning that the page is any different because of the prefetch header included in the request.
It would need a response header that implies "Prefetch-Denied", otherwise you have to figure out how to drop the connection and break the protocol...
403 Access Denied
....and then that gets returned to the user in response to their subsequent genuine click!
Surely you're just being hopeful that the developer of an intermediate proxy that is inserting prefetch headers (the WA in this case) is going to assert in their design that if they receive a 403 Access Denied then they should re-request the URL in the future when the genuine click-through comes through.
I am not happy making that assumption, because I did not design the WA, and I don't know the person who did.
Regardless, the only thing that could be safely relied upon not to break functionality in response to a request containing a prefetch header is something that specifically references that prefetch request, such as a new response code or prefetch-denied header.
What you see in a browser:
www.yoursite.com
What your server sees in *every* request that is made:
GET [yoursite.com...] HTTP/1.1 (or some other method, e.g. POST or HEAD, and possibly HTTP/1.0)
So, since G is sending the 'prefetch' header only with requests it is prefetching, and *not* with links that are being clicked by the user, you can effectively block the prefetch request at the server level with this or something similar:
RewriteEngine On
# Set an environment variable when the request carries an "X-moz: prefetch" header
SetEnvIf X-moz prefetch HAS_X_MOZ=prefetch
RewriteCond %{ENV:HAS_X_MOZ} prefetch
# Answer those prefetch requests with 403 Forbidden; nothing else is touched
RewriteRule .* - [F,L]
This will *only* impact a request that contains X-moz: prefetch. It will *not* impact a simple GET or POST request that does not contain the prefetch header, e.g. a user clicking on a link.
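For what it's worth, if your mod_rewrite can read the header directly you may not even need the environment variable; this variant is just a sketch I haven't tested against the accelerator itself:
RewriteEngine On
# Match only requests that carry the X-moz: prefetch header
RewriteCond %{HTTP:X-moz} prefetch [NC]
# Forbid (403) those requests and leave everything else alone
RewriteRule .* - [F,L]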
Hope this helps.
Justin
Added:
>> and then that gets returned to the user in response to their subsequent genuine click!
This is incorrect.
Edited for clarity.
I have checked and rechecked my analog stats for yesterday, and I must admit that I do not see any prefetch requests from the Google Accelerator IP anywhere.
Perhaps Google is going to implement that option later (since the WA is only in BETA mode). Currently, only the IP tells me when the Google WA is at work.
It is interesting to see that WA also prefetches files within .htpasswd protected directories. WA prefetches anything your browser (IE or FF) has access to.
Another interesting pattern: WA downloads all files requested by a visitor, which means double bandwidth usage. It is difficult to see whether the downloaded prefetched files are kept within the WA buffer cache on the visitor's PC or kept on a Google server - or both.
Anyways, I'm done fiddling with WA and have decided to delete it from my PC.
What I find confusing is the behaviour after I blocked WA in the server .htaccess file, while still using the WA for browsing. Some of the pages turn up anyway, even though I never clicked them using the WA. Whether the page comes from the ordinary browser cache or the Google server prefetch cache is impossible for me to decide.
Some clicks did return a 403 error page, but the URI remained intact in the address bar. When refreshed, the page turned up. The only .htaccess code that definitely shut down the WA prefetches was this:
<Limit GET POST>
order allow,deny
allow from all
deny from 72.14.192.6
</Limit>
This will *only* impact a request that contains X-moz: prefetch. It will *not* impact a simple GET or POST request that does not contain the prefetch header, e.g. a user clicking on a link.
This statement doesn't seem to appreciate what an accelerating proxy server actually does. The whole point of the prefetch is that the users "real" click never makes it to your server.
Anyway, I agree that we shouldn't bog this thread down with the techie details, so I've tried to clarify my concerns in a separate thread...
[webmasterworld.com...]
My understanding is: the prefetching caches a 'live' page from your server on the fly (i.e. it loads the page into a cache at G). If you deny the request for the prefetch, the page will not be cached. If the page is not cached, all links will continue to have their normal function (in other words, the links will not be broken, nor will the pages cease to function, because you denied the pre-caching... just like any other link on the page that is not cached). So, by denying the proxy request for the pre-caching, you are effectively turning off the pre-caching engine and forcing the requests to be processed as normal by your server.
Justin
BTW, I believe that since the 'prefetch' header is part of a proxy function, it will not show in your logs the way a normal GET/POST request would... It should only show that the page was loaded by the G proxy IP address.
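If you want to check whether the header is reaching your server at all, it should be possible to log it explicitly. A rough sketch, assuming you can edit the main Apache config (LogFormat/CustomLog don't work from .htaccess) and using a made-up log file name:
# Combined log format with the X-moz request header tacked on the end
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{X-moz}i\"" combined_xmoz
CustomLog logs/access_xmoz.log combined_xmoz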
Anyway, hope this helps someone...
With WA, and the abiding faith that any WA user would necessarily have in Google as God, the mess-ups from Google will get blamed on us webmasters.
True, and Google cares not.
from mrMister...
I really don't understand why Google are so adamant that webmasters should have no control over the app.
I remember a few years ago, when content building webmasters would sing in unison about their partnership with Google. It has to be getting harder and harder to pretend.
(and that's not directed at anyone personally... I just jumped off your quote.)
I asked before, but you missed it...
I'm on a shared server, so will this stop prefetching for my account? (I don't control the server's .htaccess.)
Bill asked:
"Why block WA?
Redirect it to a page that describes why it's bad technology burning bandwidth needlessly and should be uninstalled - educate those masses in their hypnotic trance that drool and chant "Goooooogle" all day on the net."
Okay Bill, please post the relevant compact htaccess code to redirect safely, together with your sample 'Google WA tutorial for the duped', and we will use it, until Google give us back our liberty.
<Okay Bill, please post the relevant compact htaccess code to redirect safely, together with your sample 'Google WA tutorial for the duped', and we will use it, until Google give us back our liberty.>
Liberty isn't something somebody gives you, it's something you fight for and earn! That is also true in Google's case. It's publishers who should fight back for their liberty.
History repeats itself. The present situation with Google's WA reminds me of a situation in 2001 when the great legend, Jim Wilson, gathered hundreds of decent webmasters and led the fight against scumware.
[scumware.com...]
Who knows who will be the next Jim to lead us all as publishers in our fight for privacy and control of our own content.
Requests currently made to my servers using the accelerator are causing some problems I don't want.
Users are losing their sessions, and other users get content they are not supposed to see, caused by users seeing pages prefetched for other users.
So, in short, I want to keep the accelerator from `accelerating` my pages.
What response should I send to the Google servers that will tell them not to accelerate that specific page and let the browser itself fetch the page directly from the server?
Any specific http-response-code that will allow this?
Any specific header that will prevent Google from caching the page?
Will it listen to an entry in the robots.txt file?
I welcome useful software that does what it is supposed to do, but when it fails to do that, it must be possible to do something about it...
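The only idea I have so far is to mark the pages as uncacheable with standard headers and hope the accelerator honours them; a sketch of what I mean (untested against the WA, and it assumes mod_headers is available):
# Tell any cache that these responses must not be stored or reused
Header set Cache-Control "private, no-store, no-cache, must-revalidate"
Header set Pragma "no-cache"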
Am I crazy, or do webmasters NOT want their sites accessed faster, thereby improving the user experience?
Not to sound callous, but did you read this thread?
It literally breaks sites that rely on cookies for shopping, login and other info.
It also skews all results of your visitor logs/tracking system, as you have no idea whether the visitor actually viewed a page or merely hovered over your navigation, which triggered the prefetch of that page.
What does this mean? If I go to a site, let's say dmoz.org, and use my mouse as a pointer as I'm reading through 100 listings in a category, those sites' pages will be prefetched, showing that I visited their sites when I really haven't. Same goes for an individual site. Say I go to your site and hover over the navigation consisting of 15 areas of your site; I download those pages via prefetch triggering, although I never left your homepage. So as a site owner you think I went through your whole site since I "downloaded" those pages, but I really only looked at your homepage and left.
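If nothing else, you could try to keep those prefetch hits out of your main log so the stats stay usable. Something like this might work, assuming the X-moz: prefetch header actually reaches your server and you can edit the main config (CustomLog doesn't work from .htaccess):
# Tag prefetch requests and log them to a separate file
SetEnvIf X-moz prefetch is_prefetch
CustomLog logs/prefetch_only.log combined env=is_prefetch
CustomLog logs/access.log combined env=!is_prefetch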
The webaccelerator does not speed up anything in "most" cases, it only appears that way since it is downloading other pages in the background. If it really worked, you could use it on a dial-up connection and notice a speed improvement. As it is it would probably kill your dial-up connection bringing it to a crawl.
It also provides a very fast proxy for site downloaders/rippers. I mentioned that and also saw it firsthand in a client's logs, where over 1200 pages were downloaded with 0 seconds between pageviews. How was this done? Simple: go download Firefox and the plugin for httrack. Jeez, you could set the UA string to that of Googlebot, and when the unsuspecting site owner did a whois on the IP they would think it was Google crawling their site if they didn't know any better.
I'll ask your other questions as well, but I'd ask people to give it a little time before jumping to a conclusion that the accelerator is bad for your sites--it's been less than two days since the Labs demo was put up.
I'd ask people to give it a little time before jumping to a conclusion that the accelerator is bad for your sites-- it's been less than two days since the Labs demo was put up.
Hey, you know that I don't mean anything personal by my posts ;)
I think the problems that have been mentioned by others and myself should have been thought of and/or addressed long before the release IMHO. C'mon, you have some of the brightest minds in the business.... I give you more credit than this...
I'm wondering if Google will come up with a fix for Urchin so they will have the only stats program out there whose results aren't skewed by the Google webaccelerator :)
If you go back a year to Gmail's debut, there were also people who wanted to block Gmail right after it was introduced. Fast forward a year, and many many people like Gmail now that they've seen the direction that we've gone with it: 2 gigs of storage and growing, a solid UI, free POP access, and free email forwarding as well. The accelerator had been out for less than a day and a half before people were forming conclusions about whether it was helpful for their site or not. I'm saying give it a little time (hey, maybe the weekend :) before deciding. And in the meantime, I'm happy to find an engineer who worked on this and ask them questions about what people are asking here.
- Users losing session:
This is a very simple thing and I'm able to reproduce this.
User does something on page 1 that causes some preference to be stored in the session. This information is used on subsequent pages. At some point it is clear, from the user's point of view, that this preference is no longer stored in the session, but they don't know why.
In some cases reloading the page works, and it turns out that an old cached version of the page was simply being viewed. In other cases reloading doesn't help.
When I take a look at the sessions on the server at that moment, the user appears to have 2 (or more!) sessions, i.e. they `lost` their session and a new one was created.
Although it happens that users lose their session without using the Google accelerator, using the accelerator does cause this to happen more often.
- some system information
Sessions are passed along via cookies; if no cookie has been set (yet), a parameter in the URL is used.
The URLs don't seem to be dynamic, as they don't contain ANY extension (.php/.html or whatever) and there is NO '?' in them at all. URLs just look like a path to some folder on the server.
Problem is that almost ALL my pages are dynamic to some extent, because of the use of preferences that can be set by visitors.
Using a proxy or a system like google's accelerator doesn't really work for a site constructed like this.
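For a site like this, the usual defence is to tell any intermediate cache that responses differ per visitor; whether the accelerator respects it I don't know, but a minimal .htaccess sketch (assumes mod_headers) would be:
# Responses depend on the session cookie and are private to a single user
Header append Vary "Cookie"
Header set Cache-Control "private"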
[edited by: engine at 3:34 pm (utc) on May 6, 2005]
[edit reason] See TOS [webmasterworld.com] [/edit]
You say in that analysis: "Imagine a million people downloading the Google Web Accelerator and all of a sudden, you have an infrastructure that finds out about a lot of pages very quickly."
Sorry to be obtuse, but what do you think the Google Toolbar does? Surely it doesn't collect all that information about every single page you go to just for fun.