Google Windows Web Accelerator

Brett_Tabke

8:09 pm on May 4, 2005 (gmt 0)

[webaccelerator.google.com...]

System Requirements
Operating System: Win XP or Win 2000 SP3+
Browser: IE 5.5+ or Firefox 1.0+
Availability: For users in North America and Europe (during beta testing phase)

Press Release:

Google Web Accelerator significantly reduces the time that it takes broadband users to download and view web pages. The Google Web Accelerator appears as a small speedometer in the browser chrome with a cumulative "Time saved" indicator. Here's how it works. A user downloads and installs the client and begins browsing the web as she normally would. In the background, the Google Web Accelerator employs a number of techniques to speed up the delivery of content to users.

Looks like some of the Mozilla hires are paying dividends.

claus

12:25 am on May 6, 2005 (gmt 0)

Angonasec, nothing wrong with your code as far as I can see. It looks perfectly OK to me.

This line (below) I personally use exactly "as is":

-------------------------------
# google proxy: 72.14.192.0 - 72.14.207.255 
RewriteCond %{REMOTE_ADDR} ^72\.14\.(19[2-9]|20[0-7]) [OR]
-------------------------------

However, it seems that the Web Accelerator has so far only been seen from the lower IP numbers, i.e. 72.14.192.---. So, if you want to block less than the full range, you can just use this line instead:

-------------------------------
RewriteCond %{REMOTE_ADDR} ^72\.14\.192 [OR] 
-------------------------------
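A quick way to sanity-check a regex for this range is to compare it with Python's standard ipaddress module. This is a minimal sketch, assuming the range given in the comment (72.14.192.0 - 72.14.207.255), which is exactly the CIDR block 72.14.192.0/20; the names WA_NET, WA_RE, in_wa_range and matches_regex are made up for illustration.

```python
import ipaddress
import re

# Range from the comment: 72.14.192.0 - 72.14.207.255 == 72.14.192.0/20
WA_NET = ipaddress.ip_network("72.14.192.0/20")

# Same range as a RewriteCond-style regex: third octet 192-199 or 200-207
WA_RE = re.compile(r"^72\.14\.(19[2-9]|20[0-7])")


def in_wa_range(addr: str) -> bool:
    """True if addr lies inside 72.14.192.0/20 (exact CIDR test)."""
    return ipaddress.ip_address(addr) in WA_NET


def matches_regex(addr: str) -> bool:
    """True if addr matches the regex version of the range."""
    return WA_RE.match(addr) is not None


# Walk every possible third octet and confirm both tests agree.
for octet in range(256):
    sample = f"72.14.{octet}.6"
    assert in_wa_range(sample) == matches_regex(sample), sample
```

Note that a third octet of 200 has to be covered explicitly: a class like 20[1-7] would silently let 72.14.200.x through.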

incrediBILL

12:42 am on May 6, 2005 (gmt 0)

OK, just a silly thought....

Why block WA?

Redirect it to a page that describes why it's bad technology, burning bandwidth needlessly, and should be uninstalled - educate those masses in their hypnotic trance that drool and chant "Goooooogle" all day on the net.
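Bill's redirect idea can be sketched outside Apache too. The following is a hypothetical WSGI middleware in Python, assuming the WA is recognised only by its source address in 72.14.192.0/20 (the range quoted earlier in the thread); EXPLAIN_PATH is a made-up page name. It skips the explanation page itself so the redirect cannot loop. It has not been tried against the real WA, which might prefetch and cache the redirect like any other response.

```python
import ipaddress
from typing import Callable, Iterable

WA_NET = ipaddress.ip_network("72.14.192.0/20")  # range quoted earlier in the thread
EXPLAIN_PATH = "/why-not-web-accelerator.html"   # hypothetical page name


def redirect_wa(app: Callable) -> Callable:
    """Wrap a WSGI app so requests from the WA range go to EXPLAIN_PATH."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        try:
            from_wa = ipaddress.ip_address(environ.get("REMOTE_ADDR", "")) in WA_NET
        except ValueError:
            from_wa = False  # missing or malformed address: treat as normal
        # Never redirect the explanation page itself, or the client would loop.
        if from_wa and environ.get("PATH_INFO", "") != EXPLAIN_PATH:
            start_response("302 Found", [("Location", EXPLAIN_PATH),
                                         ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)

    return wrapper
```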

GaryK

12:45 am on May 6, 2005 (gmt 0)

Can anyone help out us Windows Server webmasters who use ISAPI_Rewrite? I can't seem to find a way to make it block a range of IP Addresses. Thanks.

christopher w

1:33 am on May 6, 2005 (gmt 0)

Google needs to hire an SEO - try searching for Google web accelerator ;)

In all seriousness - I have been using it since yesterday and so far have saved over 25 mins....

mrMister

2:00 am on May 6, 2005 (gmt 0)

In all seriousness - I have been using it since yesterday and so far have saved over 25 mins....

The figure that Google presents you is just a guesstimate on their behalf.

I've seen pages that load slower through the Google Accelerator, yet Google Accelerator still declared that it saved me time!

LeoXIV

2:49 am on May 6, 2005 (gmt 0)

The data gathered is so valuable that it would even be worth Google's while to pay everybody to use it. We should coin a new term beyond 'freeware'. How about Loonieware? [It pays the user $1 per month.]

Note: we call our $1 coin a Loonie in Canada [from the image of a loon on one side of the coin].

Powdork

4:37 am on May 6, 2005 (gmt 0)

In all seriousness - I have been using it since yesterday and so far have saved over 25 mins....

Problem is, when I view my pages with it on, it says I am saving time, but with it off the pages load much more quickly. These are image-heavy pages.

GoogleGuy

6:31 am on May 6, 2005 (gmt 0)

Hey, I was in training today, so I apologize that I didn't stop by earlier. But I did get a chance to ask for some more info about the proxy cache. Here's what I heard back from someone more familiar with the accelerator:

- We do not change the user agent.
- We do add an "X-moz: prefetch" header to all prefetch requests. That way, webmasters can choose to just ignore prefetch requests if they so choose.
- We only prefetch URLs which according to the HTTP 1.1 spec should not have any side-effects. These are basically GET requests without a "?".
- We also include an X-Forwarded-For header which provides the original IP address in case webmasters want to perform granular geotargeting.
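The rules relayed above can be expressed as two small checks. This is a sketch in Python, assuming request headers arrive as a plain dict; the function names are hypothetical, and the code restates the rules as described in this post, not Google's actual implementation. HTTP header names are case-insensitive, so the lookup ignores case.

```python
def is_prefetch(headers: dict) -> bool:
    """True if the request carries an 'X-moz: prefetch' header."""
    return any(
        name.lower() == "x-moz" and value.strip().lower() == "prefetch"
        for name, value in headers.items()
    )


def prefetch_eligible(method: str, url: str) -> bool:
    """The stated rule: only GET requests whose URL contains no '?'."""
    return method.upper() == "GET" and "?" not in url


print(is_prefetch({"X-Moz": "prefetch"}))            # True
print(prefetch_eligible("GET", "/cart.php?add=42"))  # False
```

The second check looks only at the URL, so a GET that changes state through cookies would still pass it.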

I think using X-Forwarded-For is the usual way that proxies like squid pass on a user's original IP address? So the accelerator is like most caching proxies in that sense. I'll be heading to bed in a little bit, but hopefully I'll have more time tomorrow to post. But I'm happy to relay questions if people want to find out more about how it works.
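For geotargeting, the usual convention is to take the left-most address in X-Forwarded-For and fall back to the connecting address. A minimal Python sketch; the function name is hypothetical, and because the header is whatever the upstream hop sent, it is only worth trusting when the connecting address is a known proxy (for example one in the WA range).

```python
import ipaddress


def original_client_ip(remote_addr: str, headers: dict) -> str:
    """Best guess at the end user's address behind a proxy.

    X-Forwarded-For is a comma-separated list; each proxy appends the
    address it received the request from, so the left-most entry is the
    original client. Clients can forge this header, so it is only a hint.
    """
    for name, value in headers.items():
        if name.lower() == "x-forwarded-for":
            first = value.split(",")[0].strip()
            try:
                ipaddress.ip_address(first)  # reject malformed values
                return first
            except ValueError:
                break
    return remote_addr
```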

mrMister

6:48 am on May 6, 2005 (gmt 0)

We only prefetch URLs which according to the HTTP 1.1 spec should not have any side-effects. These are basically GET requests without a "?".

I have to say this sounds a bit erroneous. The query string is not the only way to pass data; cookies can be used to pass data as well.

If Google are hell-bent on introducing this without allowing webmasters any control over it, then, in order to prevent side-effects, the application should not fetch URLs when a cookie has been set.

I really don't understand why Google are so adamant that webmasters should have no control over the app. The prefetching mechanism is, in effect, a semi-automated web crawler and therefore it should obey robots.txt.

Well, at least we know how to block it now. We have to append a "?" to each of the URLs we don't want to be prefetched.
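If the relayed rule holds (only URLs without a "?" are prefetched), the workaround amounts to a small string transform applied when generating links. A hypothetical helper in Python: it keeps any #fragment after the "?", leaves URLs that already have a query string alone, and says nothing about how crawlers treat the extra "?", which is the open question that follows.

```python
def mark_no_prefetch(url: str) -> str:
    """Append a bare '?' so the URL no longer looks prefetchable.

    URLs that already contain a query string are returned unchanged;
    a '#fragment' stays after the '?', where it belongs.
    """
    base, hash_sign, fragment = url.partition("#")
    if "?" in base:
        return url
    return base + "?" + hash_sign + fragment


print(mark_no_prefetch("/gallery/page2.html"))
```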

GoogleGuy, do you know if doing this will have adverse side-effects with googlebot?

Powdork

6:55 am on May 6, 2005 (gmt 0)

For me it seems to be slowing Firefox way down. However, I am on a slow DSL connection. Is there a speed at which it changes from helping to hurting, and if so, what would that speed be?

dmorison

6:56 am on May 6, 2005 (gmt 0)

- We do add an "X-moz: prefetch" header to all prefetch requests. That way, webmasters can choose to just ignore prefetch requests if they so choose.

What's the best server response to a request containing the X-moz: prefetch header if you choose to "just ignore" it?

I just checked the mod_rewrite docs and couldn't see a rule flag that means "drop silently"; so you have to return _something_. What's the best _something_ to return in order to avoid additional side effects?

The particular side-effect that I'm worrying about is the WA assuming that my attempt to ignore the request is the actual response and returning that to the end user....

jd01

7:23 am on May 6, 2005 (gmt 0)

You should be able to return anything you like, as long as you are only denying 'prefetch', because if the user actually clicks on the link, a 'regular' header will be sent and therefore the condition will fail allowing the link to be followed as usual.

Justin

Added: Of course this may *not* help for outbound links.

GoogleGuy

7:34 am on May 6, 2005 (gmt 0)

That's strange, Powdork. It goes slower when you try it out?

Powdork

7:48 am on May 6, 2005 (gmt 0)

Yes. Are you familiar with the galleries generated by Photoshop? If so, you will recognize the next, previous, and index buttons that are added to each page. I do a site: search that brings up lots of image pages. Then I start visiting pages with the accelerator on and then off. When I visit with the accelerator, I visually watch the buttons load along with the main image. Without the accelerator, the buttons snap up and I have to wait a little for the image. Each time, I make sure I am opening a page in a different directory, so that it is calling a different file for next.gif, etc. The next, previous, and index buttons are placed first in the code.

Causes
1. My DSL is too slow for this to work and it actually has a negative impact. When I signed up for DSL it was very fast; I was near the source. Then I moved to a location further away from the source, but still had time on the contract, so I couldn't switch to cable, which is much faster at my location. I am probably about 50% faster than a 56k.
2. Since the main links on the page are to other image laden pages, the browser is fetching other images while trying to download and render the ones on the current page. However, I am noticing a general 'slowerness' on other pages I visit, but there are no easy measures of speed like the buttons in my galleries.

mrMister

7:56 am on May 6, 2005 (gmt 0)

That's strange, Powdork. It goes slower when you try it out?

That's hardly strange.

If the web server is closer than Google's proxy in terms of network topology and if the output of the server uses HTTP compression then Google's response will likely be slower than the original web server's response would have been.

Google's proxy isn't like an ISP's proxy that is guaranteed to be "closer". If it's farther away and it can't compress the data considerably then Google's response will be slower.
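The argument is easy to see with a back-of-the-envelope model: fetch time is roughly one round trip plus size divided by bandwidth. The numbers below (a 768 kbit/s line, 30 ms to the origin, 90 ms to the proxy, a 60 KB compressed page, 200 KB uncompressed) are invented for illustration, and the model ignores TCP slow start, DNS and parallel connections.

```python
def fetch_time(rtt_s: float, size_bytes: int, bandwidth_bps: int) -> float:
    """Toy model: one round trip plus transfer time on the client's link."""
    return rtt_s + size_bytes * 8 / bandwidth_bps


BW = 768_000                       # assumed client downstream, bits per second
COMPRESSED, RAW = 60_000, 200_000  # assumed page size in bytes, gzip vs plain

# Case 1: origin already compresses and the proxy is farther away.
direct = fetch_time(0.030, COMPRESSED, BW)   # about 0.655 s
proxied = fetch_time(0.090, COMPRESSED, BW)  # about 0.715 s, proxy loses

# Case 2: origin sends uncompressed, proxy compresses.
direct_raw = fetch_time(0.030, RAW, BW)      # about 2.11 s, proxy wins

print(f"{direct:.3f} {proxied:.3f} {direct_raw:.3f}")
```

With these assumptions the proxy only pays off when it removes more transfer time than the extra round trip adds.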

I know these Google programmers think they're smart, but if they think they can break the laws of physics then they really need to cut down on the amount of caffeine they're consuming! ;-)

dmorison

7:59 am on May 6, 2005 (gmt 0)

You should be able to return anything you like

... but then how does the WA know that that is not the expected response and serve it up in response to the "real" click a few moments later?

I must be missing something here. Nothing I send out in the response, that can be interpreted by a computer program, can imply that the page is any different because of the prefetch header included in the request.

It would need a response header that implies "Prefetch-Denied", otherwise you have to figure out how to drop the connection and break the protocol...

claus

9:24 am on May 6, 2005 (gmt 0)

403 Access Denied

---
And, it's quite easy to make a simple counter. Usually it's one of the first things you learn in any programming language.

"Wow, I am visitor number 1,000,000 to this web site and won a prize"
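The point is that prefetches inflate naive hit counters. A sketch of a counter that skips requests flagged "X-moz: prefetch" (the class name is hypothetical). It cuts both ways: if the proxy later serves the prefetched copy from its cache, the real view never reaches the server, so this counter would then undercount instead.

```python
class HitCounter:
    """Hypothetical page counter that ignores 'X-moz: prefetch' requests."""

    def __init__(self) -> None:
        self.count = 0

    def record(self, headers: dict) -> int:
        prefetch = any(
            name.lower() == "x-moz" and value.strip().lower() == "prefetch"
            for name, value in headers.items()
        )
        if not prefetch:
            self.count += 1
        return self.count


counter = HitCounter()
counter.record({"X-moz": "prefetch"})          # prefetched, not counted
counter.record({"User-Agent": "Mozilla/5.0"})  # real view, counted
print(counter.count)
```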

dmorison

9:33 am on May 6, 2005 (gmt 0)

403 Access Denied

....and then that gets returned to the user in response to their subsequent genuine click!

Surely you're just being hopeful that the developer of an intermediate proxy that inserts prefetch headers (the WA in this case) is going to assert in their design that if they receive a 403 Access Denied, they should re-request the URL later, when the genuine click comes through.

I am not happy making that assumption; because I did not design the WA, and I don't know the person who did.

Regardless, the only thing that could be safely relied upon not to break functionality in response to a request containing a prefetch header is something that specifically references that prefetch request, such as a new response code or prefetch-denied header.

jd01

9:52 am on May 6, 2005 (gmt 0)

Without going too far off topic:

What you see in a browser:
www.yoursite.com

What your server sees in *every* request that is made:
GET [yoursite.com...] HTTP/1.1 (or some other method, e.g. POST or HEAD, and/or HTTP version 1.0)

So, since G sends the 'prefetch' header only with requests it is prefetching, and *not* with links that are clicked by the user, you can effectively block the prefetch request at the server level with this or something similar:

RewriteEngine On
# Refuse (403 Forbidden) any request that carries the "X-moz: prefetch" header
RewriteCond %{HTTP:X-moz} ^prefetch$ [NC]
RewriteRule .* - [F,L]

This will *only* impact a request that contains X-moz: prefetch. It will *not* impact a simple GET or POST request that does not contain the prefetch request in the header. EG A user clicking on a link.
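For servers without mod_rewrite, the same rule can be sketched as WSGI middleware in Python. WSGI passes request headers as HTTP_* keys, so "X-moz" shows up as environ["HTTP_X_MOZ"]. The Cache-Control: no-store header is an added assumption meant to discourage intermediaries from storing the refusal; whether the WA honours it is not established in this thread.

```python
from typing import Callable, Iterable


def refuse_prefetch(app: Callable) -> Callable:
    """Wrap a WSGI app so 'X-moz: prefetch' requests get 403 Forbidden."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            body = b"Prefetch requests are not served here.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Cache-Control", "no-store"),  # assumption: discourage caching the refusal
            ])
            return [body]
        # Ordinary requests (e.g. a real click) pass straight through.
        return app(environ, start_response)

    return wrapper
```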

Hope this helps.

Justin

Added: "...and then that gets returned to the user in response to their subsequent genuine click!" This is incorrect.

Edited for clarity.

philaweb

10:39 am on May 6, 2005 (gmt 0)

>So, since G is sending the 'prefetch' header request only to requests it is prefetching and *not* to links that are being clicked by the user. You can effectively block the prefetch request at the server level with this or something similar:<

I have checked and rechecked my analog stats for yesterday, and I must admit that I do not see any prefetch requests from the Google Accelerator IP anywhere.

Perhaps Google is going to implement that option later (since the WA is only in BETA mode). Currently, only the IP tells me when the Google WA is at work.

It is interesting to see that WA also prefetches files within .htpasswd protected directories. WA prefetches anything your browser (IE or FF) has access to.

Another interesting pattern: WA downloads all files requested by a visitor, which means double bandwidth usage. It is difficult to see whether the downloaded prefetched files are kept within the WA buffer cache on the visitor's PC or kept on a Google server - or both.
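Since the source address is the only sign of WA traffic in a standard access log, a short script can split hits and bytes between the 72.14.192.0/20 range and everyone else. A sketch assuming Apache's Common (or Combined) Log Format with numeric client addresses; the function and variable names are hypothetical.

```python
import ipaddress
import re
from collections import Counter

WA_NET = ipaddress.ip_network("72.14.192.0/20")  # range quoted earlier in the thread

# Common Log Format prefix: host ident user [time] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\d+|-)')


def wa_traffic(lines):
    """Return (hits, bytes) Counters keyed 'wa' or 'other'."""
    hits, sent = Counter(), Counter()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines
        host, _status, size = m.groups()
        try:
            key = "wa" if ipaddress.ip_address(host) in WA_NET else "other"
        except ValueError:
            key = "other"  # resolved hostnames are not classified
        hits[key] += 1
        sent[key] += 0 if size == "-" else int(size)
    return hits, sent
```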

Anyways, I'm done fiddling with WA and have decided to delete it from my PC.

What I find confusing is the behaviour after I blocked WA in the server .htaccess file while still using the WA for browsing. Some of the pages turn up anyway, even though I never clicked them using the WA. Whether the page comes from the ordinary browser cache or the Google server prefetch cache is impossible for me to tell.

Some clicks did return a 403 error page, but the URI remained intact in the address bar. When refreshed the page turned up. The only .htaccess code that definitely shut down the WA prefetches was this:

<Limit GET POST>
order allow,deny
allow from all
deny from 72.14.192.6
</Limit>

dmorison

11:00 am on May 6, 2005 (gmt 0)

This will *only* impact a request that contains X-moz: prefetch. It will *not* impact a simple GET or POST request that does not contain the prefetch request in the header. EG A user clicking on a link.

This statement doesn't seem to appreciate what an accelerating proxy server actually does. The whole point of the prefetch is that the users "real" click never makes it to your server.

Anyway, I agree that we shouldn't bog this thread down with the techie details, so I've tried to clarify my concerns in a separate thread...
[webmasterworld.com...]

jd01

11:31 am on May 6, 2005 (gmt 0)

Last comment and then I'll go back to the threads I normally post in...

My understanding is: The prefetching is caching a 'live' page from your server on-the-fly (EG loading the page into a cache at G) - If you deny the request for the prefetch, the page will not be cached. If the page is not cached all link(s) will continue to have their normal function (IOW the links will not be broken, nor will the pages cease to function, because you denied the precaching... Just like any other link on the page that is not cached.) So, by denying the proxy request for the pre-caching, you are effectively turning off the pre-caching engine, and forcing the requests to be processed as normal by your server.
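Both positions can be right, depending on one detail: whether the proxy stores a refused prefetch. The thread does not say which the WA does, so the following is a toy Python model, not a description of the real WA; the class, its cache_errors switch, and the origin function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body)


@dataclass
class ToyPrefetchProxy:
    """Hypothetical prefetching cache; NOT a description of the real WA."""
    origin: Callable[[str, bool], Response]  # origin(url, is_prefetch)
    cache_errors: bool                       # store non-200 prefetch replies?
    cache: Dict[str, Response] = field(default_factory=dict)

    def prefetch(self, url: str) -> None:
        status, body = self.origin(url, True)
        if status == 200 or self.cache_errors:
            self.cache[url] = (status, body)

    def click(self, url: str) -> Response:
        # A cache hit means the user's real click never reaches the origin.
        if url in self.cache:
            return self.cache[url]
        return self.origin(url, False)


def origin(url: str, is_prefetch: bool) -> Response:
    """Site that answers prefetch requests with 403, as proposed above."""
    return (403, "Forbidden") if is_prefetch else (200, "page")
```

With cache_errors=False the refused prefetch is simply not cached and the click goes to the server as normal; with cache_errors=True the user gets the stored 403.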

Justin

BTW I believe since the 'prefetch' header sent is that of a proxy function it will not show in your logs as a normal GET/POST header would... It should only show that the page was loaded by the G proxy IP address.

Anyway, hope this helps someone...

oneguy

12:49 pm on May 6, 2005 (gmt 0)

from Scarecrow...

With WA, and the abiding faith that any WA user would necessarily have in Google as God, the mess-ups from Google will get blamed on us webmasters.

True, and Google cares not.

from mrMister...

I really don't understand why Google are so adamant that webmasters should have no control over the app.

I remember a few years ago, when content building webmasters would sing in unison about their partnership with Google. It has to be getting harder and harder to pretend.

(and that's not directed at anyone personally... I just jumped off your quote.)

Angonasec

1:09 pm on May 6, 2005 (gmt 0)

Ta Claus: G WA is now blocked in my account root htaccess using the code you gave.

I asked before, but you missed it...

I'm on a shared server, so will it stop prefetching of my account? (I don't control the server htaccess.)

Bill asked:

"Why block WA?"

Okay Bill, please post the relevant compact htaccess code to redirect safely, together with your sample 'Google WA tutorial for the duped', and we will use it, until Google give us back our liberty.

reseller

1:53 pm on May 6, 2005 (gmt 0)

Angonasec

Liberty isn't something somebody gives you, it's something you fight for and earn! That's also true in Google's case. It's publishers who should fight back for their liberty.

History repeats itself. The present situation with Google's WA reminds me of a situation in 2001, when the great legend Jim Wilson gathered hundreds of decent webmasters and led the fight against scumware.
[scumware.com...]

Who knows who will be the next Jim to lead us all as publishers in our fight for privacy and control of our own contents.
