Why doesn't WW use GZIP compression
4X faster page loads and little CPU penalty
bumpski




msg:522714
 10:17 am on Aug 7, 2005 (gmt 0)

Why doesn't WW support GZIP compression?

If WW (WebmasterWorld) supported just level 1 GZIP compression on its servers, 56K modem users would get page loads four times faster, and the WebmasterWorld servers' communications load would be cut by a factor of 4. System CPU usage would drop while application CPU usage would increase; at level 1 GZIP compression, the savings from handling fewer bytes, far fewer context switches per message, and so on would balance the increase in application CPU usage. It seems like a win-win.

Vastly better performance for end users!

 

trillianjedi




msg:522715
 10:29 am on Aug 7, 2005 (gmt 0)

Brett's pretty switched on with this stuff, so I suspect there's a very good reason for not using it.

I'm interested in knowing what it is though. I use it on one of my busier websites, and it has helped.

In what circumstances is not using gzip actually better than using it?

TJ

bumpski




msg:522716
 11:27 am on Aug 7, 2005 (gmt 0)

One reason many web hosts don't offer it (up front) is that it cuts bandwidth usage by a factor of 4.

Bandwidth = Revenues.

They could solve this by having two fee rates, but for some reason that seems too complicated.

Some web hosts still argue CPU usage goes up. Some of this thinking comes from selecting too high a compression level: level 1 is the minimum and typically provides 4X compression, and there is no need to go higher; it just burns CPU for a few percent more savings. Another misunderstanding is looking only at application CPU usage versus system CPU usage: application CPU usage goes up, but kernel CPU usage goes down.

Memory usage on the servers does go up some.

Believe it or not, if you use Norton Internet Security, NIS actually "turns off" GZIP compression for your web browsers. If you turn off NIS, GZIP will kick in. If you use Live HTTP Headers in Firefox you can see these GZIP flags in the request.
(Google GZIPs its SERPs!)
Compression is optional: the requesting browser sends an Accept-Encoding header saying "GZIP it, please!" All modern browsers request compressed pages, which the web host may optionally return.

Google is crawling with GZIPped requests more and more frequently. The most recent crawl of one of my sites used 100% GZIP requests by Google! Imagine the internet bandwidth savings! 4X faster Google crawls. Only about 6% of web hosts support GZIP. Unfortunately GZIP is yet another way to spam Google.

It's not good to compress all file types, only HTML. Some browsers (IE) have trouble if a PDF file is compressed.
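To make the negotiation concrete, here is a minimal hypothetical PHP sketch (not what WW runs) that compresses only an HTML page, only at level 1, and only when the browser's Accept-Encoding header asks for gzip:

<?php
// Hypothetical sketch: compress only this HTML page, and only if the client asks for it.
function gzip_html($html) {
    return gzencode($html, 1);        // level 1 = the cheap setting discussed above
}

$ae = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
if (strpos($ae, 'gzip') !== false && extension_loaded('zlib')) {
    header('Content-Encoding: gzip');
    header('Vary: Accept-Encoding');  // keep shared caches from mixing variants
    ob_start('gzip_html');
} else {
    ob_start();                       // serve the page uncompressed
}

echo '<html><body>...page content...</body></html>';
ob_end_flush();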

outrun




msg:522717
 11:42 am on Aug 7, 2005 (gmt 0)

I am no expert with gzip, but in message 47 Brett answers this question

[webmasterworld.com...]

bumpski




msg:522718
 12:31 pm on Aug 7, 2005 (gmt 0)

I looked hard for that answer, but didn't include Brett's name in my Google searches. Google must be updating WW daily; that's a lot of unzipped crawling!

Of course my search results came back very quickly because Google is GZIPping and has been for at least a year and a half.

It seems like Google would love to crawl compressed content, but web hosts are mainly looking for that bandwidth dollar sign. They actually have no incentive, because most of their customers don't know about the feature. Of course their customers' customers would love to have web pages opening 3 to 4X faster on their 56K modems. (And maybe we wouldn't have all these caching ISPs messing up our webpage presentations, and our page hit statistics.)

Even my main web host does not run GZIP, but was willing to say "oh, by the way," if you do this weird thing with PHP you'll have GZIP. Then I had to manually handle the missing-page error page problem, which is important.

Google could crawl WW about 4X faster, since Google only reads the HTML files, not the CSS files.

I think these failed experiments of the past came down to using too high a compression level. The most rudimentary level is the best performance balance!

Granted it's always a big experiment. But I do love seeing pages loading fast for all my visitors!

Brett_Tabke




msg:522719
 4:24 pm on Aug 12, 2005 (gmt 0)

Yes, Gzip is installed on the host, but we do not take advantage of it at this time.

>Google could crawl WW about 4X faster

That is one very good reason I do consider going to Gzip. I totally salute the idea of using Gzip - I really wish we could make it work here for users.

Some of the issues:

System overhead: the page must be generated in Perl, in memory, and then compressed by Gzip before we can begin to transmit the page to the user. That can add a significant amount of time to page generation.

In the current system, the first part of the page has already started to be delivered to the user before the last part of the page is generated. The BestBBS software generates pages on the fly. By the time this sentence is put in your browser, the part of the page above it is already generated and out the door. The last half of this page has yet to be determined: the software has no idea what the next message is, or what kind of code is going to need to be generated. There may be some files yet to be opened, seeks to be made, and user files to be updated. All of that takes a significant amount of additional time, and all of that time would be "at the top of the page" if we used Gzip.

For example (numbers are all relative guesses):

NonGzip:

Time Slices: Action
0.5 find required files.
0.5 find user files.
0.5 generate header.
0.1 send out top of page.
0.5 find post
0.5 generate start of post.
0.1 send out top of post.
0.5 find posters user files.
...

Total time slices or approximate time to generate message to browser: 4
Time to put something Visible In Browser: 1.5

With GZip

Time Slices: Action
0.5 find required files.
0.5 find user files.
0.5 generate header.
0.5 find post
0.5 generate start of post.
0.5 find posters user files.
1.0 Gzip post.
x.x send out page.

Total time slices or approximate time to generate page to browser: 5
Time to put something Visible In Browser: 5

We go from 1.5 to 5 to put something in your browser. So if a page has 4 seconds of page generation time, we are spreading that generation out over the life of page delivery, whereas with Gzip the entire 4 seconds of page generation is at the top of the page. The moral is that, in a dynamic environment, the slow part is not page delivery but page generation.

What we are doing here is taking advantage of network latency. We can pump the code out onto the web - where it is really going to run through a massive set of routers and switches (hey, you've done a tracert, right?). Apache doesn't sit there and wait for that code to show up in your browser. It sends it out on its merry way and then gets back to the rest of the code, while your ISP's network does the rest. That is twice as true if you are on an ISP that uses transparent caching, such as AOL. So, while the first 10k of a page is being delivered to your browser, the next 20k is being worked on by Apache.

Add in the additional Gzip overhead (it is not a massive load, but it becomes significant in such a page-view-happy environment).

Gzip on a system like this in the real world? It looks like lag in the browser - a delay before the browser does something and starts to move. Thus, we would get complaints about why WebmasterWorld has slowed down.

To my way of thinking, I want to make that little download indicator do something in your browser as fast as possible. I removed every last bit of overhead code that I could from BestBBS before that first byte is generated. I do everything I can to generate and start sending that first 1.5 to 4k (the general size of most network buffers) as fast as possible. It is the key to the constant stream of comments that BestBBS is the fastest forum software on the web today. We may only be in the top 300 Alexa sites, but I would put our "bytes per box" ratio up against anyone on the web today - no one gets more bang per box than we do. Other comparable forums are using networks of 8 servers to our 1.
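A rough PHP sketch of the trade-off Brett describes (hypothetical; the real BestBBS is Perl, and the helper, timings, and query parameter here are made up):

<?php
// Rough, hypothetical sketch of streaming vs. buffer-then-compress.

function generate_post($i) {        // stand-in for the slow part: file seeks, user lookups
    usleep(200000);                 // pretend each post costs 0.2s to build
    return "<p>post number $i</p>\n";
}

$use_gzip = isset($_GET['gzip']);   // ?gzip=1 simulates the buffer-then-compress path

if ($use_gzip) {
    header('Content-Encoding: gzip');
    ob_start();                     // nothing leaves until the whole page exists
}

echo "<html><body><h1>thread title</h1>\n";
if (!$use_gzip) { flush(); }        // streaming: first bytes are already on the wire
                                    // (actual behavior also depends on server-level buffering)
for ($i = 1; $i <= 10; $i++) {
    echo generate_post($i);
    if (!$use_gzip) { flush(); }    // each post ships while the next one is being built
}
echo "</body></html>\n";

if ($use_gzip) {
    // all ~2s of generation now sit in front of the first byte the browser sees
    echo gzencode(ob_get_clean(), 1);
}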

amznVibe




msg:522720
 5:29 pm on Aug 12, 2005 (gmt 0)

Good luck on this argument - I suggested mod_gzip [schroepl.net] a year or more ago with great detail and examples at the time too.

Assuming the forums use PHP as the underlying code, pick one forum as a testing ground for the process.
Then at the start of the page generation for that forum use this simple function call to simulate mod_gzip:

ob_start ("ob_gzhandler");

or try a fancier version which is more powerful and can use deflate or gzip
depending on what the browser will support:

$PREFER_DEFLATE = true; // prefer deflate over gzip when both are supported
$FORCE_COMPRESSION = false; // force compression even when the client does not report support
function compress_output_gzip($output) {return gzencode($output);}
function compress_output_deflate($output) {return gzdeflate($output, 4);}
if (isset($_SERVER['HTTP_ACCEPT_ENCODING'])) {$AE = $_SERVER['HTTP_ACCEPT_ENCODING'];} else {$AE = $_SERVER['HTTP_TE'];}
$support_gzip = (strpos($AE, 'gzip') !== FALSE) || $FORCE_COMPRESSION;
$support_deflate = (strpos($AE, 'deflate') !== FALSE) || $FORCE_COMPRESSION;
if ($support_gzip && $support_deflate) {$support_deflate = $PREFER_DEFLATE;}
if ($support_deflate) {header("Content-Encoding: deflate"); ob_start("compress_output_deflate");}
else if ($support_gzip) {header("Content-Encoding: gzip"); ob_start("compress_output_gzip");}
else {ob_start();}

Then benchmark away.
Note that given the flexibility of the user preferences in this forum, Brett could default to no compression and let the more advanced users decide to turn it on for themselves or not by using the above code.

Of course there is a huge list of massive websites that use mod_gzip.
I noticed even CNN.com turned it on recently.

macdave




msg:522721
 5:51 pm on Aug 12, 2005 (gmt 0)

I ran gzip for a while on a moderately busy site...the speed gains (actual and apparent) in our case were nice, but a number of users behind proxies started having problems where they couldn't get pages to load at all. Turns out that certain caching proxy servers accept gzipped content but cache it incorrectly.

Brett's points are well taken. gzip can be beneficial for sites that serve mostly static files, but it may actually hinder dynamic sites that require a lot of processing to generate each page.

amznVibe




msg:522722
 5:54 pm on Aug 12, 2005 (gmt 0)

Hence my idea of making it optional per user in the control panel and turning it off by default.
Let the more advanced users turn it on if they want it. Proxy "problem" solved.

CNN.com front page example:
Gzipped 13280 bytes -vs- Original 60018 bytes
28.8k modem time: 6.5 seconds vs 29.3 seconds
DSL/Cable time: 1.3 seconds vs 5.9 seconds

So even for broadband users, the compression spares 1-4 extra seconds that can go to page rendering.

Brett_Tabke




msg:522723
 6:23 pm on Aug 12, 2005 (gmt 0)

> in php

nah - perl all the way.

> Then benchmark away.

It is a flip of a switch. The system hangs under load within 5 mins.

> CNN.com front page example:
> Gzipped 13280 bytes -vs- Original 60018 bytes
> 28.8k modem time: 6.5 seconds vs 29.3 seconds
> DSL/Cable time: 1.3 seconds vs 5.9 seconds

Good figures, but not the full picture.

FBIB (first byte in browser) is the real test. Non Gzip will always win.

markus007




msg:522724
 6:27 pm on Aug 12, 2005 (gmt 0)

I'm using IIS 6.0 on a dual Opteron 1.8GHz.

6 million pageviews a day with gzip turned on.
CPU usage never goes over 20%.
Serving 30mb/sec after compression is turned on.
Response time is nearly instant.

ASPX files are compressed on the fly; static CSS, JavaScript, etc. files are saved zipped to the cache folder in IIS.

Namaste




msg:522725
 6:39 pm on Aug 12, 2005 (gmt 0)

So that's why we are experiencing "lag".

We switched our services from IIS to Apache (also switching from ASP to PHP) and started using mod_gzip, whereas in IIS it was simply "ticked"... we find that in Apache there is a 2-second "lag" before the page loads, which wasn't there with IIS.

Can someone suggest ways to overcome this lag on Apache and get the "FBIB (first byte in browser)" out instantly?

[edited by: Namaste at 6:43 pm (utc) on Aug. 12, 2005]

zoltan




msg:522726
 6:40 pm on Aug 12, 2005 (gmt 0)

Brett, may I ask whether WebmasterWorld uses static pages (regenerated every time a new post is added) or dynamic pages with mod_rewrite to make them look like static pages?

markus007




msg:522727
 6:52 pm on Aug 12, 2005 (gmt 0)

Namaste, you have to hack the IIS metabase to turn on compression for ASP and ASPX files. By default, only static files are compressed when it's turned on.

freeflight2




msg:522728
 8:26 pm on Aug 12, 2005 (gmt 0)

Gzipped pages definitely _decrease_ the load on heavily used modern webservers (dual 2.8GHz+, 100+ pages/sec - just make sure *not* to use a gzip compression level > 2 or it will get slow). The fewer packets to send out, the better (sending them takes more CPU than doing the compression). The number of open connections also becomes a problem sooner or later with unzipped pages.
I would run BestBBS either as a daemon or under mod_perl, with most data cached directly in memory and written through to disk. A proxy to handle all non-logged-in users and SEs can do wonders, too.

Brett_Tabke




msg:522729
 8:40 pm on Aug 12, 2005 (gmt 0)

> there is a 2 second "lag" before the page loads,

Yes. That is the issue.

> WebmasterWorld use static pages
> (generated every time after a new post is added)

Is your name at the top of the screen? Can you see a stickymail link? Log out and compare the pages to when you are logged in.

Yes, 100% dynamically generated site.

> or in mod_perl

I've been running some tests here and there. I think I might ramp that up and give mod_perl a shot full time - it does look impressive on some tests. (I'd never used it before.)

bumpski




msg:522730
 8:45 pm on Aug 12, 2005 (gmt 0)

Brett

Since I kicked off this thread, thanks for the detailed reply. Very informative!

I realize WebmasterWorld has been hit with this question a "few" times in the past. I do agree with delivering some visible content to the site visitor as soon as possible. On my sites I stress that the informational content be shown as quickly as possible and with stability, so it doesn't "move" as other content arrives.

I also agree that WebmasterWorld's performance is quite good. The unfortunate thing is that there are so many other sites that could easily implement GZIP compression but don't even know it exists. I just wish it were pushed more!

I've used IFrames to support a lot of asynchronous loading of content I know will be delayed. I realize this is a problem with older browsers, but I've had no complaints.

Regardless, WW performs well and provides an important service. For speed, I'm lucky to finally have DSL - quite an improvement over 56K!

Darkness




msg:522731
 8:58 pm on Aug 12, 2005 (gmt 0)

Compression is definitely the way to go for text-heavy pages. I noticed that Amazon is using it now; a page I just checked was reduced from 133k to 30k.

I am using mod_deflate, which comes by default with Apache 2, and there is certainly no 2-second delay - maybe mod_gzip is slower?

chaitan




msg:522732
 11:12 pm on Aug 12, 2005 (gmt 0)

Based on the test results at
[mozilla.org ]

and my own benchmark testing using Apache ab, I believe mod_gzip is only good for clients on *slow* connections. mod_gzip slows down system performance, no matter whether the page is static or dynamic.

Here is a result for one of my static pages, tested on a new, empty dedicated server:
mean time per request is 38ms with gzip on, vs. 5.8ms without mod_gzip

GZIP ON: ($APACHE/bin/ab -H "Accept-Encoding: gzip" -n 1000 -c 20 [mysite.com)...]

Document Path: /file.html
Document Length: 6197 bytes

Concurrency Level: 20
Time taken for tests: 1.945 seconds
Complete requests: 1000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 6563832 bytes
HTML transferred: 6271364 bytes
Requests per second: 514.14 [#/sec] (mean)
Time per request: 38.90 [ms] (mean)
Time per request: 1.95 [ms] (mean, across all concurrent requests)
Transfer rate: 3374.72 [Kbytes/sec] received

GZIP off: ($APACHE/bin/ab -n 1000 -c 20 [mysite.com)...]

Document Path: /file.html
Document Length: 19795 bytes

Concurrency Level: 20
Time taken for tests: 0.290 seconds
Complete requests: 1000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 20295216 bytes
HTML transferred: 20025758 bytes
Requests per second: 3448.28 [#/sec] (mean)
Time per request: 5.80 [ms] (mean)
Time per request: 0.29 [ms] (mean, across all concurrent requests)
Transfer rate: 69983.50 [Kbytes/sec] received

I don't think I will turn on mod_gzip unless most of my users are on slow connections. I will not do it for search engine bots; they are not slow, so mod_gzip would do nothing but make feeding Googlebot slower.

lovethecoast




msg:522733
 12:59 am on Aug 13, 2005 (gmt 0)

Our experience, with our paltry 5 million or so pageviews a day, is that gzip is most definitely worth it on text-heavy pages. In addition to gzip, we also remove white space, which cuts a few k off the size of the page as well.

For us (IIS/XCache), we see about 1/10th of a second of load added. Considering that this 1/10th of a second is saved back in page load time for the user (even with a DSL or cable connection -- about 80% of our visitors), the decision is easy for us. We very rarely go above 30% utilization on our servers, although our servers tend to be pretty high-end (multi-processor, 4+ gigs of RAM).
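A generic sketch of that whitespace trimming as a PHP output-buffer callback (lovethecoast's stack is IIS/XCache, so this is only an illustration, not their code):

<?php
// Hypothetical sketch: collapse runs of whitespace between tags before the page is sent.
// (Be careful with <pre>, <textarea> and inline scripts on a real site.)
function strip_extra_whitespace($html) {
    $html = preg_replace('/>\s+</', '> <', $html);   // squeeze whitespace between tags
    return preg_replace('/[ \t]{2,}/', ' ', $html);  // and runs of spaces/tabs elsewhere
}
ob_start('strip_extra_whitespace');

echo "<html>   <body> <p>lots   of   space</p>   </body> </html>";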

S

amznVibe




msg:522734
 7:29 am on Aug 13, 2005 (gmt 0)

I do not see a 2 second lag on sites using gzip properly.
Try
[slashdot.org...]
[cnn.com...]
etc.

As far as FBIB goes, you can send all the headers you want before the gzip data, but technically nothing else.

One side benefit of gzip is that images are forced to load AFTER the HTML is received, not during page load, so you always see text first. Without compression, the browser can open multiple connections to the server as soon as it sees the first image tag, thereby slowing down the loading of the text!

FBIB is a nice sounding benchmark but it's far from the whole picture.

IBM whitepaper on html compression:
[ibm.com...]

[edited by: Hawkgirl at 12:02 pm (utc) on Aug. 13, 2005]
[edit reason] linked URLs [/edit]

klauslovgreen




msg:522735
 9:48 am on Aug 13, 2005 (gmt 0)

Brett,

You may consider caching everything for 15 or 30 minutes and serving cached gzipped pages to those not logged in, and dynamic pages to those who are logged in.

We are using that method on a news site, and unless you are into breaking news, 15 minutes is not much of a delay - if nothing else it might drive more user registrations ;-)
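A hypothetical PHP sketch of that idea (the cache path, cookie name, and 15-minute window are made up for illustration):

<?php
// Serve a pre-gzipped cached copy to anonymous visitors; logged-in users get dynamic pages.
$cache_file = '/tmp/homepage.html.gz';              // hypothetical cache location
$max_age    = 15 * 60;                              // the 15-minute window suggested above
$logged_in  = isset($_COOKIE['member_id']);         // hypothetical login cookie
$wants_gzip = isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false;

if (!$logged_in && $wants_gzip
    && file_exists($cache_file)
    && time() - filemtime($cache_file) < $max_age) {
    header('Content-Encoding: gzip');
    header('Vary: Accept-Encoding');
    readfile($cache_file);                          // no page generation at all
    exit;
}

// Otherwise generate the page dynamically...
$html = '<html><body>...freshly generated page...</body></html>';
// ...and refresh the anonymous cache on the way out (could also gzip this response itself).
if (!$logged_in) {
    file_put_contents($cache_file, gzencode($html, 1));
}
echo $html;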

Just a thought

Cheers
Klaus

bumpski




msg:522736
 10:01 am on Aug 13, 2005 (gmt 0)

It truly is unfortunate that the default compression level of zlib was set to 6 instead of 1. Level 1 provides 95% of the compression gain; levels 2 and above gain only another 5% or so and just use up CPU.
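That claim is easy to check against your own pages; a throwaway PHP comparison (the sample file name is hypothetical):

<?php
// Compare gzip levels on a real page to see where the diminishing returns start.
$html = file_get_contents('cached_page.html');      // hypothetical saved copy of a page
printf("original: %d bytes\n", strlen($html));
foreach (array(1, 6, 9) as $level) {
    printf("level %d: %d bytes\n", $level, strlen(gzencode($html, $level)));
}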

Unfortunately the defaults have produced very misleading evaluations of compression's "system" performance.

And when considering performance the whole "system" should be included.

Below is a trace to www.google.com. Of course yours may be a little, but only a little, shorter.

Think about 4X as many bytes needlessly traveling through all these devices.

I think Googlebot will be faster with servers providing compression! Why else would they keep experimenting with it?

The trouble is that only a few servers return compressed results, and unfortunately it is yet another way to spam a search engine, so Google has to keep the world honest and will be forced to occasionally crawl requesting uncompressed web page contents.

1,192.168.254.254,2ms,,0
2,205.238.---.--,37ms,,0
3,66.33.229.137,32ms,,0
4,66.33.229.146,32ms,pos3-1-lkst7613.epix.net,0
5,12.126.174.121,56ms,,0
6,12.123.9.70,59ms,gbr6-p30.wswdc.ip.att.net,0
7,12.122.11.189,62ms,tbr2-p013701.wswdc.ip.att.net,0
8,12.122.80.222,75ms,,0
9,192.205.32.42,53ms,att-gw.washdc.level3.net,0
10,4.68.121.178,54ms,ae-21-56.car1.Washington1.Level3.net,0
11,4.79.228.26,54ms,,0
12,216.239.47.158,55ms,,0 Google
13,216.239.48.198,55ms,,0 Google
14,64.233.161.99,56ms,,0 Google

Again keep up the good work WW!

gmiller




msg:522737
 10:06 am on Aug 13, 2005 (gmt 0)

Actually, a browser could still begin sending off the image requests before the gzipped document is fully loaded, provided the browser's handling of compressed content isn't flat-out stupid. As far as I remember, gzip uses a Lempel-Ziv variant, so there aren't any forward references in the compressed data. That being said, I haven't surveyed the browsers out there to see how good they are at streaming gzipped data to their rendering pipelines.

Another issue to consider is paint suppression. Many (if not all) browsers only begin to render a page after enough time has passed or enough data has been loaded. Otherwise, you end up with jittery displays and slow page loads. Early versions of Mozilla tried to render data as soon as it came in, with painful results.

I guess the moral of the story here is to avoid generating pages on the fly. Sadly, that approach isn't free either, since it's hard to pregenerate your pages and provide lots of user customization at the same time. TANSTAAFL.

Brett_Tabke




msg:522738
 6:34 pm on Aug 13, 2005 (gmt 0)

One thing we should also keep in mind is that the vast majority of this audience is on high-speed net connections. We run about 80% on DSL/cable or other high-speed, and we even have about 10% on T1s or higher here.

instinct




msg:522739
 10:30 pm on Aug 13, 2005 (gmt 0)

Drupal CMS uses an interesting approach that seems to balance well:

-With caching enabled, pages are cached in the database compressed.

-Only non logged-in users are served cached pages. If you have the odd user having problems because of his ISP's proxy, simply tell them to create a free account.

Also interesting is that Wikipedia (possibly the most text-intensive site on the web?) uses Gzip compression.
Page generation on Wikipedia is of course dynamic and fairly CPU-intensive, although I think there is Squid caching involved. Anyone know?

Out of curiosity, is there a way to detect the end user connection speed in PHP? With this info, could you not then decide per user whether or not to serve gzipped pages?

amznVibe




msg:522740
 4:14 am on Aug 14, 2005 (gmt 0)

It truly is unfortunate that the default compression level of zlib was set to 6 instead of 1. Level 1 provides 95% of the compression gain; levels 2 and above gain only another 5% or so and just use up CPU.

mod_gzip will be enhanced in the future to allow a user-defined compression level, but yes, it defaults to level 6 right now because it is hard-coded (gz1->level = 6 inside function gz1_init).

They did a bunch of testing to determine CPU cost vs. compression and found level 6 best. My own basic tests show they were right (and CPUs are only getting faster now).

Actually, a browser could still begin sending off the image requests before the gzipped document is fully loaded

You are right, technically they *could* but from what I have observed (unscientifically) they don't.

zoltan




msg:522741
 7:45 am on Aug 14, 2005 (gmt 0)

We have just enabled gzip compression (about 24 hours ago) and the results are remarkable. Almost every page is loading a lot faster, except the homepage (we disabled gzip for the homepage).
If we are talking about FBIB and compare the homepage with and without gzip:
- with gzip: about a 4-second delay until FBIB; after that, it is really fast
- without gzip: FBIB arrives instantly, then the page loads more slowly.

We are on a fully dynamic environment (Apache/2.0.46 (Red Hat), Perl, MySQL). Any comment is welcome.

We are also considering providing cached pages to non-logged-in visitors. Any suggestions about possible implementations here? We were reading about mod_cache - has anyone used it?

paulroberts3000




msg:522742
 10:16 am on Aug 14, 2005 (gmt 0)

I just did a test at WhatsMyIP.org with their mod_gzip test.

It reports for www.webmasterworld.com:

"Page Size: 43 K
Size if Gzipped: 8 K
Potential Savings: 81.4%"

How about gzipping the content and doing some buffer flushing as the page is generated, so you send the page in chunks that are gzipped?

Also, as you're using a table-based layout, doesn't the browser wait for the whole table to load anyway?
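On the chunked-gzip idea: a sketch using the incremental deflate API in current PHP (7+), purely to illustrate the concept - not something that was available to WW at the time:

<?php
// Hypothetical sketch: gzip the page in flushable chunks as it is generated.
header('Content-Encoding: gzip');
header('Vary: Accept-Encoding');

$gz = deflate_init(ZLIB_ENCODING_GZIP, array('level' => 1));

function send_chunk($gz, $html) {
    // ZLIB_SYNC_FLUSH emits a decodable chunk so the browser can start rendering it now.
    echo deflate_add($gz, $html, ZLIB_SYNC_FLUSH);
    flush();   // subject to any server-level buffering in front of PHP
}

send_chunk($gz, "<html><body><h1>top of page</h1>\n");    // out the door immediately
for ($i = 1; $i <= 10; $i++) {
    send_chunk($gz, "<p>post $i</p>\n");                  // generated and shipped one at a time
}
echo deflate_add($gz, "</body></html>\n", ZLIB_FINISH);   // close the gzip stream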

ronburk




msg:522743
 4:20 am on Aug 15, 2005 (gmt 0)

So even for broadband users, the compression spares 1-4 extra seconds that can go to page rendering.

Discussions of this ilk invariably oversimplify the problem. How many machines between my machine and Webmasterworld are already doing compression? That results in machines wasting CPU cycles trying to compress something that is now already compressed. In fact, the machines you most imagine will benefit from the compression (dial-ups) are the most likely to already have hard-wired compression running in the conversation.

And, of course, deciding that compression is "better" by benchmarking the server doing the compressing is like choosing your house's paint scheme by taking LSD -- you can get a warm fuzzy feeling, but you're totally divorced from reality. What really matters is what the users are seeing, which often is not benefitted by what benefits your server.

One of the few folks who can really benchmark network problems of this complexity is IBM. They do it with a huge facility that can be configured to place a great many real machines in a great many real-life network topologies. The history of web server design is littered with good-looking server benchmarks that weren't nearly so good for the HTTP clients on the other end.
