Forum Moderators: open
For more info and other utilities:
[webcompression.org...]
For the Apache server, look at version 2.0; version 1.3 can compress too, but it's tough. See the documentation:
[httpd.apache.org...]
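As a sketch of what the Apache 2.0 setup might look like (the directives are from mod_deflate; the exact MIME types and level are assumptions, adjust to taste):

```apache
# Load mod_deflate and compress text responses on the fly
LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/plain text/css
# Moderate level; past 3 or 4 the returns diminish (see below)
DeflateCompressionLevel 6
```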
This is the 4th(?) time that Google has tested Gzip?
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
Internet Explorer and Netscape do request compressed pages, unless you're like me and have N....n Internet Security 2003, which just blatantly turns this capability off. If you temporarily disable the Internet security, you can see that your browser will typically request compressed pages. At the bottom of the page at the www.webcompression.org site, it will tell you whether your browser is requesting compressed pages or not.
Compression is yet another way to spam Google, because you can dynamically serve different content based on the content of the request. But all Google has to do is randomly produce requests for uncompressed pages and compare the results to the compressed page to detect spam, so perhaps they're getting around to it. By quadrupling their available network and server bandwidth they can spend a lot more time detecting spam without loading our servers, their servers, or the Internet itself. This also leaves more CPU time for PageRank calculations.
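A minimal sketch of that cross-check (all names here are hypothetical, not anything Google has published): decompress the gzipped response and compare it byte-for-byte with the plain response for the same URL.

```python
import gzip

def same_content(plain_body: bytes, gzipped_body: bytes) -> bool:
    """True if the compressed response decodes to the plain response."""
    return gzip.decompress(gzipped_body) == plain_body

page = b"<html><body>Hello, Googlebot</body></html>"
honest = gzip.compress(page)
cloaked = gzip.compress(b"<html><body>stuffed keywords</body></html>")

print(same_content(page, honest))   # True: same page either way
print(same_content(page, cloaked))  # False: the compressed crawl was cloaked
```

In practice the two fetches would come from different IPs at different times, but the comparison itself is this cheap.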
Perhaps this is their approach to correcting PageRank: get rid of the abusers and maybe it will start to work again. They do seem to be doing a lot of banning recently; this may be an enhancement that allows for a totally automated approach.
Certainly with their influx of funds it makes sense to have a full duplicate set of bot servers; they can hide the "dance" completely simply by toggling between server sets, which can be done very quickly. Build a new index and "switch".
I've seen requests from Google with no agent string at all (I think those were from the image bots), and others have reported Google IPs with no user agent. With only a few unknown or secret IPs and random sampling, Google could have detected cloaking quite a while ago; it may just have been a matter of funding or bandwidth, who knows? They seem to be tackling bandwidth.
Apache 2.0 dynamically compresses pages on the fly, which is very convenient: no extra work, except for the CPU, and the CPU is offloaded by the reduction in bytes to move. Unix is renowned for this inefficiency: moving bytes.
I think the next few weeks (or months) are going to be interesting!
We will see MANY new things from Google, so the wall in the secret bathroom has been telling me.
While in Europe, I heard from the largest internet marketing firm there that at least 160 more things are being worked on than the public version of Google Labs tells us.
FYI - Keep an eye on Google...
Hollywood
I run a web hosting company. If I enable gzip support, here is what happens:
1) The load on my server goes up due to the increased CPU time necessary for page compression.
2) The bandwidth my clients use goes down.
Both of these things are negatives... where are the benefits?
no extra work, except for the CPU, but the CPU is offloaded by the reduction in bytes to move. Unix is renowned for this inefficiency: moving bytes.
GZIP has a sliding scale of compression from 1 (least compression) to 9 (most). You can pick the balance between reducing bandwidth usage and minimising CPU usage. Once you get past 3 or 4 you tend to suffer from the law of diminishing returns. If you use content negotiation with static files, you can compress the files once and serve them many times, getting the best of both worlds.
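As a rough illustration of the 1..9 trade-off (using Python's zlib, which implements the same DEFLATE algorithm gzip uses; the payload is made up):

```python
import zlib

# Repetitive text compresses well, like typical HTML
payload = b"GZIP has a sliding scale of compression from 1 to 9. " * 200

for level in (1, 3, 6, 9):
    size = len(zlib.compress(payload, level))
    print(f"level {level}: {size} bytes (from {len(payload)})")
```

On text like this you will typically see most of the gain arrive by level 3 or 4, with levels 6 to 9 costing extra CPU for only a few more bytes saved.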
To maintain software protection boundaries between drivers, communications software, and application software, messages that arrive or are sent are copied numerous times by the operating system. The TCP protocol does a checksum on every message even though the communications hardware is doing a CRC in hardware. By reducing the number of messages that must be sent, the number of memory copies the operating system must do is reduced. The number of checksums calculated is reduced. The number of interrupts and context switches to move messages is reduced. All this can easily compensate for one new algorithm that actually does the compression. One must look at the operating system, communications system, and application CPU usage to get the whole story.
I'm considering statically compressed pages (content negotiation) because one of my webhosts does not support compression, not even mod_gzip in Apache 1.3, but then I lose server-side includes, etc., which are very convenient. It adds quite a maintenance burden for the website owner, and finally Google may still look upon it as a potential source of spam. Dynamic, on-demand compression could make Google feel safer; I don't know how they would tell the difference, though.
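The one-time precompression step could be sketched like this (filenames and layout are assumptions; the server still needs content negotiation configured to pick up the .gz copies):

```python
import gzip
import pathlib
import tempfile

def precompress(path: pathlib.Path) -> pathlib.Path:
    """Write path.gz next to path: compress once, serve many times."""
    out = path.with_name(path.name + ".gz")
    out.write_bytes(gzip.compress(path.read_bytes(), compresslevel=9))
    return out

# Demo with a throwaway page in a temp directory
d = pathlib.Path(tempfile.mkdtemp())
page = d / "index.html"
page.write_text("<html>" + "hello " * 500 + "</html>")
gz = precompress(page)
print(gz.name)                                  # index.html.gz
print(gz.stat().st_size < page.stat().st_size)  # True
```

Since the CPU cost is paid once at build time, level 9 is the natural choice here, unlike the dynamic case.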
So, to be brief, total CPU usage shouldn't go up. I can see a small memory usage increase.
Regarding bandwidth per client going down: doesn't that mean more clients per server, less communications hardware, etc.? I can see the revenue effect, but that can be fixed.
The benefits:
My customer using a 56K modem sees my webpages in one fourth the time (hopefully); that means more revenue for me, and then maybe I'll buy more webhost space! Of course now ISPs are doing the compression (accelerators), but that doesn't help the Google crawl.
Even better Google can crawl my site 4 times as often, getting my new information and pages indexed much sooner (time to market!).
I'm sure many servers are serving compressed pages, but until now not to Googlebot.
Why on Earth would a webserver check the user agent before deciding if it is going to serve compressed content?
It's the client's responsibility to declare what encoding it supports by sending an Accept-Encoding header with a value such as "gzip,deflate", while it's up to the server to ultimately decide whether it wants to serve compressed content or not. There is no reason why it should even bother checking the user agent.
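A minimal sketch of that decision, assuming a simple header dict and ignoring q-values (a real server should also honour `gzip;q=0`, which opts out):

```python
def client_accepts_gzip(headers: dict) -> bool:
    """Decide from Accept-Encoding alone; the User-Agent is irrelevant."""
    accept = headers.get("Accept-Encoding", "")
    encodings = [token.split(";")[0].strip().lower()
                 for token in accept.split(",")]
    return "gzip" in encodings

print(client_accepts_gzip({"Accept-Encoding": "gzip,deflate"}))  # True
print(client_accepts_gzip({"User-Agent": "Googlebot/2.1"}))      # False
```

Note the second request is refused compression not because of who the client claims to be, but because it never declared support.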
The more bots and sites that support compressed transfers, the better for everyone, apart from hosting companies that charge for bandwidth. If it were up to these companies we'd still be in the age of pay-per-minute connectivity; thankfully that's not the case in most parts of the developed world.
I'm sure many servers are serving compressed pages, but until now not to Googlebot.
Why on Earth would a webserver check the user agent before deciding if it is going to serve compressed content?
I guess I didn't explain it very well. Googlebot, as our web server's client, has been using HTTP 1.0 in its requests, and the Googlebot request does not include a request for a compressed page.
On Sept 28th or so, Googlebot's request used HTTP 1.1 and it was (finally) requesting gzip-compressed content. Google appears to be using a new set of servers (Googlebots) to make this happen.
If nothing else, Google can use this fast compressed crawl to check for spamming in many ways and still retain their old HTTP 1.0 crawl. The user agent on this crawl was "Mozilla 5.0" as well, not Googlebot, but Google still included the link to the Googlebot info. I've read many articles wondering why Googlebot isn't requesting compressed content, and they do seem to be testing it again. Google will always have to request some compressed pages and some uncompressed pages to check for cloaking that takes advantage of compression. This may be why it has taken them so long to start using compression.
One of my webhosts doesn't provide GZIP compression at all, I hope my Pagerank doesn't go down in the future!