
Add gzip compression - urls now fully indexed

     
7:46 am on Apr 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I know there are dangers in assuming that event #1 is the cause, just because event #2 follows after it. But let me tell you the story.

For almost a year, a client of mine struggled with a small ecommerce site. Only 10 out of 350 pages were fully indexed, and all the rest were just the url (we know you're there, but we haven't had a good look yet!)

After fixing everything I could think of - eliminating session cookies, putting monster javascript in external files, making sure every page had straight html links, no duplicate urls leading to the same resources, etc - I just let it go.

But then one day, while examining the HTTP headers, I noticed that the server was not using any compression, and that seemed like the right thing to do for the end user. So I explained to the server admin how to turn on compression in IIS 6, and he did. It's now a short while later, the pages have been re-spidered, and almost all the urls are now "fully" indexed.
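For anyone who wants to reproduce that header check, it only takes a few lines of script. A minimal sketch in Python 3, using a placeholder URL - send a request that advertises gzip support and look for Content-Encoding in the response:

import urllib.request

url = "http://www.example.com/product.html"  # placeholder URL - substitute one of your own pages
req = urllib.request.Request(url, headers={
    "Accept-Encoding": "gzip",          # tell the server we can accept compressed content
    "User-Agent": "header-check/1.0",   # any UA string will do for this test
})
with urllib.request.urlopen(req) as resp:
    print("Status:          ", resp.status)
    print("Content-Encoding:", resp.headers.get("Content-Encoding", "(none)"))   # "gzip" means compression is on
    print("Content-Length:  ", resp.headers.get("Content-Length", "(not sent)"))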

It was nearly a year of not being in the index, and then here we are. The pages were around 40-50kb uncompressed. I think there may be something to this - but I can't for the life of me understand what it would be. Still, if anyone is struggling in a similar fashion, what the heck, it's worth a try!

9:14 am on Apr 4, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 7, 2001
posts:579
votes: 0


"The pages were around 40-50kb uncompressed"

So what size did they end up when compressed?

9:27 am on Apr 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I don't even know how to measure that precisely - I'm just a marketing guy playing catch-up with the tech world! We turned on compression at the default setting - I think that's about 70% savings on most text files - so 50kb turns into about 15kb of outbound packets.
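For anyone who does want to measure it rather than estimate: save the page source to a file and gzip it offline. A minimal sketch in Python 3, assuming the page has been saved locally as page.html (a placeholder name):

import gzip

with open("page.html", "rb") as f:   # placeholder filename - save the page source first
    raw = f.read()

compressed = gzip.compress(raw, compresslevel=6)  # mid-range level, close to what most servers default to
savings = 100 * (1 - len(compressed) / len(raw))

print(f"uncompressed: {len(raw) / 1024:.1f} kb")
print(f"compressed:   {len(compressed) / 1024:.1f} kb")
print(f"savings:      {savings:.0f}%")            # typical HTML comes out around 70-80%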

My gut feeling was that their server was very slow sending out the second and especially the third uncompressed packet - and that slowness often led to incomplete page downloads. It sure looked that way from my browser's viewpoint, anyway. So getting the whole page delivered in fewer packets could have made a difference for any user agent that can handle compression - including Googlebot, which is a very, very busy UA with no time to waste.

At any rate, whether this was the magic that turned the trick or not, the trick has been turned. And this is the only difference I can see - all the other changes were made two updates back.

10:59 am on Apr 4, 2005 (gmt 0)

Senior Member from HK 

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 14, 2002
posts:2301
votes: 20


Ted,

We have mod_gzip on all our servers, and while looking at the logs I've noticed that Mr Bot gets the uncompressed content. They have in the past run some experimental crawls...

[webmasterworld.com...]

I've rechecked my logs from today...
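Since the standard access log doesn't record Content-Encoding, one rough way to check is to compare the bytes-sent field for Googlebot requests against the file size on disk - if the logged size is close to the full file, the response almost certainly went out uncompressed. A sketch in Python only, assuming Apache combined log format, static .html files, and placeholder paths; dynamic pages would need a different approach:

import os

DOC_ROOT = "/var/www/html"                # placeholder path
LOG_FILE = "/var/log/apache2/access.log"  # placeholder path

with open(LOG_FILE) as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        parts = line.split()
        path, status, sent = parts[6], parts[8], parts[9]   # combined log: request path, status, bytes sent
        if status != "200" or sent == "-" or not path.endswith(".html"):
            continue
        local = os.path.join(DOC_ROOT, path.lstrip("/"))
        if not os.path.isfile(local):
            continue
        disk = os.path.getsize(local)
        label = "compressed" if int(sent) < 0.8 * disk else "uncompressed"
        print(f"{path}: sent {sent} of {disk} bytes on disk -> probably {label}")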

12:55 pm on Apr 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


Slurp will grab compressed files; Googlebot 2.1 does not.

If, however, you get a visit from the Mozilla/5.0 Googlebot (whatever that spider is used for...), that spider does grab compressed files.

1:53 pm on Apr 4, 2005 (gmt 0)

Senior Member from MT 

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 1, 2003
posts:1843
votes: 0


Speeding up server response times has often resulted in faster and more complete spidering by Google.

SN

9:27 pm on Apr 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 13, 2004
posts:833
votes: 12


Googlebot seems to be doing much more extensive crawling very recently, grabbing every page multiple times; perhaps this is related to your recent success.

I've used GZIP for quite some time, and as BillyS says, Google is very rarely grabbing compressed pages. It appears to be hitting the root or home page only, and only occasionally, for some reason. Probably only 6% of the webhosts out there support GZIP compression, so it doesn't benefit Google much to crawl asking for compressed pages.

At the "lowest" compression setting GZIP seems to reduce page size by a factor of 4, probably more if you embed repetitive stuff like fonts and styles.
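Both points - the "lowest" setting and the repetitive markup - are easy to see with a synthetic page. A toy sketch in Python that gzips a table full of repeated inline styles at level 1 and level 9; real pages will vary:

import gzip

# A page that repeats the same inline font/style attributes over and over,
# the kind of markup table-based ecommerce pages are full of.
row = '<tr><td style="font-family:Verdana;font-size:11px;color:#333333">item</td></tr>\n'
page = ("<html><body><table>\n" + row * 500 + "</table></body></html>").encode()

fast = gzip.compress(page, compresslevel=1)   # the "lowest" setting
best = gzip.compress(page, compresslevel=9)

print(f"raw:     {len(page)} bytes")
print(f"level 1: {len(fast)} bytes ({len(page) / len(fast):.0f}x smaller)")
print(f"level 9: {len(best)} bytes ({len(page) / len(best):.0f}x smaller)")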

If nothing else, you sure are helping your website visitors tremendously, especially the most common users, who are still on 56K modems!

Google may keep statistics on page size versus how quickly a searcher returns to the Google results page. Far-fetched, but maybe Google figured out that a lot of users were returning to the search results before the page could possibly have loaded. Probably a crazy thought; more likely it's the recent, more intensive crawls.

Finally, it seems like many, many scraper site results have recently gone "Supplemental". Google may have put a more reasonable emphasis on large, content-filled pages, like it always should have!