Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Add gzip compression - urls now fully indexed


tedster

7:46 am on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know there are dangers in assuming that event #1 is the cause, just because event #2 follows after it. But let me tell you the story.

For almost a year a client of mine struggled with a small ecommerce site. Only 10 out of 350 pages were fully indexed, and all the rest were just the url (we know you're there, but we haven't had a good look yet!)

After fixing everything I could think of - eliminating session cookies, putting monster javascript in external files, making sure every page had straight html links, no duplicate urls leading to the same resources, etc - I just let it go.

But then one day while examining the HTTP headers I noticed that the server was not using any compression, and that seemed like the right thing to do for the end user. So I explained to the server admin how to turn on compression in IIS 6, and he did. It's now a few weeks later, the pages have been re-spidered, and almost all the urls are now "fully" indexed.

It was nearly a year of not being in the index, and then here we are. The pages were around 40-50kb uncompressed. I think there may be something to this - but I can't for the life of me understand what it would be. Still, if anyone is struggling in a similar fashion, what the heck, it's worth a try!
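If you want to check your own server the way I checked the headers, here's a rough Python sketch of the idea. It spins up a throwaway local server as a stand-in for your site (the page content and names are just placeholders), then makes one request with an Accept-Encoding: gzip header and one without, and compares what comes back:

```python
import gzip
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder page, standing in for a real product page.
PAGE = b"<html>" + b"<p>hello</p>" * 500 + b"</html>"

class Handler(BaseHTTPRequestHandler):
    """Stand-in for the real site: gzips only when the client asks for it."""
    def do_GET(self):
        wants_gzip = "gzip" in self.headers.get("Accept-Encoding", "")
        body = gzip.compress(PAGE) if wants_gzip else PAGE
        self.send_response(200)
        if wants_gzip:
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def check_compression(host, port, ask_gzip):
    """Return (Content-Encoding header, body size) for one request."""
    conn = http.client.HTTPConnection(host, port)
    headers = {"Accept-Encoding": "gzip"} if ask_gzip else {}
    conn.request("GET", "/", headers=headers)
    resp = conn.getresponse()
    size = len(resp.read())
    encoding = resp.getheader("Content-Encoding")
    conn.close()
    return encoding, size

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

plain = check_compression("127.0.0.1", port, ask_gzip=False)
gzipped = check_compression("127.0.0.1", port, ask_gzip=True)
print("without Accept-Encoding:", plain)
print("with Accept-Encoding:   ", gzipped)
server.shutdown()
```

Against your real site you'd point the same kind of request at your own hostname - if the second request comes back without a Content-Encoding: gzip header, compression isn't on.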

piskie

9:14 am on Apr 4, 2005 (gmt 0)

10+ Year Member



"The pages were around 40-50kb uncompressed"

So what size did they end up when compressed?

tedster

9:27 am on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't even know how to measure that precisely - I'm just a marketing guy playing catch-up with the tech world! We turned on compression at the default setting - I think that's about 70% savings on most text files - so 50kb turns into about 15kb going out on the wire.
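One way to measure it without touching the server: save a copy of the page and gzip it yourself. A sketch, with made-up markup standing in for one of the 40-50kb product pages (real pages with more varied text will compress somewhat less than this repetitive sample):

```python
import gzip

# Hypothetical stand-in for a 50kb product page - table rows with
# repeated tags and attributes, which is why HTML compresses so well.
row = "<tr><td class='item'>widget</td><td class='price'>9.99</td></tr>\n"
html = (row * 800).encode()  # roughly 52kb, like the pages in question

compressed = gzip.compress(html)
print(f"{len(html)} bytes uncompressed, {len(compressed)} bytes compressed "
      f"({1 - len(compressed) / len(html):.0%} saved)")
```

The number you get this way is the size of the gzipped file, which is close to (though not exactly) what the server sends with Content-Encoding: gzip.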

My gut feeling was that their server was very slow sending out the second and especially the third uncompressed packet - and that slowness often led to incomplete page downloads. It sure looked that way from my browser's viewpoint, anyway. So getting the whole page delivered in fewer packets could have made a difference for any user agent that can handle compression - including Googlebot, who is a very, very busy UA with no time to waste.
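The back-of-the-envelope packet math, assuming a typical ~1460-byte TCP payload per segment (the real figure varies with the connection):

```python
import math

MSS = 1460  # typical TCP payload per segment over Ethernet; an assumption

def segments_needed(page_bytes):
    """How many full-size TCP segments it takes to deliver a page."""
    return math.ceil(page_bytes / MSS)

print(segments_needed(50 * 1024))  # ~50kb uncompressed -> 36 segments
print(segments_needed(15 * 1024))  # ~15kb compressed   -> 11 segments
```

Roughly a third as many packets for the same page, so a slow server gets three times as many chances to finish before anything times out.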

At any rate, whether this was the magic that turned the trick or not, the trick has been turned. And this is the only difference I can see - all the other changes were made two updates back.

shri

10:59 am on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ted,

We have mod_gzip on all our servers and while looking at the logs I've noticed that Mr Bot gets the uncompressed content. They have in the past run some experimental crawls ..

[webmasterworld.com...]

I've rechecked my logs from today ..

BillyS

12:55 pm on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Slurp will grab compressed files; Googlebot 2.1 does not.

If, however, you get a visit from the Mozilla/5.0 Googlebot (whatever that spider is used for...), that spider does grab compressed files.

killroy

1:53 pm on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Speeding up server response times has often resulted in faster and more complete spidering by Google.

SN

bumpski

9:27 pm on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot seems to be doing much more extensive crawling very recently, grabbing every page multiple times; perhaps this is related to your recent success.

I've used GZIP for quite some time, and as BillyS says, Google very rarely grabs compressed pages. It appears to be hitting the root or home page only, and only occasionally, for some reason. Probably only 6% of the web hosts out there support GZIP compression, so it doesn't benefit Google much to ask for compressed pages when crawling.

At the "lowest" compression setting GZIP seems to reduce page size by a factor of 4, probably more if you embed repetitive stuff like fonts and styles.

If nothing else you sure are helping your website visitors tremendously, especially the most common visitors, who are still on 56K modems!

Google may keep statistics on page size versus a searcher's "return to Google" time. Far-fetched, but maybe Google figured out a lot of users were returning to the search results before the page could have possibly loaded. Probably a crazy thought; more likely it's the recent more intensive crawls.

Finally, it seems like recently many, many scraper-site results are now "Supplemental". Google may have put a more reasonable emphasis on large, content-filled pages, like it always should have!