Google News Archive Forum

Google Spider Suggestion
Googlebot + Compression
psoares

Msg#: 7795 posted 3:46 pm on Dec 18, 2002 (gmt 0)

A couple of years ago we moved all our sites to dedicated servers, which, by the way, was a great move for us: since then we've been able to customize our content in many new ways, and we're not going back to individual hosts.

Well, the only drawback is that we now pay for data center management and also pay by the gigabyte of transfer. Our hosting company offers first-class service, but their bandwidth is very expensive.

Guess who's one of our biggest consumers of bandwidth? Googlebot. Just this month, Googlebot cost us over US$600.00 in bandwidth.

I would appreciate it if Googlebot 2.1 were improved to 2.2 by adding support for gzip compression.

It would add a little overhead to the spider, which would have to decompress the retrieved content before archiving it, but it would save the world a few billion dollars in bandwidth every year.

This year our average is US$500 in Googlebot bandwidth monthly, so by the end of 2002 we'll have paid about US$6,000.00 to our bandwidth provider for Google to send us "free" traffic.

Would Google please consider accepting gzipped content through Googlebot? Thanks for your time.

 

jimdanforth

Msg#: 7795 posted 3:54 pm on Dec 18, 2002 (gmt 0)

Either you have a really huge amount of content, or your bandwidth pricing is ridiculous.

Loki_Alex

Msg#: 7795 posted 7:29 pm on Dec 18, 2002 (gmt 0)

Wow -- what kind of bandwidth is Google pulling out of your site(s)? You must at least get tons of Google traffic if you have so many pages that Googlebot costs you $500.00 a month. It's a problem I wish I had. :) Of course, you can solve this expense very simply with your robots.txt, keeping the little bot away from your pages.
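
For anyone curious, a robots.txt that keeps Googlebot out of only the heaviest sections (rather than the whole site) would look something like this; the directory names here are made up for illustration:

    # Hypothetical robots.txt - block Googlebot from bandwidth-heavy
    # sections while leaving the rest of the site crawlable
    User-agent: Googlebot
    Disallow: /archive/
    Disallow: /printable/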

MeditationMan

Msg#: 7795 posted 7:43 pm on Dec 18, 2002 (gmt 0)

GoogleGuy was also talking recently about setting up your server so that Googlebot only came looking for new pages. That should cut down your bill, unless you update every page every month. I can't remember the technical term for this, nor can I find the thread, but perhaps someone else remembers.
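
What's being described sounds like HTTP conditional GET: the server honors an If-Modified-Since request header and answers 304 Not Modified for unchanged pages, so the crawler downloads headers but no body. A minimal sketch in Python, with a made-up single-file document root, just to show the mechanism:

    # Sketch of HTTP conditional GET (If-Modified-Since / 304).
    # The single-file "document root" is hypothetical.
    import os
    from email.utils import formatdate, parsedate_to_datetime
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGE = "index.html"  # stand-in for a real document tree

    class ConditionalHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            mtime = os.path.getmtime(PAGE)
            ims = self.headers.get("If-Modified-Since")
            if ims:
                try:
                    if mtime <= parsedate_to_datetime(ims).timestamp():
                        self.send_response(304)  # headers only, no body
                        self.end_headers()
                        return
                except (TypeError, ValueError):
                    pass  # unparseable date: fall through to a full reply
            with open(PAGE, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Last-Modified", formatdate(mtime, usegmt=True))
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8000), ConditionalHandler).serve_forever()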

Giacomo

Msg#: 7795 posted 7:49 pm on Dec 18, 2002 (gmt 0)

Here it is, MeditationMan:

[webmasterworld.com...]

psoares

Msg#: 7795 posted 8:17 pm on Dec 18, 2002 (gmt 0)

Thanks for the replies ;)

Actually, they're all sites of mine that I've run since 1996; we rented a server farm and spread them across several servers. There's plenty of work in there, and over (way, way, way over) a million pages.

We do get a heap of Google traffic, as our subjects range from needles to satellites (both fictitious; we don't do needles OR satellites). MOST of our pages don't sell a thing. No, I'm not Slashdot.

My suggestion is not just for my own good; it's for the overall well-being of the web. Our revenues from selling our widgets pay for the bandwidth (there's even a little left over for Friday beer ;))

But after I installed mod_gzip on my server farm (most browsers support gzip), the average page size went from 20k to 2k (amazing, huh?). Googlebot's transfers, however, remain at the 20k average, and I've tcpdumped Googlebot's requests: they indeed do not send Accept-Encoding: gzip.
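
That 10x ratio is easy to sanity-check with Python's gzip module. The snippet below compresses a chunk of repetitive table markup standing in for a real page; note that repetitive HTML compresses especially well, so ordinary pages may land closer to 5x than 10x:

    # Sanity-check the ~10x compression claim on repetitive HTML.
    # The markup is a stand-in for a real page.
    import gzip

    row = "<tr><td class='cell'>widget</td><td class='cell'>in stock</td></tr>\n"
    html = (row * 300).encode("ascii")  # roughly 20 KB of markup
    packed = gzip.compress(html)
    print(f"original: {len(html)} bytes, gzipped: {len(packed)} bytes, "
          f"ratio: {len(html) / len(packed):.1f}x")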

If Googlebot used gzip, it would cut Googlebot's bandwidth spending on my site from US$500.00 to US$50.00 monthly - from US$6,000.00 yearly to US$600.00 yearly - just from adding gzip support. Painless, easy, quick!

Imagine what this would do for Google themselves; they pay for bandwidth too!

How could Google benefit from compression?

- Google would cut its bandwidth costs
- If enough servers installed compression, Google could spider the web much faster (if 100% of worldwide Apache users installed mod_gzip, I'd guess Google could spider the web about 7 times faster, no kidding!)
- Compression is already used for images and video, so why not for text? In case you're not a file-format expert: JPEG, MPEG, GIF, MP3, PNG and friends are just names of compression standards combined with a file format!
- Compression is a standard, universally accepted technique in computer science; gzip causes no information loss, and it is very fast on Linux, which Google runs on cheap hardware
- Hardware is cheaper than bandwidth

How could Google encourage enough webmasters to install gzip?

Instead of just terrorizing SEOs with their webmaster pages, Google should add something like:

WEBMASTERS: IF YOU CONFIGURE YOUR OWN SERVER, MAKE SURE TO ENABLE GZIP COMPRESSION - YOU'LL GET CRAWLED FASTER, SAVE TONS OF DOLLARS, AND MAKE THE WORLD A BETTER PLACE.
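
For Apache 1.3 admins of that era, turning mod_gzip on took only a few directives. A hedged sketch, assuming the module is already compiled and loaded - check the mod_gzip documentation for the exact directive set your version supports:

    # Hypothetical httpd.conf excerpt: a minimal mod_gzip setup.
    # Assumes the module is loaded; see the mod_gzip docs for details.
    <IfModule mod_gzip.c>
        mod_gzip_on Yes
        # compress text responses such as HTML
        mod_gzip_item_include mime ^text/.*
        mod_gzip_item_include file \.html$
    </IfModule>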

Other engines would follow suit, and the Internet would be faster (about 10 times faster). Webmasters who didn't have gzip installed would install it just to make their pages load faster - believe me, mod_gzip made my sites load about 5 times faster on AVERAGE, sometimes even reaching a 10x speed increase.

Microsoft showed an extremely good attitude by adding gzip support to all Internet Explorer versions - so the 99% of my visitors who use Internet Explorer or Mozilla already see my pages load about 7 times faster, WITHOUT CHANGING A THING IN MY CONTENT.

GoogleGuy, as if you didn't get enough free consulting on this great forum, here's a post I'd really appreciate you reading. Help make the Internet a better place and make Googlebot gzip-compliant: in C it's one library you have to link against, in Python it's a module you load, in Perl it's a one-minute change to your code... I don't know what Googlebot is written in (I'd bet C/C++, based on AltaVista's Scooter), but it shouldn't take more than a day to change this!
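
To give a sense of how small the crawler-side change is, here's a minimal sketch in Python (the URL is a placeholder, and a real crawler obviously needs more error handling): advertise gzip in the request, and decompress only if the server actually used it.

    # Crawler-side gzip support in miniature: advertise gzip,
    # then decompress the body only if the server applied it.
    # The URL is a placeholder, not a real crawl target.
    import gzip
    import urllib.request

    req = urllib.request.Request(
        "http://example.com/page.html",
        headers={"Accept-Encoding": "gzip"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)  # the one extra step vs. a plain fetch
    print(f"fetched {len(body)} bytes after decompression")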

GoogleGuy

Msg#: 7795 posted 8:48 pm on Dec 18, 2002 (gmt 0)

That's a good suggestion, psoares. I think one of our crawls does support that; I'll ask how hard it would be to accept gzip more broadly as we crawl.
