Nice one I hadn't seen before: gzencode()

     
7:00 am on Aug 20, 2019 (gmt 0)

Senior Member

joined:Mar 15, 2013
posts: 1205
votes: 120


This might be an oldie to some of you, but I hadn't seen it before so I thought I'd share :-)

One issue I have is that data in MySQL is taking up a HUGE amount of space, and I'm about to have to upgrade my entire dedicated server just to get more storage! But then I came across this:

$string = "example";

// compress the string
$compressed = gzencode($string);

// decompress the compressed string
$original = gzdecode($compressed);


I took a string that was 1390 characters and compressed it down to 723 bytes!

It actually does a poor job with short strings like in the example, but if you have a message board or private messages where the text can get quite long, that's a different story! From my basic testing, you'll save storage on anything over 58 characters.
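The break-even point depends on the text itself, so it is worth measuring on your own data. A minimal sketch, assuming the zlib extension is loaded (the helper name gzSavings() is made up for this example). Part of the reason short strings lose out is that the gzip format adds a fixed header and trailer of about 18 bytes:

```php
<?php
// Hypothetical helper: how many bytes does gzencode() save on this text?
// A positive result means bytes saved; a negative one means the
// "compressed" version is actually bigger than the original.
function gzSavings(string $text): int
{
    return strlen($text) - strlen(gzencode($text));
}

$short = 'example';
$long  = str_repeat('The quick brown fox jumps over the lazy dog. ', 30);

printf("short: %d bytes saved\n", gzSavings($short));
printf("long:  %d bytes saved\n", gzSavings($long));
```

Repetitive text like $long is a best case; real forum posts will land somewhere in between.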

The only downside I can find is that once the text is compressed, you can't use MySQL to do a simple search on the field :-( I tried using gzencode() to compress the search term and then strpos() to see whether the compressed string contained the compressed search term, but it wasn't found. So it's not a great solution for a message board (unless someone can suggest something better), but it would definitely help with Private Messages.
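One workaround, sketched here under assumptions rather than offered as a recommendation, is to decompress each row in PHP and search the plain text. The function name compressedContains() is hypothetical. Note that gzencode() output is binary, so the column should be a BLOB or VARBINARY rather than TEXT or VARCHAR:

```php
<?php
// Hypothetical helper: search inside a gzencode()d value by
// decompressing it first. Returns false if the data cannot be decoded.
function compressedContains(string $compressed, string $term): bool
{
    // gzdecode() returns false (and raises a warning) on bad input;
    // the @ suppresses the warning for this sketch.
    $text = @gzdecode($compressed);
    if ($text === false) {
        return false;
    }
    // Case-insensitive search on the decompressed text.
    return stripos($text, $term) !== false;
}

$blob = gzencode('I have a blue widget for sale');
var_dump(compressedContains($blob, 'widget'));
var_dump(compressedContains($blob, 'gadget'));
```

This has to decompress every row it checks, so it only makes sense for small sets such as one user's private messages.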
9:05 am on Aug 20, 2019 (gmt 0)

Senior Member

joined:Nov 13, 2016
posts: 1194
votes: 288


You can also have a look at the bz... functions: [php.net...], which often produce slightly smaller data (though the difference is not big).
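To see how the options compare on your own text, something like the following could be used. It is only a sketch: compressedSizes() is a made-up name, and bzcompress() is only called if the bz2 extension happens to be installed. It also includes gzcompress() and gzdeflate(), which use the same DEFLATE compression as gzencode() but with a smaller wrapper (or none), so they come out a few bytes shorter:

```php
<?php
// Hypothetical helper: compressed size in bytes for each available function.
function compressedSizes(string $text): array
{
    $sizes = [
        'original'   => strlen($text),
        'gzencode'   => strlen(gzencode($text, 9)),   // gzip header + trailer
        'gzcompress' => strlen(gzcompress($text, 9)), // zlib header + checksum
        'gzdeflate'  => strlen(gzdeflate($text, 9)),  // raw DEFLATE, no wrapper
    ];

    // The bz2 extension is optional, so only use it when present.
    if (function_exists('bzcompress')) {
        $sizes['bzcompress'] = strlen(bzcompress($text, 9));
    }

    return $sizes;
}

print_r(compressedSizes(str_repeat('widget message board ', 100)));
```

Whichever function is used to compress, the matching one must be used to decompress (gzdecode(), gzuncompress(), gzinflate() or bzdecompress()).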

I tried using gzencode() to compress the search term and then used strpos() to see if the compressed string contained the compressed search term, but it wasn't found.

This is not how it works :)

[en.wikipedia.org...]

Basically, this kind of compression algorithm works by looking for repeated sequences of bytes (in this case, words / characters).

An extremely simplistic view would be :
- the first occurrence of the word "widget" is stored as is
- the next time the word "widget" is found, instead of storing the word again, the algorithm stores a pointer back to the previous occurrence (along with the length of the sequence).

As I said, this is a simplistic explanation; in real life it is more complex, and GZ also uses Huffman coding, which gives shorter codes to the more frequent symbols.
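The "store it once, then point back to it" idea can be shown with a toy encoder. This is NOT how gzip is actually implemented (real DEFLATE works on bytes inside a sliding window and then Huffman-codes the result); it is a word-level illustration only, and toyEncode() / toyDecode() are names invented for it:

```php
<?php
// Toy illustration only: replace repeated words with a back-reference
// [offset of first occurrence, length]. Words are split on single spaces.
function toyEncode(string $text): array
{
    $tokens = [];
    $seenAt = [];   // word => byte offset of its first occurrence
    $offset = 0;

    foreach (explode(' ', $text) as $word) {
        if (isset($seenAt[$word])) {
            // Seen before: store a pointer instead of the word itself.
            $tokens[] = ['ref', $seenAt[$word], strlen($word)];
        } else {
            // First occurrence: store the word as is.
            $seenAt[$word] = $offset;
            $tokens[] = ['lit', $word];
        }
        $offset += strlen($word) + 1; // +1 for the space separator
    }

    return $tokens;
}

function toyDecode(array $tokens): string
{
    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i > 0) {
            $out .= ' ';
        }
        // A reference is resolved by copying from the text rebuilt so far.
        $out .= $token[0] === 'lit'
            ? $token[1]
            : substr($out, $token[1], $token[2]);
    }
    return $out;
}

print_r(toyEncode('a widget is a widget'));
```

Because a word's encoding depends on what came before it, the same word does not produce the same bytes everywhere, which is why searching for a compressed term inside compressed text fails.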

So you can't use MySQL's indexing functions to run a search; you'll need to create your own search engine, with a separate "dictionary".
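A very rough sketch of what such a "dictionary" could look like: a word => message IDs lookup that is built from the plain text before it gets compressed. Everything here (function names, the ASCII-only word splitting, keeping the index in a PHP array) is an assumption made for the example; in practice the index would more likely live in its own MySQL table:

```php
<?php
// Split text into unique lowercase words (ASCII letters and digits only,
// which is a simplification for this sketch).
function indexWords(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Record that $messageId contains each word of $text.
function addToIndex(array &$index, int $messageId, string $text): void
{
    foreach (indexWords($text) as $word) {
        $index[$word][] = $messageId;
    }
}

// Return the IDs of the messages containing $term (whole word match).
function searchIndex(array $index, string $term): array
{
    return $index[strtolower($term)] ?? [];
}

$index = [];
addToIndex($index, 1, 'Blue widget for sale');
addToIndex($index, 2, 'Red widget, blue box');

print_r(searchIndex($index, 'widget'));
```

Keep in mind that the index itself takes space, which eats into whatever the compression saved.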
9:33 am on Aug 20, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:5046
votes: 60


brotli is another compression algorithm worth a look
10:11 am on Aug 20, 2019 (gmt 0)

Senior Member

joined:Nov 13, 2016
posts: 1194
votes: 288


brotli is another compression algorithm worth a look

There is no native support for Brotli in PHP (yet). If you want to use Brotli within a PHP script, you need to compile this extension yourself: [github.com...]. One day it will certainly make its way into the official PHP distribution.

Indirectly related, I use Brotli to compress my pages. I use the H2O web server and obtain faster compression, with slightly smaller files than GZ. Brotli extensions are also available for Nginx, Apache, etc...
4:36 pm on Oct 28, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Aug 30, 2019
posts:147
votes: 30


Hello,

The problem with storing compressed strings / text is that if the data is ever corrupted, you will no longer be able to retrieve the text at all. Plain text is more forgiving: if it is corrupted, you'll get some garbage characters in the middle, but the rest of the text will still be readable.
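At a minimum, the decoding code should expect this and not assume gzdecode() always succeeds. A small sketch, with safeGzdecode() as a hypothetical wrapper and truncation standing in for "corruption":

```php
<?php
// Hypothetical wrapper: return the decompressed text, or null if the
// stored data is damaged and cannot be decoded.
function safeGzdecode(string $compressed): ?string
{
    // gzdecode() returns false and raises a warning on bad data;
    // the @ keeps the warning out of the page output.
    $text = @gzdecode($compressed);
    return $text === false ? null : $text;
}

$good    = gzencode(str_repeat('private message text ', 20));
$damaged = substr($good, 0, intdiv(strlen($good), 2)); // simulate a cut-off value

var_dump(safeGzdecode($good) !== null);
var_dump(safeGzdecode($damaged));
```

One common cause of this kind of damage is the column being too short or not binary-safe, so the compressed value gets cut off or altered on the way in.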