Forum Moderators: coopster & jatar k

Base64 encode raw binary

Is it more efficient or am I losing data?

12:28 am on Jan 4, 2014 (gmt 0)

5+ Year Member Top Contributors Of The Month

Internally my site has a static id number associated with each account. I need to start using the ids publicly, but I don't want people to be able to discern the total number of user accounts on my site.

I plan on generating and storing a non-sequential, unique, and short public id for each user in my database by using a salted md5 hash of the user id. This approach gives me non-sequential and unique (theoretical collisions notwithstanding) ids; however, they're longer than I'd like.

I'd like to shorten them by converting them from base 16 to base 64.

The ids are too large for base_convert(), and that only goes up to base 36 anyways.

Instead I am using base64_encode().

I notice that I get considerably shorter ids if I use the raw binary md5 output. I'm hoping somebody can tell me whether this is because they can be encoded more efficiently or because I'm losing data (and making collisions more probable).

Short: base64_encode(md5($id.$salt, TRUE));
Long: base64_encode(md5($id.$salt));

6:08 pm on Jan 10, 2014 (gmt 0)

drdoc (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)

I honestly don't know how the inner workings of base64_encode operate on binary data.

3:42 pm on Jan 11, 2014 (gmt 0)

5+ Year Member Top Contributors Of The Month

Yeah, I wasn't able to find very much information on it either. Right now I'm going with it and crossing my fingers that it works. I could run some tests if I knew how to convert text into raw binary and vice versa, but I don't know of any functions that do that except this option in md5. Honestly, I don't even know what "raw binary" is or how it's different from binary.

9:33 pm on Jan 11, 2014 (gmt 0)

swa66 (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)

base64_encode encodes 8-bit data into a printable format.

If you feed it a printable md5 hash, you give it 32 positions that each hold only 4 bits of information (0-f). But base64 doesn't know that, so it treats the input as 32 bytes of 8-bit data and encodes all of them.

If you get the raw md5, you get 128 bits of data (or 16 bytes of 8-bit data), unprintable in all likelihood. If you feed that to base64, it only has half as many bytes to encode, so the result is proportionally shorter. No information is lost.
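
A minimal sketch of the comparison, in case it helps to see the numbers. The `$id` and `$salt` values are made up for illustration and are not from the original post. It prints the input and output lengths for both variants, then decodes the short form to check that it still contains the full hash.

```php
<?php
// Hypothetical inputs, used only for illustration.
$id   = 12345;
$salt = 'example-salt';

$hex = md5($id . $salt);        // 32 hex characters, 4 bits of information each
$raw = md5($id . $salt, true);  // 16 raw bytes, 8 bits of information each

$long  = base64_encode($hex);   // encodes 32 bytes of ASCII text
$short = base64_encode($raw);   // encodes the 16 raw bytes

echo strlen($hex), ' ', strlen($raw), "\n";     // 32 16
echo strlen($long), ' ', strlen($short), "\n";  // 44 24

// Decoding the short form and converting it back to hex recovers the full hash,
// so the shorter id carries the same 128 bits.
var_dump(bin2hex(base64_decode($short)) === $hex);  // bool(true)
```

Base64 emits 4 characters for every 3 input bytes (rounded up), which is where 44 and 24 come from.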

Nothing odd IOW.
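
On the earlier question about converting between text and raw binary: a small sketch, assuming PHP 5.4 or later for `hex2bin()` (on older versions `pack('H*', $hex)` does the same job). The sample strings and the `publicId()` helper are hypothetical; the helper is only one possible way to make the result URL-friendly and is not something the thread itself proposes.

```php
<?php
// "Raw binary" here just means a PHP string whose bytes are the values
// themselves, instead of two hex characters per byte.
$hex = '48656c6c6f';        // hex text: 10 characters
$raw = hex2bin($hex);       // raw bytes: 5 bytes, in this case the ASCII text "Hello"

echo $raw, "\n";            // Hello
echo bin2hex($raw), "\n";   // 48656c6c6f

// md5($s, true) returns the same bytes you get by converting the hex digest.
$s = 'any input';           // hypothetical input
var_dump(md5($s, true) === hex2bin(md5($s)));  // bool(true)

// Hypothetical helper: swap the two characters that are awkward in URLs
// and drop the trailing "=" padding.
function publicId(string $raw): string
{
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo strlen(publicId(md5($s, true))), "\n";    // 22
```

The padding can be dropped safely here because every id comes from a 16-byte input, so the length is always known.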
