Internally my site has a static id number associated with each account. I need to start using the ids publicly, but I don't want people to be able to discern the total number of user accounts on my site.
I plan on generating and storing a non-sequential, unique, and short public id for each user in my database by using a salted md5 hash of the user id. This approach gives me non-sequential and unique ids (theoretical collisions notwithstanding), but they're longer than I'd like.
I'd like to shorten them by converting them from base 16 to base 64.
The ids are too large for base_convert(), and that only goes up to base 36 anyways.
Instead I am using base64_encode().
I notice that I get considerably shorter ids if I use the raw binary md5 output. I'm hoping somebody can tell me whether this is because they can be encoded more efficiently or because I'm losing data (and making collisions more probable).
Yeah, I wasn't able to find very much information on it either. Right now I'm going with it and crossing my fingers that it works. I could run some tests if I knew how to convert text into raw binary and vice versa, but I don't know of any functions that do that except this option in md5. Honestly, I don't even know what "raw binary" is or how it's different from binary.
base64_encode() encodes 8-bit data into a printable format.
If you feed it a printable md5 hash, you give it 32 characters that each hold only 4 bits of information (0-f). But base64 doesn't know that, so it treats every character as a full 8-bit byte and encodes all 32 bytes, which yields 44 characters.
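If you get the raw md5, you get 128 bits of data (16 bytes of 8-bit data), unprintable in all likelihood. If you feed that to base64, it has only half as many bytes to encode, so the result is proportionally shorter (24 characters including padding). Both forms carry the same 128 bits, so nothing is lost and collisions are no more likely.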
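To see this for yourself, here is a minimal sketch. The salt and user id are made-up values, and the URL-safe step at the end is my own suggestion, not something from your setup. It also shows bin2hex() and hex2bin(), which convert between the printable hex form and the raw bytes.

```php
<?php
// Hypothetical values, for illustration only.
$salt   = 'example-salt';
$userId = 12345;

$hex = md5($salt . $userId);        // 32 hex characters, 4 bits of information each
$raw = md5($salt . $userId, true);  // the same 128 bits as 16 raw bytes

// hex2bin() and bin2hex() convert between the two representations.
$sameBytes = (hex2bin($hex) === $raw);

$fromHex = base64_encode($hex);     // encodes 32 bytes of text
$fromRaw = base64_encode($raw);     // encodes 16 bytes

// Optional: drop the "==" padding and swap "+" and "/" for URL-safe characters.
$publicId = rtrim(strtr($fromRaw, '+/', '-_'), '=');

// Lengths of the three variants.
printf("%d %d %d\n", strlen($fromHex), strlen($fromRaw), strlen($publicId));

// Round trip: decoding the short id gives back the original hash.
$decoded   = base64_decode(strtr($publicId, '-_', '+/'));
$roundTrip = (bin2hex($decoded) === $hex);
var_dump($sameBytes, $roundTrip);
```

This should print 44, 24 and 22 for the lengths, and both checks should come out true, which confirms that the shorter id still holds the full hash.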