

How to create string unique ID's? (youtube, tinyurl, etc.)

hash, youtube, unique, ID, database

         

outlaw

5:58 pm on Aug 9, 2007 (gmt 0)




Hello, I've always wondered how (and why) YouTube uses unique video IDs of the form: [youtube.com...]

How are these generated so that there are no duplicates, and what advantage does this have over a simple auto-incrementing numeric ID?

Tinyurl does the same thing when transforming urls:
[google.com...] <-- becomes --> [tinyurl.com...]

Any clue? I'm guessing this is some sort of hash? Is there any way to guarantee a 1-to-1 mapping? Any help would be appreciated.

WesleyC

6:32 pm on Aug 9, 2007 (gmt 0)




Tinyurl, from what I can tell, generates a short string from a list of allowed characters. It DOES use an autoincrement index of sorts; it just allows numbers and letters (and maybe other characters?), giving at least 36 possible characters per position (10 digits plus 26 letters) instead of 10. Also, I believe it does a database lookup to see whether a string has already been generated for a given URL, which minimizes the number of strings it has to generate.
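As a rough sketch of the counter-plus-alphabet idea (not TinyURL's actual code), the Python below encodes an autoincrement ID in base 36 using digits and lowercase letters, and keeps an in-memory dictionary as a stand-in for the "already generated?" database lookup. The names `encode`, `decode`, and `Shortener` are made up for illustration.

```python
import string

# Assumed alphabet: 10 digits + 26 lowercase letters = 36 symbols.
ALPHABET = string.digits + string.ascii_lowercase
BASE = len(ALPHABET)

def encode(n: int) -> str:
    """Turn a non-negative autoincrement ID into a short base-36 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode(s: str) -> int:
    """Inverse of encode(): turn the short string back into the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n

class Shortener:
    """Hypothetical in-memory stand-in for a URL table with an autoincrement key."""

    def __init__(self) -> None:
        self.next_id = 1
        self.by_url = {}   # url -> code: the "already generated?" lookup
        self.by_code = {}  # code -> url: used when redirecting

    def shorten(self, url: str) -> str:
        if url in self.by_url:          # same URL gets the same code again
            return self.by_url[url]
        code = encode(self.next_id)     # otherwise encode the next counter value
        self.next_id += 1
        self.by_url[url] = code
        self.by_code[code] = url
        return code

    def expand(self, code: str) -> str:
        return self.by_code[code]
```

With 36 symbols, four characters cover every ID below 36^4 = 1,679,616; for example, `encode(1000000)` gives `"lfls"`. Adding uppercase letters (62 symbols) would shorten the codes further.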

YouTube is a different beast. I'll have to check when I get home (work blocks YT for obvious reasons), but it appears to generate an ID and then encode it in base 64.
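If that guess is right, something like the following would produce IDs of that shape. This is a sketch under the assumption that the underlying ID is a random 64-bit number; it is not YouTube's documented scheme, and `random_video_id` / `id_to_int` are hypothetical names. Eight random bytes encode to 12 URL-safe base64 characters, the last being a single `=` pad, so dropping the pad leaves 11 characters, which matches the length of the IDs seen in YouTube URLs.

```python
import base64
import secrets

def random_video_id() -> str:
    # Assumption: the ID is a random 64-bit number (8 bytes).
    raw = secrets.token_bytes(8)
    # URL-safe base64 uses A-Z, a-z, 0-9, '-' and '_' (no '+' or '/').
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")
    # 8 bytes -> 12 chars ending in one '=' pad; the pad carries no data.
    return encoded.rstrip("=")

def id_to_int(video_id: str) -> int:
    # Restore the single pad character, decode, and read the 8 bytes as an integer.
    raw = base64.urlsafe_b64decode(video_id + "=")
    return int.from_bytes(raw, "big")
```

Because the bytes are random rather than counted, neighbouring IDs say nothing about each other, but the generator alone does not rule out duplicates; a uniqueness check is still needed when storing them.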

If you want a unique ID that's not an autoincrement, look into the uniqid [php.net] PHP function.

You MIGHT run across a duplicate from this function now and again, because its output is derived from the current time in microseconds, but the odds are low.
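For reference, `uniqid()` without extra entropy is generally described as the current Unix time in hex: 8 hex digits of seconds followed by 5 hex digits of microseconds, for 13 characters. The Python function `uniqid_like` below is a hypothetical re-creation of that layout, not PHP's source. It also shows where duplicates come from: two calls in the same microsecond (for instance on two servers) return the same string, and the PHP manual itself says the function does not guarantee uniqueness.

```python
import time
from typing import Optional

def uniqid_like(prefix: str = "", now_us: Optional[int] = None) -> str:
    """Hypothetical re-creation of uniqid()'s 13-character layout."""
    if now_us is None:
        now_us = time.time_ns() // 1000   # current Unix time in microseconds
    sec, usec = divmod(now_us, 1_000_000)
    # 8 hex digits of seconds + 5 hex digits of microseconds (usec < 0x100000)
    return f"{prefix}{sec:08x}{usec:05x}"
```

Since the value is just a timestamp, IDs generated close together in time are also close together as strings, so they are more predictable than random IDs.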

The advantage of this type of ID is that you can't see which records are next to each other in the database. For instance, suppose I'm a malicious user looking at a (poorly implemented, I might add) web application, Gmail for example, and I see that my user ID number is 5. I can be fairly sure there are users at IDs 1, 2, 3, and 4. If my ID is stored as an unchanged number in the URL or in a hidden form field somewhere, it's a simple matter to modify that number and attempt to gain access as another user. Since the first user (or one of the first users) in many web applications is the administrator, I can, with a bit of guesswork, try to gain access to the administrator account by changing my ID number.

Unique IDs, on the other hand, have the advantage of being non-sequential. That is, the unique ID generated for user 1 is almost guaranteed to be completely different from user 2's unique ID.
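One common way to get non-sequential IDs without duplicates is to pick a random string and retry if it is already taken. The sketch below assumes a 62-character alphabet and uses a Python `set` as a hypothetical stand-in for a database column with a UNIQUE index; `new_public_id` is an illustrative name.

```python
import secrets

# Assumed alphabet of 62 URL-safe symbols; any set of safe characters works.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def new_public_id(taken: set, length: int = 8, max_tries: int = 10) -> str:
    """Pick a random, non-sequential ID, retrying if it is already in use.

    `taken` stands in for a UNIQUE column in the database (hypothetical).
    """
    for _ in range(max_tries):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("could not find a free ID; consider a longer length")
```

With 8 characters there are 62^8, roughly 2.2 × 10^14, possible IDs, so retries are rare until the table gets very large. In a real database, checking and then inserting can race between requests, so it is safer to let the UNIQUE index reject a duplicate insert and retry on that error. The autoincrement key can still be used internally; only the random ID needs to appear in URLs.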

outlaw

8:23 pm on Aug 9, 2007 (gmt 0)




Well, maybe tinyurl was a bad example. Thinking about it some more, they probably don't need any database work, just a "short-name" encoding and decoding function (which still lets the same URL give the same tinyurl each time). But that still raises the question: how does one go about writing a short-name encoding algorithm?
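A deterministic encoding does give the same short name for the same URL every time, for example by hashing the URL and keeping a few base-62 characters of the digest, as in the hypothetical `short_name` sketch below (SHA-256 and the 7-character length are arbitrary choices). Two caveats, though: a hash is one-way, so the short name cannot be decoded back into the URL without storing the mapping somewhere, and no scheme can squeeze arbitrarily long URLs into 7 characters reversibly, since there are far more possible URLs than 7-character strings. Truncated hashes can also collide, so a real service would still check its stored mappings.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def short_name(url: str, length: int = 7) -> str:
    """Deterministic short name: hash the URL, then base-62 encode part of the digest."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big")   # keep the first 64 bits of the hash
    chars = []
    for _ in range(length):                 # take the lowest `length` base-62 digits
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)
```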

Thanks, the advantage makes perfect sense.
I guess my question is really about any site that doesn't use auto-incrementing values. Another good example, besides YouTube, is tinypic.com. They aren't just creating a unique ID; they also seem to compress it to 5-10 characters while keeping it unique.
PHP's uniqid() creates 13 characters. Can base64 do this?

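Base64-encoded data takes about 33% more space than the original binary data (4 characters for every 3 bytes). Compared with a hex string such as uniqid's output, though, base64 is shorter, because each character carries 6 bits instead of 4.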
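To put numbers on the base64 question: base64 spends 4 characters per 3 bytes, whereas hex spends 2 characters per byte, so re-encoding a hex ID as base64 makes it shorter. A 13-digit hex value like uniqid's (52 bits) fits in 7 bytes, which becomes 10 URL-safe base64 characters once the `==` padding is dropped. The helper names below are hypothetical, and the sketch assumes inputs of exactly 13 hex digits.

```python
import base64

def hex13_to_b64(hex_id: str) -> str:
    """Re-encode a 13-digit hex ID (52 bits) as unpadded URL-safe base64."""
    raw = int(hex_id, 16).to_bytes(7, "big")   # 52 bits fit in 7 bytes
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def b64_to_hex13(short_id: str) -> str:
    """Inverse of hex13_to_b64: restore the padding, decode, print 13 hex digits."""
    raw = base64.urlsafe_b64decode(short_id + "==")
    return f"{int.from_bytes(raw, 'big'):013x}"
```

So base64 gets a uniqid-style value from 13 down to 10 characters. IDs as short as 5 characters need fewer underlying bits to begin with, for example a counter encoded in base 62, rather than a 52-bit timestamp.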