Forum Moderators: coopster
This is indirectly related to PHP.
I would like to generate a random ID using rand() and a combination of letters and numbers.
To keep it simple, I wish to use only 6 characters.
So the question is:
What is the probability that, out of 100k such random IDs, at least one duplicate is generated?
I could indeed check each new ID against those already stored in the DB, but if duplicates turn out to be frequent then that becomes detrimental, and I would need to find a way with less potential for duplication.
Hope it's clear :)
if you're using letters a-z and digits 0-9 then the number of possible permutations is much higher than 100K
The odds of any single new ID colliding are slim, but a duplicate is still possible. Why not use some type of auto-increment?
1,402,410,240 I believe
n!/(n-m)!
where
n=36 (possible chars per position - a-z+0-9)
m=6 (number of chars in set)
;)
100k may be ok, that is quite a small set for that number of possibilities
if you wanted to reduce the probability, increase the number of chars to 8 or 10
8 would be 1.22 x 10^12
10 would be 9.22 x 10^14
either increases the number of possibilities by quite a bit
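For anyone who wants to check the n!/(n-m)! figures quoted above, here is a minimal sketch of the arithmetic. It is written in Python purely for convenience (nothing in it is PHP-specific), and the function name is made up for illustration. Note that this formula counts only IDs in which no character appears twice.

```python
import math

ALPHABET_SIZE = 36  # a-z plus 0-9

def ordered_no_repeat(n: int, m: int) -> int:
    """n!/(n-m)!: ordered selections of m distinct characters out of n."""
    return math.factorial(n) // math.factorial(n - m)

# Print the count for each ID length discussed in the thread.
for length in (6, 8, 10):
    print(length, f"{ordered_no_repeat(ALPHABET_SIZE, length):,}")
```

For a length of 6 this gives 1,402,410,240, and the 8 and 10 character results round to the 1.22 x 10^12 and 9.22 x 10^14 mentioned above.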
What he said!...umm...yea! ;)
AB
BA
AC
CA
BC
CB
But I don't think that AA, BB or CC are valid under that formula. If this is the case, then what you are saying is that there are roughly 1.4 billion permutations without repeating a character. Since a random ID can repeat a character, I believe the method I originally used,
36*36*36*36*36*36 = 2,176,782,336, to be correct.
But then again, I could be wrong :)
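The difference between the two counts is easy to see by brute force on a tiny alphabet. The sketch below (Python, for illustration only) lists every two-letter string over A, B, C with and without repeated letters, then applies the same reasoning to 36 characters and 6 positions.

```python
from itertools import permutations, product

letters = "ABC"

# Ordered pairs with no repeated letter: 3!/(3-2)! = 6 of them.
no_repeat = ["".join(p) for p in permutations(letters, 2)]

# Ordered pairs where each position is chosen independently: 3 * 3 = 9.
with_repeat = ["".join(p) for p in product(letters, repeat=2)]

print(len(no_repeat), no_repeat)
print(len(with_repeat), with_repeat)

# A random 6-character ID may repeat characters, so each position has 36 choices.
print(f"{36 ** 6:,}")
```

The second list is the first one plus AA, BB and CC, which is exactly the gap between n!/(n-m)! and n^m.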
Sorry for taking the thread off topic
I did get C's in my stats class; you may want to consult an expert ;)
Unless I am wrong it will result in 5 matches
Well, it doesn't have to as it's based on probability, but it is probably the best idea to check for duplicates anyway.
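To put a number on the original question, this is the classic birthday problem. Below is a rough sketch in Python, assuming every ID is drawn uniformly and independently from all 36^6 possibilities (which a real rand()-based generator may only approximate). The function names are invented for this example.

```python
import math

def p_duplicate(n_ids: int, n_possible: int) -> float:
    """Probability that n_ids uniform random draws contain at least one repeat."""
    # P(no repeat) is the product of (1 - i / n_possible) for i = 0 .. n_ids - 1.
    # Summing logarithms avoids multiplying 100k factors that are all close to 1.
    log_p_unique = sum(math.log1p(-i / n_possible) for i in range(n_ids))
    return -math.expm1(log_p_unique)  # 1 - P(no repeat)

def expected_duplicate_pairs(n_ids: int, n_possible: int) -> float:
    """Average number of pairs of IDs that are identical."""
    return n_ids * (n_ids - 1) / (2 * n_possible)

N = 36 ** 6
print(p_duplicate(100_000, N))
print(expected_duplicate_pairs(100_000, N))
```

Under those assumptions the chance of at least one duplicate somewhere among 100k IDs comes out at roughly 90%, with a little over two duplicate pairs on average. So any one insert is very unlikely to collide, but over the whole set a collision should be expected, which is why checking against the DB is still worthwhile.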
my honest opinion is that random ids are never the best
you can add logic to them as well
letters from first and/or last name
date of birth
append a short random set of digits
any other info you have about the member can be truncated and used to build something that will be truly unique
especially if you can't use an auto increment it makes me think you will use the actual number somehow where the user can see it
makes it easier for people to remember if there is logic around it
32 characters is a lot if you also expect people to retype the key. If you want fewer characters, just use a substring of the md5 hash. Here's how many unique public keys you'll get before experiencing the first collision, based on the length of the substring (assuming you start at position 0; starting at different positions will yield different results):
8 characters: 82,944
9 characters: 400,703
10 characters: 441,590
11 characters: 6,283,303
I didn't go any higher than 11 characters, but I would assume that even just a 14-16 character substring would yield a suitably high number of uniques for many applications.
I generated these numbers by populating a table with md5 hashes of the serial numbers, with UNIQUE keys of different lengths set on the md5 column. I assume that to be identical to inserting substr(md5($id), 0, 11) into a VARCHAR(11) field.
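The same experiment can be approximated without a database. The following is only a sketch in Python: it assumes the serial numbers start at 1 and are hashed as plain decimal strings, neither of which is stated above, so its counts need not match the figures in the list.

```python
import hashlib

def uniques_before_collision(prefix_len: int, start: int = 1) -> int:
    """Count sequential serial numbers hashed before a truncated md5 repeats."""
    seen = set()
    serial = start  # assumption: serials are consecutive integers from `start`
    while True:
        # Keep only the first prefix_len hex characters of the md5 digest.
        key = hashlib.md5(str(serial).encode()).hexdigest()[:prefix_len]
        if key in seen:
            return len(seen)  # unique keys stored before the first clash
        seen.add(key)
        serial += 1

print(uniques_before_collision(8))
```

Since md5 output is hexadecimal, an 8-character substring has only 16^8 (about 4.3 billion) possible values, so a first collision after some tens of thousands of keys is in line with what the birthday problem predicts.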
but even with 8 your number is lower?
I am just trying to understand the math, because your idea sounds really easy and makes sense.
Could you define the math behind your number?
thanks
Instead of the possibility of collision points (although small) for every new record, the idea I tossed out there gives you clearly defined collision points so you won't have to iterate through random attempts until you find an unused primary key.
I worry about comparing the uniqueness of random keys [webmasterworld.com] because of what ergophobe pointed out to me a couple years back:
once you hit say 1.45 quindecillion records, it could take days to generate a unique key couldn't it?
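For comparison, the "iterate until you find an unused key" approach looks roughly like the sketch below. It is Python rather than PHP, and the in-memory set is a stand-in for whatever lookup or UNIQUE constraint the real table would provide; both the set and the function name are assumptions made for the example.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 characters: a-z, 0-9

def new_unique_id(existing: set, length: int = 6) -> str:
    """Draw random IDs until one is not already in `existing`, then record it."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:  # stands in for a DB uniqueness check
            existing.add(candidate)
            return candidate

ids = set()
for _ in range(1000):
    new_unique_id(ids)
print(len(ids))  # 1000
```

With k keys already stored out of N possible, each attempt fails with probability k/N, so the loop only becomes slow once the key space is nearly full. At 100k stored keys out of 36^6, a retry is needed on about 0.005% of inserts.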
Even concatenated keys make me worry because I have seen concatenated keys like "first two characters of the first name, last character of the last name, middle three digits of the phone number, and a random 4 digit number" end up with a shocking number of collisions over just a hundred thousand records.
I do like adding logic to my keys when necessary but the numbers used in them are always sequential in some way
Whenever I contemplate randomness, I always think of the rand functions from when I started programming. If you didn't manually seed them, they always came up with the same sequence of numbers.
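That behaviour is easy to reproduce on purpose. A small sketch in Python, where the generator is given a fixed seed explicitly (left alone, Python's random module seeds itself, so the effect only appears when the seed is pinned like this):

```python
import random

def first_five(seed: int) -> list:
    """First five draws in the range 0..35 from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 35) for _ in range(5)]

# Same seed, same "random" sequence every time.
print(first_five(1) == first_five(1))  # True
```

Any ID scheme built on a generator that starts from the same seed on every run would hand out the same IDs in the same order, which is one more reason to check for duplicates.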