What's wrong with my random char generator?

Producing too many duplicates

         

brickwall

3:31 am on Mar 21, 2005 (gmt 0)

10+ Year Member



I use the following VBScript function to generate random 8-character member codes for my new membership site:

-----------------------------------
Private Function GenMemberCode
Dim Count
For Count = 1 To 8
Randomize
GenMemberCode = GenMemberCode & Chr(Int((90 - 65 + 1) * Rnd + 65))
Next
End Function
------------------------------------

It worked well at first, but once my membership passed 100 the function started producing far more duplicate codes than chance alone can explain. I get 5-10 new members daily, and 3-5 of their member codes are duplicates.

What's wrong with my function?

I don't want to check for duplicates every time because of the additional processing and database access overhead involved.

Can you show me a snippet of a more reliable 8-character randomizer?

Easy_Coder

5:52 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you considered taking care of this at the database side? I had to do something very similar and decided that it would be easier to handle right inside my stored procedure prior to persisting any user values:


DECLARE @m_IsUniqueID AS INT
DECLARE @m_INT AS BIGINT
DECLARE @m_CustomerID AS CHAR(8)
SET @m_IsUniqueID = 0 /* Flag set to false */
WHILE @m_IsUniqueID = 0
BEGIN
    /* Take the first 3 bytes of a new GUID as a number (at most 8 digits) */
    SELECT @m_INT = ABS(CONVERT(BIGINT, CONVERT(BINARY(3), NEWID())))
    /* Left-pad the number with zeros to exactly 8 characters */
    SET @m_CustomerID = REPLACE(SPACE(8 - LEN(@m_INT)) + LTRIM(RTRIM(@m_INT)), SPACE(1), '0')
    IF EXISTS(SELECT user_id FROM tablename WHERE user_id = RTRIM(LTRIM(@m_CustomerID)))
    BEGIN
        /* Already taken: stay in the loop and try again (a BREAK here would exit with a duplicate) */
        SET @m_IsUniqueID = 0
    END
    ELSE
    BEGIN
        SET @m_IsUniqueID = 1
    END
END
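
The same retry-until-unique idea can also be sketched in application code, letting a uniqueness constraint do the duplicate check at insert time instead of a separate SELECT. This is only a rough Python sketch (not VBScript or T-SQL); the in-memory SQLite database, the members table, and the names new_code and insert_member are all made up for illustration.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_uppercase  # A-Z, like the 8-letter member codes


def new_code(length=8):
    """Return a random code of uppercase letters from the OS entropy source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_member(conn, email, make_code=new_code, max_tries=10):
    """Insert a member, retrying with a fresh code if the key constraint rejects it."""
    for _ in range(max_tries):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO members (code, email) VALUES (?, ?)", (code, email)
                )
            return code
        except sqlite3.IntegrityError:
            continue  # duplicate code: generate another one
    raise RuntimeError("could not find an unused code")


# Hypothetical schema: the PRIMARY KEY makes the database reject duplicate codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (code TEXT PRIMARY KEY, email TEXT)")
print(insert_member(conn, "a@example.com"))
```

With a constraint like this, the cost of the duplicate check is paid only as part of the insert itself, and a retry happens only in the rare case of a clash.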

brickwall

7:59 am on Mar 22, 2005 (gmt 0)

10+ Year Member



The problem was easier to fix than I thought.

There really is something wrong with my function above.

Somehow the Randomize statement frequently produces the same seed value each time it is called from within the procedure. When I removed the Randomize statement from the function and called it once early on (right after Option Explicit, so that a new seed is produced every time a page loads rather than every time the function executes), the result was a properly random-looking set of characters.

I can't explain why though. Maybe a VB guru here can.

brickwall

5:04 pm on Mar 22, 2005 (gmt 0)

10+ Year Member



Thanks for your suggestion, Easy_Coder.

I can't go that route, though, because I need to do some processing on the random member code before I store it in the db. =)

Lord Majestic

5:12 pm on Mar 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Random (or rather pseudo-random) number generators should not be counted on to generate unique values. This is a very common mistake, and it's best to choose the right tool for the job. Even a perfect random number generator is not guaranteed to produce a new unique value each time; that's not what it was designed for.

Try using a good hash function such as MD5 to generate unique, random-looking data based on the user's email address plus some, ahem, random junk appended to it.
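
A rough idea of what that could look like, sketched in Python rather than VBScript. The function name code_from_email, the salt handling, and the truncation to 8 characters are assumptions made for illustration. Note that keeping only 8 hex characters leaves 16^8 (about 4.3 billion) possible codes, so a truncated hash can still collide and does not remove the need for a uniqueness check.

```python
import hashlib
import secrets


def code_from_email(email, salt=None, length=8):
    """Derive a code from MD5(email + salt); returns (code, salt)."""
    if salt is None:
        salt = secrets.token_hex(8)  # the "random junk" appended to the address
    text = email.strip().lower() + salt
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    # Keep only the first `length` hex characters of the 32-character digest.
    return digest[:length].upper(), salt


code, salt = code_from_email("member@example.com")
print(code, salt)
```

Storing the salt next to the code lets the same code be recomputed from the email address later.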

Now, talking specifically about your code: Rnd returns the next pseudo-random value from the sequence that was initialised by the Randomize statement. Note, however, that if you have VB in a web form, your Randomize statement will be executed every time the form is loaded, and this might not produce the required result.

More details here (hope this link is okay): [vbexplorer.com...]

brickwall

6:29 pm on Mar 22, 2005 (gmt 0)

10+ Year Member



thanks for that link m'lord. =)

This new site of mine is for a fairly small community. I don't expect to get more than 2k members here, so I thought a simple 8-character member code based on the built-in VBScript Rnd function would be sufficient. I mean, considering the awesome odds, I believe anything more elaborate would be overkill.
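
To put a number on those odds, here is a small Python sketch using the standard birthday-bound approximation, assuming every one of the 26^8 possible codes is equally likely (which a misbehaving generator will not satisfy). The function name is made up for illustration.

```python
import math


def collision_probability(n_codes, space_size):
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2N))."""
    return -math.expm1(-n_codes * (n_codes - 1) / (2 * space_size))


space = 26 ** 8  # 208,827,064,576 possible 8-letter codes
for members in (100, 2000):
    print(members, collision_probability(members, space))
```

Under that assumption, the chance of even one duplicate among 2,000 codes is on the order of 1 in 100,000, so several duplicates a day point to the generator rather than to bad luck.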

What surprised me was the behavior of the Randomize function. Anywhere inside my routine (above) and it generates no fewer than 400 duplicates in 1000 test outputs. Outside my routine (Randomize once per page load) and zero duplicates in 1000 test outputs.

I know Randomize uses the system timer to generate the seed used by Rnd, but I can't explain the results I got.

Lord Majestic

6:49 pm on Mar 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What surprised me was the behavior of the Randomize function.

Every programmer has to experience that at least once ;)

Anywhere inside my routine (above) and it generates no fewer than 400 duplicates in 1000 test outputs. Outside my routine (Randomize once per page load) and zero duplicates in 1000 test outputs.

The reason it did not work the way you expected inside the routine is simply that Randomize resets the long sequence of pseudo-random numbers to a starting point. If you call it once outside the routine, each Rnd call returns the next number in the sequence, hence fewer dups. When you called Randomize inside the routine, it had a chance of restarting the same sequence, hence starting from the same numbers that produced the duplicates.

In theory Randomize uses the current time as the starting point of the sequence, so if the time changes the generated numbers should be fairly different. But if you made lots of calls to your routine in a short period of time, chances were good that it would reuse the same sequence over and over again. If you called the routine in a loop, this is the explanation.
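
The seed-reuse effect can be imitated in Python. This is only an analogy: Python's generator is not VBScript's Rnd, and the coarse clock below is a made-up stand-in for the system timer, so the exact output will not match what the original function produced. What it does show is that reseeding from a slowly changing value keeps restarting the same sequence, while seeding once does not.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def code_reseeding(tick, length=8):
    """Mimics the original routine: reseed from the clock before every character."""
    out = []
    for _ in range(length):
        rng = random.Random(tick)  # same tick -> same seed -> same first draw
        out.append(rng.choice(ALPHABET))
    return "".join(out)


def code_seeded_once(rng, length=8):
    """Mimics the fix: the generator is seeded elsewhere and only advanced here."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


# Hypothetical coarse clock: 1000 calls land on only 10 distinct tick values.
ticks = [i // 100 for i in range(1000)]
reseeded = {code_reseeding(t) for t in ticks}

shared = random.Random(2005)  # seeded once, e.g. at page load
once = {code_seeded_once(shared) for _ in ticks}

print(len(reseeded), "distinct codes when reseeding,", len(once), "when seeded once")
```

In this sketch the reseeding version can never produce more distinct codes than there are distinct clock values, no matter how many times it is called.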

I suppose it's something everyone has to go through. As long as you understand what happened, you have gained valuable experience: never assume that generating a random (or pseudo-random) number is the same as generating a unique number.

I switched to using good hashes like MD5. They produce pretty random-looking identifiers that are unique for practical purposes and can be linked to the user's email address.

myrandomstore

12:38 am on Apr 4, 2005 (gmt 0)



Put Randomize outside the loop, use Randomize(Rnd(-1)) somewhere, and call Randomize in a separate process.

You could also pick a number between 12345678 and 99999999 and use a different conversion function if you want a simple way to hash off the Randomize function.

At <snip>, we use a simple date and order number routine.
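
The post gives no details of that routine, so the following Python sketch is purely a guess at the general idea: an identifier built from the date plus a running number is unique by construction, with no randomness involved. The format, the names, and the in-memory counter are all assumptions; a real site would presumably take the number from the database.

```python
import datetime
import itertools


def make_id_factory(start=1):
    """Return a function that builds IDs as YYYYMMDD plus a zero-padded sequence number."""
    counter = itertools.count(start)  # stand-in for a database order number

    def next_id(today=None):
        today = today or datetime.date.today()
        return f"{today:%Y%m%d}-{next(counter):06d}"

    return next_id


next_id = make_id_factory()
print(next_id(datetime.date(2005, 4, 4)))
print(next_id(datetime.date(2005, 4, 4)))
```

The trade-off is that such identifiers are predictable, which matters if member codes are meant to be hard to guess.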

[edited by: Xoc at 10:28 pm (utc) on April 4, 2005]
[edit reason] No urls, per Terms of Service [/edit]