MD5 encryption using UTF-8?

It's like swimming in a deep ocean

bono

9:03 am on Jul 5, 2005 (gmt 0)

Hi guys,

I'm on the final part of my ecommerce cart, the security, which I've been putting off until now.

My secure payment provider sends an MD5 hash back to my server, which I should be able to re-create and check against. The only problem is that I don't seem to be able to re-create the same hash.

Going through the user documentation, I noticed that the MD5 string has to be created using UTF-8 encoding and not Unicode.

So will this be OK, or do I have to convert to UTF-8 somehow?

$tomd5 = "mystring$itis";
$hash = md5($tomd5);

bono

11:32 am on Jul 6, 2005 (gmt 0)

Does anybody know a lot about MD5 who could help me? It seems that the PHP md5 function is giving a different MD5 hash from one created using JavaScript.

coopster

6:17 pm on Jul 6, 2005 (gmt 0)

You can create a different MD5 hash in your browser with JavaScript versus on your server with PHP? Can you do it consistently?

ergophobe

7:56 pm on Jul 6, 2005 (gmt 0)

UTF-8 is a Unicode encoding, but it won't be the same bits as a different UTF encoding. Because of the way MD5 hashes work, even a one-bit difference will produce an entirely different hash.

As for the PHP/JavaScript difference, I would wonder whether it is truly the same data or whether some conversion is happening somewhere along the line.

You may need to convert manually using the PHP multibyte string (mbstring) extension, if it is enabled in your build.
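
To make that concrete, here is a minimal sketch, not anything from the payment provider's documentation. The sample text "café" and the variable names are just assumptions for illustration. It writes the same characters out as explicit bytes in UTF-8 and in ISO-8859-1, hashes both, and then (only if mbstring happens to be loaded) converts the ISO-8859-1 bytes to UTF-8 before hashing.

```php
<?php
// The "same" text, café, written out as explicit bytes in two encodings.
$utf8   = "caf\xC3\xA9"; // é is two bytes (C3 A9) in UTF-8
$latin1 = "caf\xE9";     // é is one byte (E9) in ISO-8859-1

$hashUtf8   = md5($utf8);
$hashLatin1 = md5($latin1);

// Different byte counts, and so different hashes.
echo strlen($utf8), " bytes: ", $hashUtf8, "\n";
echo strlen($latin1), " bytes: ", $hashLatin1, "\n";

// If mbstring is available, converting to UTF-8 before hashing
// should give the same hash as the UTF-8 original.
if (function_exists('mb_convert_encoding')) {
    $converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    var_dump(md5($converted) === $hashUtf8);
}
```

For plain ASCII input such as digits and unaccented letters, UTF-8 and ISO-8859-1 produce identical bytes, so no conversion would be needed in that case.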

jatar_k

8:27 pm on Jul 6, 2005 (gmt 0)

>> whether some conversion is happening somewhere along the line

I was figuring the same. It smells like something is being interpreted poorly, which is changing the initial value before the MD5 is created.

There should be no difference in the hash no matter what language is doing it (JS, PHP, or other) if they are using the same source data.

r_c_h

10:25 am on Jul 7, 2005 (gmt 0)

MD5 hash is produced from bytes and it isn't related to encodings or UTFs.

If it differs, it means the source strings differ. Maybe the case is wrong. Or something is broken in your JavaScript, most likely with 8-bit characters. As for the popular md5.js file floating around on the internet, I would suggest URL-encoding the string before producing an MD5 hash from it.

bono

10:43 am on Jul 7, 2005 (gmt 0)

Haha, found the problem...

I was inputting a value of 567.00, which was getting changed to 567 somewhere along the line. 567.00 was expected as input to the MD5, hence the difference.
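
For anyone hitting the same thing, here is a rough sketch of how a trailing ".00" can disappear in PHP and one way to keep it. The amount and the `$secret` prefix are made-up values for illustration, not the real string format from any payment provider.

```php
<?php
$amount = 567.00;      // hypothetical order total, stored as a float
$secret = 'mysecret';  // hypothetical value, stands in for the rest of the string

// Casting a float to a string drops the trailing zeros.
$asCast = (string) $amount;                        // "567"

// number_format() with 2 decimals, "." as the decimal point and
// no thousands separator keeps the amount as "567.00".
$asFormatted = number_format($amount, 2, '.', '');

// The two strings differ, so the two hashes differ as well.
echo $secret . $asCast, " => ", md5($secret . $asCast), "\n";
echo $secret . $asFormatted, " => ", md5($secret . $asFormatted), "\n";
```

Formatting the amount explicitly just before hashing, rather than relying on automatic conversion, avoids depending on how the value was stored along the way.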

ergophobe

6:57 pm on Jul 7, 2005 (gmt 0)

Glad you got it figured out, bono.

>> MD5 hash is produced from bytes and it isn't related to encodings or UTFs.

The fact that MD5 is produced from bytes is precisely why it IS related to encodings. If I encode something with or without a BOM (byte-order mark), that changes only the first few bytes of the file (three in UTF-8). Everything else can be the same, but the MD5 hash will be totally different because of those bytes.

Similarly, if I use big-endian Unicode in one source and that gets translated to little-endian, it's not that much difference to, say, a compression utility, which would achieve almost exactly the same file size in both cases, but for an MD5 hash, that is completely different.

Encoding makes all the difference in the world in terms of how the MD5 hash will work.
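
A small sketch of both cases, with the bytes spelled out by hand so that no extension is needed. The sample text "abc" and the variable names are assumptions chosen only for illustration.

```php
<?php
// Case 1: the same UTF-8 text with and without a byte-order mark.
$plain   = "abc";
$withBom = "\xEF\xBB\xBF" . $plain; // EF BB BF is the UTF-8 BOM

// Case 2: "abc" as UTF-16, big-endian versus little-endian.
$utf16be = "\x00\x61\x00\x62\x00\x63";
$utf16le = "\x61\x00\x62\x00\x63\x00";

// Print each variant as hex bytes next to its MD5 hash.
foreach ([$plain, $withBom, $utf16be, $utf16le] as $bytes) {
    echo bin2hex($bytes), " => ", md5($bytes), "\n";
}
```

All four inputs represent the same three characters, but each is a different byte sequence, so each gets its own hash.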

r_c_h

10:22 am on Jul 8, 2005 (gmt 0)

The point was that you don't need to do any conversions to get the same hash from the same string - if it is really the same string.

bono

11:09 am on Jul 8, 2005 (gmt 0)

Just make sure it is the same string. And if the string is being made elsewhere, i.e. by your secure card payment provider, make sure that they are creating the string the way they say they are creating it. Only take an answer from the horse's mouth (the guy who developed the code in the first place).
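
One way to check that, sketched below, is to look at the exact bytes going into md5() before comparing hashes. The helper name `describe_for_md5` and the sample strings are hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical debugging helper: reports the byte length, the raw bytes
// as hex, and the MD5 hash of whatever string is about to be hashed.
function describe_for_md5(string $input): array
{
    return [
        'length' => strlen($input),   // number of bytes, not characters
        'hex'    => bin2hex($input),  // exact bytes, two hex digits each
        'md5'    => md5($input),
    ];
}

// Two strings that look nearly the same on screen.
print_r(describe_for_md5("mystring567.00"));
print_r(describe_for_md5("mystring567"));
```

Comparing the length and hex output against what the provider says it hashes usually shows where the two strings part ways.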

ergophobe

4:03 pm on Jul 8, 2005 (gmt 0)

>> The point was that you don't need to do any conversions to get the same hash from the same string

Probably shouldn't go around on this again, since I think we're saying the same thing in different ways. Anyway, my original point was that the "same string" (same characters) will have a completely different set of bits if the encoding changes, and in the original post, he mentioned that there might be some encoding conversion going on. If that were the case, the hash would be completely different if it were done first on one encoding and then again on the "same string" but after conversion to another encoding.

That is why I said that the encoding does matter. That is, if it changes, it matters. If it doesn't change, then of course the MD5 algo doesn't care what encoding you use, which I think is the point that you were making.