I'm on to the final part of my e-commerce cart, the security, which until now I've been putting off.
My secure payment provider sends an MD5 hash back to my server, which I should be able to re-create and check against. The only problem is that I don't seem to be able to re-create the same hash.
Going through the user documentation, I noticed the MD5 string has to be created using UTF-8 encoding and not Unicode.
So will this be OK, or do I have to convert it to UTF-8 somehow?
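// Inside double quotes PHP expands $itis as a variable; use single quotes if the $ is meant literally
$tomd5 = "mystring$itis";
$hash = md5($tomd5);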
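A minimal sketch of the check being described, assuming the provider sends back a hex MD5 digest and that `$tomd5` is rebuilt from exactly the fields its documentation lists. The function name `verify_provider_hash` and the sample string are hypothetical. It first confirms the bytes are valid UTF-8. A string made only of ASCII characters is already valid UTF-8, so it needs no conversion at all.

```php
<?php
// Hypothetical helper: returns true when $data (the string you rebuilt)
// hashes to the digest the payment provider sent back.
function verify_provider_hash(string $data, string $receivedHash): bool
{
    // preg_match() with the /u modifier returns false on invalid UTF-8,
    // a quick way to spot bytes that still need converting.
    if (preg_match('//u', $data) !== 1) {
        return false;
    }
    // md5() returns 32 lowercase hex characters; normalise the received value
    // and compare in constant time with hash_equals() (PHP 5.6+).
    return hash_equals(md5($data), strtolower(trim($receivedHash)));
}

// Single quotes keep the $ literal; this sample string is made up.
$tomd5 = 'mystring$itis';
$received = md5($tomd5); // stand-in for the value the provider sends
var_dump(verify_provider_hash($tomd5, $received)); // bool(true)
```

Rejecting invalid UTF-8 outright, rather than guessing at a conversion, is a deliberate choice here: a failed check then points at the encoding instead of silently producing a wrong hash.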
As for the PHP/JavaScript difference, I would wonder whether it is truly the same data or whether some conversion is happening somewhere along the line.
You may need to convert manually using PHP's multibyte string (mbstring) functions, such as mb_convert_encoding() (which I think are now available by default).
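If the string arrives in ISO-8859-1 (Latin-1), the conversion is mechanical: bytes below 0x80 stay as they are and every other byte becomes a two-byte UTF-8 sequence. Latin-1 is an assumption here, since the thread never says what the source encoding is. With mbstring loaded, `mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')` does the same job. The hand-written `latin1_to_utf8()` below is a hypothetical helper that needs no extension.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte is the code point U+0000..U+00FF, so:
//   0x00-0x7F -> one byte, unchanged (plain ASCII)
//   0x80-0xFF -> two bytes: 110xxxxx 10xxxxxx
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$latin1 = "caf\xE9";               // "café" as Latin-1 (4 bytes)
$utf8   = latin1_to_utf8($latin1); // "café" as UTF-8 (5 bytes)
echo bin2hex($utf8), "\n";         // 636166c3a9
echo md5($utf8), "\n";             // hash the UTF-8 bytes, not the Latin-1 ones
```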
I was figuring the same; it smells like something is being interpreted poorly, which changes the initial value before the MD5 is created.
There should be no difference in the hash, no matter which language computes it (JS, PHP or other), as long as they are using the same source data.
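One way to rule out a broken implementation is to check it against the test vectors published in RFC 1321. Any correct MD5, whether in PHP, JavaScript or anything else, must reproduce them. A sketch in PHP:

```php
<?php
// Test vectors from RFC 1321 (appendix A.5). Run the same inputs through
// your JavaScript MD5 and compare: a mismatch means the implementation,
// not the data, is at fault.
$vectors = [
    ''               => 'd41d8cd98f00b204e9800998ecf8427e',
    'a'              => '0cc175b9c0f1b6a831c399e269772661',
    'abc'            => '900150983cd24fb0d6963f7d28e17f72',
    'message digest' => 'f96b697d7cb7938d525a2f31aaf161d0',
];

foreach ($vectors as $input => $expected) {
    $ok = md5($input) === $expected;
    printf("%-16s %s\n", var_export($input, true), $ok ? 'ok' : 'MISMATCH');
}
```

These vectors are all ASCII, so they will not catch a bug that only shows up with 8-bit characters. Hashing the same non-ASCII string on both sides covers that case.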
If it differs, it means the source strings differ. Maybe the case is wrong. Or something is broken in your JavaScript; that is likely with 8-bit characters. As for the popular md5.js file floating around on the internet, I would suggest URL-encoding the string before producing an MD5 hash from it.
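A sketch of the URL-encoding suggestion, assuming the string is already UTF-8. Percent-encoding turns every non-ASCII byte into plain ASCII such as `%C3%A9`, so an MD5 routine that mishandles 8-bit characters never sees one. Both sides must encode identically. This only helps when you control both ends: if the payment provider hashes the raw string, you have to hash the raw string too. Note also that PHP's `rawurlencode()` escapes `! ' ( ) *` while JavaScript's `encodeURIComponent()` leaves them alone, so the two agree only on strings without those characters. The helper name `md5_of_urlencoded` is hypothetical.

```php
<?php
// Hypothetical helper: hash the percent-encoded form instead of the raw bytes.
// rawurlencode() works byte by byte, so feed it UTF-8.
function md5_of_urlencoded(string $utf8): string
{
    return md5(rawurlencode($utf8));
}

$utf8 = "caf\xC3\xA9";          // "café" in UTF-8
echo rawurlencode($utf8), "\n"; // caf%C3%A9
echo md5_of_urlencoded($utf8), "\n";
```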
An MD5 hash is produced from bytes, and it isn't related to encodings or UTFs.
The fact that MD5 is produced from bytes is precisely why it IS related to encodings. If I encode something with or without a BOM (byte-order mark), that changes only a few bytes at the start of the file (three for a UTF-8 BOM). Everything else can be the same, but the MD5 hash will be totally different because of those few bytes.
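To see the effect, the sketch below hashes the same text with and without a UTF-8 BOM (the three bytes EF BB BF). It also includes a hypothetical `strip_utf8_bom()` helper you could apply before hashing if a BOM can sneak in, for example from a file read with `file_get_contents()`.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: drop a leading UTF-8 byte-order mark, if present.
function strip_utf8_bom(string $s): string
{
    return strncmp($s, UTF8_BOM, 3) === 0 ? substr($s, 3) : $s;
}

$plain   = 'mystring';
$withBom = UTF8_BOM . $plain;

echo strlen($plain), ' vs ', strlen($withBom), " bytes\n"; // 8 vs 11 bytes
echo md5($plain), "\n";
echo md5($withBom), "\n";                 // completely different digest
echo md5(strip_utf8_bom($withBom)), "\n"; // matches the first digest again
```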
Similarly, if I use big-endian Unicode in one source and that gets translated to little-endian, it makes little difference to, say, a compression utility, which would achieve almost exactly the same file size in both cases; but for an MD5 hash, it is a completely different input.
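The byte-order point can be shown without any extension. In UTF-16 each of these characters takes two bytes, and big- and little-endian simply swap each pair. `swap_utf16_byte_order()` is a hypothetical helper written for this illustration, and it assumes an even-length UTF-16 string.

```php
<?php
// Hypothetical helper: swap every byte pair, turning UTF-16BE into
// UTF-16LE and vice versa. Assumes strlen($s) is even.
function swap_utf16_byte_order(string $s): string
{
    $out = '';
    for ($i = 0, $len = strlen($s); $i + 1 < $len; $i += 2) {
        $out .= $s[$i + 1] . $s[$i];
    }
    return $out;
}

$be = "\x00A\x00B";               // "AB" in UTF-16BE
$le = swap_utf16_byte_order($be); // "AB" in UTF-16LE: "A\x00B\x00"

echo bin2hex($be), ' ', bin2hex($le), "\n"; // 00410042 41004200
var_dump(md5($be) === md5($le));            // bool(false): same text, different bytes
```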
Encoding makes all the difference in the world to the MD5 hash you end up with.
The point was that you don't need to do any conversion to get the same hash from the same string.
We probably shouldn't go around on this again, since I think we're saying the same thing in different ways. Anyway, my original point was that the "same string" (same characters) will have a completely different set of bits if the encoding changes, and in the original post, he mentioned that there might be some encoding conversion going on. If that were the case, the hash would be completely different if it were computed first on one encoding and then again on the "same string" after conversion to another encoding.
That is why I said that the encoding does matter. That is, if it changes, it matters. If it doesn't change, then of course the MD5 algo doesn't care what encoding you use, which I think is the point that you were making.