I'm finding it easy to untaint data one element at a time, like this:
($untainted) = $query->param('formdata') =~ /^([\w.<>\/,"=]+)$/;
But I just can't seem to successfully apply this in a foreach loop and get the untainted variables out afterwards. I really need some help on this one and would be very grateful if a local expert were to jump in and explain how to do this. I've just started to use taint mode (I know I should always use it).
This is what I'm trying:
foreach ($query->param()) {
    if ($query->param($_) =~ /^([\w.<>\/,"=]+)$/) {
        $query->param(-name=>$_,-value=>$1);
    }
}
This is the output I get when I print $query->param('footnote'):
CGI=HASH(0x801180)->param('footnote')
Where am I going wrong?
Many thanks for any help.
Best wishes
Sid
PS The regex allows simple HTML into the $untainted scalar. I plan to verify some of the $query->param values further down the script, but as a first pass I want to exclude dangerous characters.
The best approach is to examine each variable and run it through a series of checks to be sure the user of your form doesn't try to pass something illegal into your form that will make it crash or cause a security breach.
So you can get dangerous characters through that regex, can you?
The point is that the regex I'm using does exactly what I want it to do, i.e. make sure dangerous characters are not passed. Fields that need more checking then get further checks as well.
My post was on the specifics of what was wrong with my code rather than requesting a lecture on taint mode.
Thanks anyway.
Sid
What I'm trying to say is... if your form field is asking for a "yes" or a "no" input from the user (for example) then maybe your regex is only validating the input to be sure it's alphanumeric and doesn't contain any illegal characters. However, if one of your malicious users decides to pass a "maybe" through the form, the input they sent would still be valid as far as the regex is concerned because it only contains alphanumerics, but they have successfully tricked the system into doing something it shouldn't.
My philosophy is to make sure your script is checking the variable to be sure it ONLY equals "yes" or "no" and anything else will return an error message. Of course I realize this means you are now having to check each and every variable to be sure it contains only what you expect the user to input, but that's the only way you're really going to be secure with this script.
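To make the yes/no case concrete, here is a minimal sketch. The helper name validate_yes_no is made up for illustration and is not part of CGI.pm. It accepts only the two literal values and returns the capture, which is what untaints the value under -T.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only the literal strings "yes" or "no".
# Returns the captured value (untainted under -T), or undef for anything else.
sub validate_yes_no {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ /\A(yes|no)\z/) {
        return $1;
    }
    return;
}

# "maybe" is purely alphanumeric, but it is still rejected here.
for my $candidate ('yes', 'no', 'maybe') {
    my $clean = validate_yes_no($candidate);
    print "$candidate: ", (defined $clean ? "accepted" : "rejected"), "\n";
}
```

In a real script you would send back an error message wherever this returns undef.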
If you're only checking the input to be sure there are no "ugly" characters in the field, that scheme allows people to put whatever they want into the fields, and the test the input needs to pass is very simple. This is what I call a "mostly open" approach. It's better to construct a regex that will only allow the user to input what you expect them to input and nothing else, which I call a "mostly closed" approach.
This is better from a security standpoint as well as serving your needs. For example, if the field we're talking about is for a person's home phone number, you should construct your regex to look for a pattern that looks like a phone number, so that they don't enter their home address into the box by mistake. So the purpose is two-fold. It prevents them from entering bogus characters that will violate your security AND it ensures that they enter what they are supposed to.
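As a rough sketch of the phone number case: the format below (three digits, three digits, four digits, separated by a hyphen, dot or space) is only an assumption for the example, and validate_phone is a made-up name. Adjust the pattern to whatever format your form actually asks for.

```perl
use strict;
use warnings;

# Assumed format, for illustration only: NNN-NNN-NNNN, where each
# separator may be a hyphen, a dot or a space.
sub validate_phone {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ /\A([0-9]{3}[-. ][0-9]{3}[-. ][0-9]{4})\z/) {
        return $1;    # the capture is untainted under -T
    }
    return;
}

for my $candidate ('555-123-4567', '12 Main Street') {
    my $clean = validate_phone($candidate);
    print "$candidate: ", (defined $clean ? "looks like a phone number" : "rejected"), "\n";
}
```

A street address contains only harmless characters, yet it fails this check, which is the point of the "mostly closed" approach.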
Forgive me if you are already checking for these kinds of things elsewhere in your script, but from the sound of your post, it seems like you are only trying to untaint the data and nothing else so that it successfully passes the -T switch.
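On the loop itself, one way to get the untainted values out is to collect the captures into a separate hash instead of writing them back into the query object. This is only a sketch: untaint_fields is a made-up helper, and a plain hash stands in for the form data so it runs without CGI.pm. In a CGI script the raw values would come from $query->param($name). It uses the same character class as the question, anchored with \A and \z so a trailing newline is not let through.

```perl
use strict;
use warnings;

# Same character class as in the original question.
my $SAFE = qr/\A([\w.<>\/,"=]+)\z/;

# Hypothetical helper: takes a hash ref of raw field values and returns a
# hash ref holding only the fields that matched, built from the captures.
sub untaint_fields {
    my ($raw) = @_;
    my %clean;
    for my $name (keys %$raw) {
        my $value = $raw->{$name};
        next unless defined $value;
        if ($value =~ $SAFE) {
            $clean{$name} = $1;    # copy the capture straight away
        }
    }
    return \%clean;
}

# Stand-in data; the field names and values are invented for the example.
my $clean = untaint_fields({ footnote => '<b>note</b>', bad => 'rm -rf; *' });
print "footnote: ", $clean->{footnote}, "\n";
print "bad was rejected\n" unless exists $clean->{bad};
```

Also, the output quoted in the question looks like what Perl prints when a method call sits inside a double-quoted string: only $query is interpolated and the rest is treated as literal text. Printing the call outside the quotes, as in the print lines above, avoids that.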