Forum Moderators: coopster

Could use help validating form input with PHP

         

craighaggart

3:16 am on Nov 24, 2009 (gmt 0)

10+ Year Member



Please be gentle -- I'm still a newbie! I created a small web site for our volunteer organization, and included a way for people to contact us via HTML form. I wrote a simple PHP script to assemble the name, e-mail address, and text comments into an e-mail message, strip out any HTML, and send it to our contact person. However, she recently told me that she has received a couple of "garbled messages." Her description leads me to think that it was perhaps some sort of spam attempt using Java or something like that.

I have done a bunch of web searches and a bunch of reading about security using PHP, but it's all pretty confusing to a newbie like me. Is there a better, straightforward way to avoid being spammed than using strip_tags? There is no need to allow any HTML or Java in our form, since it's only supposed to be a name, an e-mail address, and text comments (numbers must be allowed in case someone includes a phone number).

Included below are the relevant portion of my XHTML code (with CSS selectors and formatting tags removed for brevity) and my little PHP script. I replaced our actual domain info with "ourdomain.org" in case the stupid spammers scour sites like this, too.

-Craig

-------- contactpage.html --------

<form action="contactscript.php" method="POST">
<table>
<tr>
<td>Name: </td>
<td> <input type=text name="name" size=30 /></td>
</tr>
<tr>
<td>Email:</td>
<td> <input type=text name="email" size=30 /></td>
</tr>
</table>
<textarea name="comments" cols=35 rows=5></textarea>
<input type=submit value="Send Message" />
</form>

-------- contactscript.php --------

<?php
$to = "contact@example.com";
$subject = "Submission from web site 'Contact Us' page";
$name = $_POST['name'];
$email = $_POST['email'];
$comments = $_POST['comments'];
$strippedname = strip_tags($name);
$strippedemail = strip_tags($email);
$strippedcomments = strip_tags($comments);
$message = "This message was sent automatically from the web site's \n" .
"'Contact Us' page. The sender provided the following info: \n" .
"\n" .
"Name: $strippedname\n" .
"E-mail address: $strippedemail\n" .
"\n" .
"Comments: $strippedcomments\n" .
"\n";
header( "Location: http://example.com/contactsubmitted.html" );
mail($to,$subject,$message);
?>

[edited by: dreamcatcher at 7:41 am (utc) on Nov. 24, 2009]
[edit reason] use example.com. Thanks. [/edit]

TheMadScientist

5:15 am on Nov 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There is one security addition I would make that will not help with spam, but will make things more secure, which is addslashes().

The 'spam decreasing' measure I would take is:

Add another field to your form called 'verify' or something like that, where the person verifies they are really a person. If they miss the question, output an error and do not submit the form.

The 'verify' word(s) can be anything a person can read or see on the page, but a bot can't.

I usually use something in the logo...

On one form it's, 'please type the word in the orange circle in the main logo.' On another it's 'what is the picture at the top of the page of?'

It's always something super easy for a person to comprehend and do, but a bot can't.

You can put a graphic on the page and have people read it, like a CAPTCHA, but there's usually one already on the page you can use easily enough. I don't recommend using the domain name, but you could use a portion of it, like 'what's the first word in our domain name?'

Just change 'awordinyourlogohere' in the following to whatever you have people type. (Make sure what you type is all lower case.)

strtolower() makes it so they don't have to worry about capitalization. They can type in all caps or mixed cases or all lowercase and it will pass the check as long as they get the word (or phrase) right.

Also, it's usually easier to output an entry error if you submit the form to itself and have the php at the top of the page, so it would look like this:

<?php
if(isset($_POST['FormSubmitted'])
&& $_POST['FormSubmitted']==='1'
&& isset($_POST['verify'])
&& !empty($_POST['verify'])
&& strtolower($_POST['verify'])==='awordinyourlogohere') {

$to = "contact@ourdomain.org";
$subject = "Submission from web site 'Contact Us' page";
$name = addslashes($_POST['name']);
$email = addslashes($_POST['email']);
$comments = addslashes($_POST['comments']);
$strippedname = strip_tags($name);
$strippedemail = strip_tags($email);
$strippedcomments = strip_tags($comments);
$message = "This message was sent automatically from the web site's \n" .
"'Contact Us' page. The sender provided the following info: \n" .
"\n" .
"Name: $strippedname\n" .
"E-mail address: $strippedemail\n" .
"\n" .
"Comments: $strippedcomments\n" .
"\n";
mail($to,$subject,$message);

$thank_you = 'Thank you for contacting us. Your message has been sent and we will get back to you as soon as we can.';

/* Or Include this and exit on a successful submission:
include_once $_SERVER['DOCUMENT_ROOT'] . '/contactsubmitted.html';
exit();
*/
}

elseif(isset($_POST['FormSubmitted']) && $_POST['FormSubmitted']==='1') {
$error='Your submission was not sent. Please verify you are a real person by typing the word in our logo in the verify box. Thanks!';
}
?>

<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="POST">
<?php echo (isset($thank_you) ? $thank_you : '') . (isset($error) ? $error : ''); ?>
<input type="hidden" name="FormSubmitted" value="1">

<table>
<tr>
<td>Name: </td>
<td> <input type=text name="name" size=30 /></td>
</tr>
<tr>
<td>Email:</td>
<td> <input type=text name="email" size=30 /></td>
</tr>
<tr>
<td>Verify:</td>
<td> <input type=text name="verify" size=30 /></td>
</tr>
</table>
Please verify you are a real person and not a spam bot by typing the name you see in our logo in the text box above.
<textarea name="comments" cols=35 rows=5></textarea>
<input type=submit value="Send Message" />

</form>

I'm coding as I go, but hopefully it'll be something you understand and can incorporate fairly easily, even if there's an error or two in it.

craighaggart

6:33 am on Nov 24, 2009 (gmt 0)

10+ Year Member



Wow, thank you for the great reply! I will incorporate your suggestions. There is indeed a totally simple question that any human visiting our site could answer correctly, so that should keep out the spambots.

The addslashes function is a new one to me, as I still haven't done any interaction with a database. But I will in the near future, so it seems like a good function to be familiar with.

I'm a bit confused by the part about submitting the form to itself, but I'll try to figure that one out.

By the way, I have to ask about method="POST". That's how I had coded my page, but I just ran it through a new validator I got (it's a Firefox extension) and it told me that capitalizing the word POST is an error (my doctype is XHTML 1.0 transitional). I changed it to "post" and re-uploaded the page; it validates and still seems to work, so I guess at least it isn't wrong. I know that isn't a PHP issue, but... is it just standard practice for people to capitalize "POST" and "GET"? They are in caps in every reference I have seen.

-Craig

craighaggart

6:38 am on Nov 24, 2009 (gmt 0)

10+ Year Member



Geez, what's up with my computer?! It sent my reply THREE times! I don't see any way to delete a message, either. Sorry about that.

-Craig

dreamcatcher

7:46 am on Nov 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No worries Craig, I deleted your dupes.

Just a note about TheMadScientist's excellent information. addslashes is now deprecated and should not be used. You also don't need to use it in this instance as you are sending data via e-mail. Escaping quotes with addslashes would only be applicable if you are posting to a database and then you would use mysql_real_escape_string.

dc

TheMadScientist

1:29 pm on Nov 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



addslashes is now deprecated and should not be used. You also don't need to use it in this instance as you are sending data via e-mail. Escaping quotes with addslashes would only be applicable if you are posting to a database and then you would use mysql_real_escape_string.

Thanks for the heads up... I haven't read through all the new info yet, but looks like I'm going to have to. I knew magic_quotes was finally removed, but didn't realize addslashes was too.

I've actually switched to htmlentities() with ENT_QUOTES lately, because it suits my needs and then a page is 'proper html' if I redisplay the information from a db or send it in an HTML e-mail or whatever, which is usually what I do with it... I think slashing is just a habit recommendation for some reason.
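For anyone following along, here is a rough sketch of what htmlentities() with ENT_QUOTES does to a submitted value. The helper name escape_for_html() is made up for this example, and the UTF-8 charset is an assumption about the page encoding.

```php
<?php
// Hypothetical helper: escape one form value for display in an HTML page.
function escape_for_html($value)
{
    // ENT_QUOTES converts both double and single quotes.
    // The charset is passed explicitly; UTF-8 is an assumption here.
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

// Prints: &lt;a href=&quot;x&quot;&gt;Tom&#039;s site&lt;/a&gt;
echo escape_for_html('<a href="x">Tom\'s site</a>'), "\n";
```

The tags and quotes end up as harmless text instead of markup, which is why the result can be redisplayed in a page or an HTML e-mail.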

I'm a bit confused by the part about submitting the form to itself, but I'll try to figure that one out.

Basically, instead of POSTing the form to contactscript.php, you would copy and paste the code from contactscript.php to the top of your form page (above any HTML), then POST the form to itself.

This way when there is a submission it's easy to output a 'failed submission', tell the person why it failed and let them correct it right there without having to click back to try again.

You might search this forum for re-populating form fields after a submission, because there are quite a few examples of how to do it...

repopulate form fields after a submission site:webmasterworld.com/php/

This is really a visitor convenience, because it keeps them on the form page after they submit and if there are any errors the original of what they entered is presented. This way you can also easily add checks for valid information, especially e-mail addresses, and websites (etc.) when applicable.
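A minimal sketch of the repopulating idea, assuming the form posts to itself. The helpers field_value() and missing_fields() are names invented for this example, and a sample array stands in for $_POST so the snippet runs on its own.

```php
<?php
// Hypothetical helper: return a posted value, escaped for a value="" attribute.
function field_value($post, $name)
{
    $raw = (isset($post[$name]) && is_string($post[$name])) ? $post[$name] : '';
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: list the required fields that were left empty.
function missing_fields($post, $required)
{
    $missing = array();
    foreach ($required as $name) {
        if (!isset($post[$name]) || !is_string($post[$name]) || trim($post[$name]) === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// In the real form page this would be $_POST.
$post = array('name' => 'Ann "Annie" Lee', 'email' => '');

foreach (missing_fields($post, array('name', 'email', 'comments')) as $name) {
    echo "Please fill in the $name field.<br />\n";
}

// The name the visitor already typed is put back into the field.
echo '<input type="text" name="name" value="' . field_value($post, 'name') . '" />' . "\n";
```

Escaping the value before echoing it back matters here, because otherwise a quote in the submitted text would break out of the attribute.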

By the way, I have to ask about method="POST". That's how I had coded my page, but I just ran it through a new validator I got (it's a Firefox extension) and it told me that capitalizing the word POST is an error (my doctype is XHTML 1.0 transitional). I changed it to "post" and re-uploaded the page; it validates and still seems to work, so I guess at least it isn't wrong. I know that isn't a PHP issue, but... is it just standard practice for people to capitalize "POST" and "GET"? They are in caps in every reference I have seen.

Eh... I use an XHTML doc type on a couple of sites and won't ever make that mistake again, but it's too much of a PITA (Pain in the guess) to go back and change everything to HTML 4.01 transitional or HTML 5, so they are what they are with POST capitalized. Personally, if the browser can figure it out I don't worry too much about what seem to be silly semantics.

It's the same with tags... some people capitalize <A></A>, some don't. As long as it renders, displays and takes the correct action in a browser, it doesn't matter too much to me, even though I do usually make sure my sites validate.

(Yeah, the preceding is an absolute oxymoron, but the point is there's really no difference in POST and post except semantics and when you're posting somewhere like here or helping people understand how to do something it's much easier to understand => POSTing POSTed POST mean a form and => posting posted post is what I'm doing, did, do, so I usually always capitalize (even on sites) accordingly, because they're like two different spellings to me.)

The short version is: They're the same thing. If lowercase is correct for the doc type, then go with lowercase, but I would guess we're a long way away from the day when browsers quit submitting a form because of case.

rocknbil

7:50 pm on Nov 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



she has received a couple of "garbled messages."

This is "tasting" the form, seeing what form fields it uses, how it can be compromised. It is usually automated. The basic pattern is you get these and it seems to stop, so you think it's not an issue, which is the intended response: complacency. Then a few weeks later you start getting hammered.

The method T.M.S. describes is a challenge/response approach, similar to a CAPTCHA. There are several approaches to avoiding spam submissions, but I put these as the absolute last resort as opposed to other methods.

Why?

The prime objective, like it or not, is making your content accessible to the users. It's not to sell or collect leads; these are side effects that come out of doing the previous effectively. Every time you add another "requirement" for an end user, it subtracts from this. You have to remember, the "average user" we know and love has a very short attention span on the Internet, and is easily dissuaded from hitting that submit button. You will learn this when you delve into ecommerce sites.

Second, even CAPTCHA systems can be circumvented; I've seen it happen. "Hidden form fields", changing field names, JavaScript, and other "on page" changes will only provide temporary relief.

Rather than making "spam avoidance" the end user's job, make it yours. It's really easier than it sounds.

A good deal of "spam avoidance" is addressed by filtering - looking for bad stuff and deleting it. The exact opposite is the approach you should take - accept only what you want and throw anything else away (or error out with a message.)

An example (which does not apply to all,)

$in = preg_replace('/[^A-Z0-9]+/i','',$in);

Will strip anything except letters and numbers (which is not that useful for all situations, but a place to start.) To error, do a preg_match and if found, kill the program.

For each input from post/get, accept only expected values, throw away anything you don't want.
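The "error instead of strip" variant could look roughly like this. It is only a sketch: is_alphanumeric() is a name made up for the example, and letters-and-digits-only is the same deliberately strict rule as in the preg_replace line above, so a real form would loosen the character class per field.

```php
<?php
// Hypothetical helper: true only if $in consists entirely of letters and digits.
function is_alphanumeric($in)
{
    // \A and \z anchor the whole string, so a trailing line break is rejected too.
    return preg_match('/\A[A-Z0-9]+\z/i', $in) === 1;
}

foreach (array('Craig42', "Craig\r\nBcc: x@example.com") as $candidate) {
    echo is_alphanumeric($candidate) ? "ok\n" : "rejected\n";
}
```

The first value passes and the second is rejected, because the check describes what is allowed rather than trying to list everything that is not.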

The second thing that no one seems to do is to log all raw input from your forms. This will give you a different view of what's being submitted than access logs ever will. It will reveal who is attacking you, what they are trying to do, which leads to how you stop them. It's really easy. Open a file in a private location (so it can't be browsed,) dump the date/time, IP address, and all raw input. If malicious data is found, log the field used and the malicious data content (makes it easier to identify.)

I'm telling you, this is gold.
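A bare-bones sketch of that kind of logging, under some assumptions: the function names, the tab-separated line format and the temp-directory path are all invented for the example. In the contact script the input would be $_POST, the IP would come from $_SERVER['REMOTE_ADDR'], and the file would sit in a private directory outside the web root.

```php
<?php
// Hypothetical helper: format one log line from the raw, unfiltered input.
function build_log_entry($time, $ip, $input)
{
    $fields = array();
    foreach ($input as $key => $value) {
        $text = is_array($value) ? serialize($value) : (string) $value;
        // Escape control characters so a line break in the input stays on one log line.
        $fields[] = $key . '=' . addcslashes($text, "\0..\37");
    }
    return date('Y-m-d H:i:s', $time) . "\t" . $ip . "\t" . implode(' | ', $fields) . "\n";
}

// Hypothetical helper: append the entry to the log file, locking it while writing.
function log_raw_input($file, $time, $ip, $input)
{
    return file_put_contents($file, build_log_entry($time, $ip, $input), FILE_APPEND | LOCK_EX) !== false;
}

// Sample data standing in for a real submission.
$sample = array('name' => 'x', 'comments' => "hi\r\nBcc: a@example.com");
log_raw_input(sys_get_temp_dir() . '/contact_raw.log', time(), '203.0.113.5', $sample);
```

Each submission becomes one line, so injected line breaks show up as visible \r\n text when you read the log later.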

Last in line - let's say you allow only basic punctuation and letters/numbers for all input.

$in = preg_replace('/[^A-Z0-9\-\;\,\.\@\(\)]+/i','',$in);

Now you filter your data based on patterns you find in the raw data logging. But you can't know what that is if you don't log it. Some examples are "bbCode" and regular link dropping. This is not a full list, but to give you an idea:


$bad_patterns = array(
'b*cc\s*:', // cc: and bcc: header injection
'to\s*:', // You already have a to in email headers
'content\-type', // multipart email attacks
'\[\s*URL.*\]*',
'\[\s*LINK.*\]*',
'\%5B\s*URL.*(\%5D)*',
'\%5B\s*LINK.*(\%5D)*',
'\[\s*a\s*href.*\]*',
'\%5B\s*a\s*href.*(\%5D)*',
'\<\s*a\s*href.*\>*',
'\%3C\s*a\s*href.*(\%3E)*'
);
$spam_in = 0;
$input_content = ''; // collects the offending fields for the log
foreach ($_POST as $key => $value) {
foreach ($bad_patterns as $v) {
if (preg_match("/$v/i", $value)) {
$spam_in = 1;
$input_content .= "$key: $value\n";
}
}
}
if ($spam_in==1) { some_exit_function(); }

The previous stops some email injection attempts and link dropping. In the context of a form you never want a url to be dropped. The exception is an "add your link" script, in which case you examine the URI field in an entirely different way.

A second important filter is the email address itself, which can have extra addresses appended

<input type="text" name="email" value="address1@example.com,address2@example.com,address3@example.com">

and many other hacks can be applied - but this is already running too long. Google form abuse site:webmasterworld.com for the ongoing saga against form abuse.
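One way to screen the address field might look like this. It is a sketch, not a complete defence: the two function names are hypothetical, and filter_var() with FILTER_VALIDATE_EMAIL needs PHP 5.2 or later.

```php
<?php
// Hypothetical helper: true if the value contains a raw or URL-encoded line break,
// which is how extra headers get smuggled into an e-mail.
function has_header_injection($value)
{
    return preg_match('/[\r\n]|%0a|%0d/i', $value) === 1;
}

// Hypothetical helper: true only for exactly one plausible address.
function is_single_address($email)
{
    return !has_header_injection($email)
        && strpos($email, ',') === false
        && filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

$attempt = 'address1@example.com,address2@example.com,address3@example.com';
echo is_single_address($attempt) ? "accepted\n" : "rejected\n";
```

The appended-address example above is rejected, while a single ordinary address passes.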

log_data(); // Store it raw
screen_data(); // Error if garbage is found
validate_data(); // return to form on honest user errors
its_safe_to_proceed(); // send email, output response

This is your best approach: make filtering garbage your job, not the end user's.

A bit of programming advice (Take it or leave it)

header( "Location: http://example.com/contactsubmitted.html" );
mail($to,$subject,$message);

A Location header does not stop the script by itself, so mail() can still run after it, but the order is fragile. Do you have mail failing? You should mail first, then respond:

mail($to,$subject,$message);
header( "Location: http://example.com/contactsubmitted.html" );

I bolded respond for a reason. :-) Too often I see this, and it makes no sense to me, and my impression is that it's a "lazy" approach: redirecting on submit.

Why would one do this when all the variables are right there for a customized, warm and fuzzy response?

mail($to,$subject,$message);
header("Content-type:text/html");
echo "<p>Thank you for your submission, " . htmlspecialchars($strippedname) . ". We will get back to you shortly.</p>";
// echo the submitted data, "here is a copy of your submission";

The only point I disagree on with echoing out the form is it may be confusing to the end user. "What? I just submitted the form, did it not work? Should I send it again?"

craighaggart

1:30 am on Nov 25, 2009 (gmt 0)

10+ Year Member



So much great information! You folks are wonderful. Now I'll have to do more intense PHP study so I can learn what all of the suggested code elements do and how to tailor them to my site. It's a small volunteer organization, and the contact page merely provides someone a way to get in touch with us without the downside of putting an e-mail address on the page (which I was warned not to do many times by many people who have been "spam harvested"). We have no need whatsoever for anyone to put a URL or code of any type in the form; it seemed to me that a simple filter should do the trick.

Dreamcatcher, thank you for helping me to look like less of an idiot by deleting those duplicate posts! Also for pointing out that addslashes is deprecated. I will be moving on to making our site more dynamic, which of course means using MySQL with PHP, so I'm going to be using a lot of this info soon.

Rocknbil, you're right about the order of my mail() and header() statements -- I actually thought I had them the other way around. It works the way it is, but it does make logical sense to switch them so the mail() comes first.

The response -- a call to my contactsubmitted.html page rather than an echoed line -- was something I did intentionally. The contactsubmitted page has my custom "thank you" message that includes text to let the user know his/her message was received and has been sent to the proper person, but it's laid out just like all the other pages on my site: Title block on top, nav links on the left, footer on the bottom with "this page updated on [date]" info, and message text right in the middle of the content area. This also allows users to go directly to another page on the site instead of being forced to go back to the contact form page. They're obviously done with that page, so I didn't want to make them have to go back there in order to get to the nav links.

TheMadScientist, thanks again for giving me avenues to explore and for the explanation of the POST vs. post issue. I am new to web programming, so I figured I might as well start out using well-formed XHTML and CSS. I don't have the problem so many others do, where trying to comply with new versions means going back and changing tons of previously written code. I had the benefit of starting with a clean slate.

And I totally see what you mean about "POST" in caps being clearly code, where "post" in lower case refers to uploading a message to a forum. You answered my question completely; thanks again.

-Craig

rocknbil

8:41 pm on Nov 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



a call to my contactsubmitted.html page rather than an echoed line .... but it's laid out just like all the other pages on my site

Think on this one a little bit. You can do the exact same thing in your response by echoing out a custom message with the details submitted. Something like:

include("header.txt");
echo "Custom content here with user's submitted data and other helpful links";
include("footer.txt");

This may be an esoteric point, but it's a tad bit more useful than a simple redirect, so I always suggest it.

craighaggart

2:00 am on Nov 26, 2009 (gmt 0)

10+ Year Member



Hmmm... I'm still not sure what the benefit would be of displaying an echoed message instead of my custom contactsubmitted page, which has the same look-and-feel of all the pages on my site and includes the navbar. I don't actually have any reason to echo back the contents of the form submission. It's not an e-commerce site or anything.

If you want to have a look at it, the "submitted" page is at <snip> . You have to go to it directly because the only way to get to it through the site is by submitting a message on the "contact us" page.

Remember, the site is my first (and only) attempt at web programming. I'm just another volunteer at the Friends of the Library and agreed to do this so the organization could have a web site. I had to learn everything myself from square one earlier this year; I'm not a professional programmer or anything.

-Craig

[edited by: dreamcatcher at 6:40 am (utc) on Nov. 26, 2009]
[edit reason] No Urls Please. See TOS. [/edit]

TheMadScientist

2:15 am on Nov 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



He's just saying you could personalize it a bit more with what they entered, but personally, I don't do either... (echo back or redirect)

I've switched to AJAX and show you a little 'your form has been submitted, thanks!' message without ever changing the page or showing you anything you've entered or doing anything else except disabling the form, because it's way easier for me and if you don't know your name or what you said that's not really my problem, plus it always bothers me when sites put my personal information (even just my name) on the screen. I don't know why, but it bugs me, so I don't do it.