There's much more to it.
Output filtering
htmlentities is a start, but you need to be careful to use it properly: with the right encoding, and escaping the right quotes for the context you're outputting into.
E.g. take the HTML:
<a href="AAA" rel="BBB">CCC</a>
CCC: quotes don't need to be escaped here, and if you escape them anyway you only make the source code harder to read.
BBB: double quotes need to be escaped, or your data might start adding more attributes.
AAA: it should be urlencoded, not just htmlencoded, e.g. a space should be encoded as "%20" or as "+" (what PHP's rawurlencode and urlencode produce, respectively).
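To make the three contexts concrete, here is a minimal sketch. The input values and the /search URL are made up for illustration. It uses htmlspecialchars rather than htmlentities, on the assumption that the page is served as UTF-8 and only the markup-significant characters need escaping.

```php
<?php
// Hypothetical untrusted values, e.g. taken from a request.
$label  = 'Tom & "Jerry" <3';
$rel    = 'nofollow" onclick="alert(1)';
$search = 'red shoes & socks';

// CCC (element text): escape & < > only; quotes are harmless here.
$text = htmlspecialchars($label, ENT_NOQUOTES, 'UTF-8');

// BBB (double-quoted attribute): double quotes must be escaped as well.
$relAttr = htmlspecialchars($rel, ENT_COMPAT, 'UTF-8');

// AAA (URL): percent-encode the value first, then HTML-escape the whole
// URL, because it still ends up inside a double-quoted attribute.
$url  = '/search?q=' . rawurlencode($search);
$href = htmlspecialchars($url, ENT_COMPAT, 'UTF-8');

echo '<a href="' . $href . '" rel="' . $relAttr . '">' . $text . '</a>', "\n";
```

Note that the injected double quote in $rel comes out as &quot;, so it can no longer close the attribute and start a new one.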
Moreover, if you're going to output things like XHTML5, you can't use htmlentities, because the only named entities you are allowed to use are the ones for & < > " and '. All other characters must be output as plain UTF-8. (I did not check whether other character encodings are still allowed; I only use UTF-8 these days.)
Escaping stuff from your database
I prefer not to do this at all.
First: use mysqli calls instead of the obsolete mysql_* ones (deprecated in PHP 5.5 and removed in PHP 7.0).
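A mysqli call that keeps user input out of the SQL text might look roughly like this. It is only a sketch: the users table, its id and name columns, and the function name are hypothetical, and error handling is kept to the bare minimum.

```php
<?php
/**
 * Look up users by name with bound parameters.
 * The table `users` and its columns `id` and `name` are hypothetical.
 */
function find_users_by_name(mysqli $db, string $name): array
{
    // The ? placeholder marks where the user data goes; the SQL text is fixed.
    $stmt = $db->prepare('SELECT id, name FROM users WHERE name = ?');
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $db->error);
    }

    // 's' declares one string parameter; the value is passed as data
    // and is never spliced into the SQL text.
    $stmt->bind_param('s', $name);
    $stmt->execute();

    // Bind the two result columns to variables and collect the rows.
    $stmt->bind_result($id, $foundName);
    $rows = [];
    while ($stmt->fetch()) {
        $rows[] = ['id' => $id, 'name' => $foundName];
    }
    $stmt->close();

    return $rows;
}
```

You would call it with an open connection, e.g. find_users_by_name($db, $userInput), where $db is a mysqli object you created with your own credentials.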
And then use prepared statements whenever possible instead of trying to fix things with escaped characters. That way the database knows where the user input is and will not mistake data for statements.
Input validation
This is the really big one: consider all user input (i.e. everything coming from the browser) to be TAINTED, even if it was filtered on the client side. You need to clean it to the point where you *know* it is valid before you use it for anything.
There are 2 tactics:
- whitelist: you make sure that everything in the input is known good (all characters are OK, the combinations are OK, the range (length, min, max) is OK, etc.). This is the hard approach, but the most secure. It is relatively easy if you are expecting e.g. a response from a drop-down list: you know the possible values very well. It is especially hard if you have to do this on e.g. a review of a hotel: the possible inputs are much wider and freer.
- blacklist: you make sure everything that can harm you is removed. The tricky bit is that you need to know what can harm you. E.g. a "." in a string that's going to be used as a filename (e.g. logo.png) will not be considered harmful, but that same dot is quite a different story when used in "../../../../etc/passwd". Still, in many cases blacklisting is all we can achieve, and because it is so tricky it lets many "fatal" mistakes slip into production environments.
In practice one often combines the two tactics (and my whitelisting definition isn't 100% pure already due to this).
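As an illustration of the whitelist tactic, here is a small sketch for the two easy cases mentioned above: a drop-down value and a file name. The function names, the list of options and the allowed extensions are assumptions chosen for the example, not a complete validation library.

```php
<?php
// Whitelist for a drop-down: only these exact values are acceptable.
// The option list is a hypothetical example.
function validate_room_type(string $input): ?string
{
    $allowed = ['single', 'double', 'suite'];
    // Strict comparison (third argument) avoids PHP's loose type juggling.
    return in_array($input, $allowed, true) ? $input : null;
}

// Whitelist for a file name: 1 to 64 letters, digits, "_" or "-",
// followed by exactly one dot and a known extension. Slashes and
// additional dots cannot match, so path traversal is rejected.
function validate_image_name(string $input): ?string
{
    $pattern = '/\A[A-Za-z0-9_-]{1,64}\.(png|jpe?g|gif)\z/';
    return preg_match($pattern, $input) === 1 ? $input : null;
}

// Both helpers return the value when it is acceptable and null otherwise.
var_dump(validate_room_type('suite'));
var_dump(validate_image_name('logo.png'));
var_dump(validate_image_name('../../../../etc/passwd'));
```

Returning null instead of a "cleaned" value is a deliberate choice here: the caller has to reject the request rather than carry on with something half-repaired.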
The top 10 of mistakes (learn from what others failed to do right):
If you're looking for help so that you don't have to implement your own libraries:
It's available for PHP (although I've not used it myself yet):
Note: I'm not affiliated with OWASP.