I'd like to be able to allow people to use most HTML tags, but I'd like to avoid any possibility of them injecting nasty SQL, using javascript, etc.
So far the steps I've taken are:
1: Replace all apostrophes and quotes with character entities.
2: Addslashes for good measure.
3: Only give select, insert & update permissions to the SQL user. (Maybe when I have more storage space I'll remove the update permission and just insert a new entry for each revision they make.)
4: Use a function which replaces all but a chosen list of HTML tags with character entities.
Here are a few questions:
1) Is replacing all quotes and apostrophes with their character entities going to affect the search engines? Will they display the title: ( UserName&#39;s Message: &quot;Hello World&quot; ) or ( UserName's Message: "Hello World" ) if that page comes up in the SERPs?
2) Just as importantly, will the search engines understand that Username&#39;s Message *means* Username's Message? Along the same lines, if I create an intrasite search function, when I query my database and grep for "Username's message," will that match "Username&#39;s message," or are they different on the cellular level?
3) What are the general guidelines when it comes to securing forms on a database-driven site? Just addslashes to everything before it goes in, then strip them on the way out? Use regular expressions to replace words like "javascript" with benign character entities?
Thanks in advance for any advice.
My approach is to screen for what I want in the input fields and simply toss everything else. Invariably this is going to eventually reveal some faults in my thinking and upset a visitor or two, but it's absolutely necessary.
In instances where you want to allow HTML - I've got those - I have a prepared list of approved tags in the database. In any submitted data that allows HTML (all tags are stripped for others!) it finds anything resembling an HTML tag and checks it against this list. If it's in the list, it lives; if not, poof.
This prevents any iframe or object tags, as well as anything else that you don't want in submitted data. Same deal, it may upset some customers that they can't put a pop-up JavaScript in your page, but it's in the TOS, so too bad. :-)
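The approved-tag check described above could look roughly like the following. This is only a sketch in Python, not the actual code behind my site: the tag list is hard-coded here instead of loaded from the database, and the names ALLOWED_TAGS, TAG_RE and filter_tags are made up for illustration. It also goes one step further than described, as an assumption on my part: approved tags are re-emitted without their attributes (so no onclick handlers sneak through), and any text outside of tags is entity-escaped.

```python
import html
import re

# Hypothetical approved list; in the setup described above it lives in the database.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "blockquote"}

# Matches anything resembling an opening or closing tag: <b>, </b>, <iframe src=...>
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def filter_tags(text):
    """Keep approved tags (without attributes), drop other tags, escape the rest."""
    out = []
    pos = 0
    for match in TAG_RE.finditer(text):
        # Text between tags is escaped so stray < or > cannot form new tags.
        out.append(html.escape(text[pos:match.start()], quote=False))
        closing, name = match.group(1), match.group(2).lower()
        if name in ALLOWED_TAGS:
            # Re-emit a bare tag, discarding attributes such as onclick or style.
            out.append("<" + closing + name + ">")
        # Tags that are not on the list are simply dropped ("poof").
        pos = match.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)

clean = filter_tags('<b onclick="x()">hi</b><iframe src="e"></iframe>')
```

A regular expression is a blunt tool for HTML, so for anything serious a proper HTML parser or an established sanitizing library is likely the sturdier choice.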
For MySQL, quote everything, period. Same deal: don't allow any input that would alter your select statements.
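One common way to get the "quote everything" effect is to let the database driver do the quoting through placeholders, so user input is never pasted into the SQL text itself. Here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL. The table and function names are invented for the example, and placeholder syntax differs from driver to driver.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, body TEXT)")

def save_message(conn, author, body):
    # The ? placeholders make the driver treat the values purely as data.
    conn.execute("INSERT INTO messages (author, body) VALUES (?, ?)", (author, body))

def find_by_author(conn, author):
    rows = conn.execute("SELECT body FROM messages WHERE author = ?", (author,))
    return rows.fetchall()

# Input that would break a statement built by string concatenation.
hostile = "x'; DROP TABLE messages; --"
save_message(conn, hostile, "Hello World")
save_message(conn, "O'Brien", 'He said "hi"')

row_count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Both rows go in with their apostrophes and quotes untouched, and the table is still there afterwards.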
As for your substitution concerns - I don't know if it's so important that you sub out quotes, etc, as it is to make sure the quotes don't change the select statements you'll be using.
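On the search question, a quick illustration of why the stored form matters. This is a sketch in Python with invented variable names, and it assumes a plain substring match: text stored with entities does not match a query typed with a real apostrophe unless it is decoded first. Storing the raw text and only converting to entities when writing the page avoids that mismatch.

```python
import html

# What ends up in the database if quotes are replaced with entities before storing.
stored_encoded = "Username&#39;s message"
# What a visitor would type into the search box.
query = "Username's message"

# A plain substring search sees two different strings.
matches_as_stored = query in stored_encoded
# Decoding the entities first makes them comparable again.
matches_after_decoding = query in html.unescape(stored_encoded)

def render(raw_text):
    # Alternative: keep raw text in the database and escape only on output.
    return html.escape(raw_text, quote=True)

page_text = render('He said "hi" & left')
```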