Forum Moderators: coopster & phranque


Cleaning up user input

how do you handle it?

dingman

11:07 pm on Oct 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most scripting languages have some built-in function for escaping shell metacharacters. There are similar potential issues when you let users enter content to be presented to other users. What do we all do to handle that data and make sure we don't present some user-supplied nasty to our other users?

Myself, I strip out beginning and end tags for "html", "body", "img", "link", "head", "script", "style", "object", "embed", and "applet", as well as removing any tag with a javascript URL or any onWhatever attribute defined.
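
A minimal sketch of the blocklist described above, written in Python rather than the poster's own (unposted) code. It swaps in the standard library's `html.parser` tokenizer instead of regular expressions; the names `TagFilter` and `clean` are made up for illustration. Like the approach it mirrors, it removes only the tags themselves, so the text inside a stripped `<script>` element stays behind as plain text, and comments are dropped.

```python
from html.parser import HTMLParser

# Tag names taken from the post above.
BLOCKED_TAGS = {"html", "body", "img", "link", "head", "script",
                "style", "object", "embed", "applet"}


class TagFilter(HTMLParser):
    """Rebuilds the input, leaving out tags that match the blocklist rules."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _is_unsafe(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            return True
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, ...
                return True
            if value and value.strip().lower().startswith("javascript:"):
                return True
        return False

    def handle_starttag(self, tag, attrs):
        if not self._is_unsafe(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are emitted once, as written.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in BLOCKED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def clean(text):
    parser = TagFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(clean('<b>hi</b><script>x</script>'))          # <b>hi</b>x
print(clean('<a href="javascript:alert(1)">x</a>'))  # x</a>
```

Note the orphaned `</a>` in the second result: dropping only the offending start tag leaves its end tag in place. A blocklist like this can also miss tags or attributes nobody thought to list.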

Is this overkill? Inadequate? Do you have another approach? I'm curious.

jatar_k

3:37 pm on Oct 5, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I guess it really depends on what type of information you are allowing users to input. I always look at a specific situation and decide what possibilities there are and what problems there may be. I then go a little overboard to make sure I didn't miss anything.

For most of the input I allow, I just call addslashes() and strip away any characters that might mess things up. I have a few fields, though, where I allow pretty much anything and deal with it when I actually display it.

transistor

11:03 pm on Oct 7, 2002 (gmt 0)

10+ Year Member



I almost always use trim(stripslashes(strip_tags($var))) with PHP.
In strict fields, I check whether the resulting string is empty or differs in length from the original. If so, I treat it as an error and send the user back to re-enter the input.
From time to time I allow some HTML like <i> and <b>.
I use a special tag for a href: <a>URL</a>.
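
One way to allow only a couple of tags such as <i> and <b> can be sketched in Python. This is not the poster's PHP code. It uses a different technique from strip_tags(): escape everything first, then restore a short allowlist of bare tags. The function name `allow_basic_tags` and the `ALLOWED` tuple are hypothetical.

```python
import html
import re

ALLOWED = ("b", "i")  # assumption: the only tags users may keep


def allow_basic_tags(text, allowed=ALLOWED):
    """Escape all markup, then re-enable bare allowlisted tags."""
    escaped = html.escape(text.strip(), quote=True)
    names = "|".join(re.escape(tag) for tag in allowed)
    # Only exact forms like <b> or </i> are restored; a tag carrying
    # attributes (e.g. <b onclick=...>) stays escaped and shows as text.
    pattern = rf"&lt;(/?)({names})&gt;"
    return re.sub(pattern, r"<\1\2>", escaped, flags=re.IGNORECASE)


print(allow_basic_tags(" <b>hi</b> "))         # <b>hi</b>
print(allow_basic_tags("<script>x</script>"))  # &lt;script&gt;x&lt;/script&gt;
```

Because the input is escaped before anything is restored, markup that is not on the allowlist can only ever appear as visible text. The sketch does not check that the restored tags are balanced.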

Dingman, I think your approach is pretty good too.
If your site depends on user input, check everything, everywhere.

pkchukiss

12:58 pm on Oct 8, 2002 (gmt 0)

10+ Year Member



For myself, I use Perl's regular expressions to strip anything between the two angular brackets. For example, <HTML> and </HTML> would be removed.

I really hate people who try to distort the page by adding custom HTML codes. If everyone did this, the page would be extremely distorted. A good example would be Neopets' noteboard.

andreasfriedrich

1:20 pm on Oct 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I use Perl's regular expressions to strip anything between the two angular brackets

Have you checked how your regular expressions handle HTML like <!-- <b>Perl</b> --> or <img src="ac.gif" alt="AC > PHP">? They might not really do what you expect.

Andreas
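
The two inputs above are easy to try. The sketch below is in Python and assumes the simplest pattern, `<[^>]*>`. The actual Perl expression was not posted, so both the pattern and the name `naive_strip` are illustrative only.

```python
import re

# Assumed pattern: "a '<', anything that is not '>', then a '>'".
NAIVE_TAG = re.compile(r"<[^>]*>")


def naive_strip(text):
    return NAIVE_TAG.sub("", text)


print(repr(naive_strip("<!-- <b>Perl</b> -->")))               # 'Perl -->'
print(repr(naive_strip('<img src="ac.gif" alt="AC > PHP">')))  # ' PHP">'
```

In both cases the match ends at the first `>` it meets, whether that character closes a tag, sits inside a comment, or is part of a quoted attribute value.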

dingman

4:01 pm on Oct 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Have you checked how your regular expressions handle HTML like <!-- <b>Perl</b> --> or <img src="ac.gif" alt="AC > PHP">? They might not really do what you expect.

Good point. The regex that occurs to me first would produce 'Perl -->' or ' PHP">?' If I'm not allowing any HTML tags at all, though, I just run the input through PHP's htmlentities() function. I figure that for a user who was innocently typing 'P & !P --> false', that's what they wanted. And anyone who tries to insert malicious code gets exposed.
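
For comparison, here is the same idea in Python, where `html.escape` plays a role similar to PHP's htmlentities(). It is not an exact equivalent: `html.escape` converts only `&`, `<`, `>` and the two quote characters, while htmlentities() converts every character that has a named entity. The wrapper name `show_as_text` is made up for this sketch.

```python
import html


def show_as_text(user_input):
    """Escape markup so the browser displays it instead of interpreting it."""
    return html.escape(user_input, quote=True)


print(show_as_text("P & !P --> false"))           # P &amp; !P --&gt; false
print(show_as_text("<script>alert(1)</script>"))  # &lt;script&gt;alert(1)&lt;/script&gt;
```

The innocent input renders exactly as typed, and the attempted script shows up on the page as visible text.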

For places where you do want to allow some HTML, just not all, does anyone have a good method for checking that tags are balanced? I can just see some yutz making half a page disappear because they didn't close a tag, and I don't have a check for it yet. Perhaps something stack-based, like the standard parenthesis checker, with an additional check on each pop to make sure the tag you popped is in fact the opening tag for the end tag that prompted the pop?
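
The stack idea can be sketched as follows, in Python, using the standard library's `html.parser` to find the tags. The names `BalanceChecker`, `tags_balanced` and the small `VOID_TAGS` set are assumptions for illustration, not an established recipe.

```python
from html.parser import HTMLParser

# Assumption: elements that never take an end tag and so are never pushed.
VOID_TAGS = {"br", "hr", "img"}


class BalanceChecker(HTMLParser):
    """Pushes each start tag; every end tag must match the top of the stack."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br/> style tags open and close in one step

    def handle_endtag(self, tag):
        # The extra check on the pop: the popped tag must be the one
        # this end tag is closing.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def tags_balanced(text):
    checker = BalanceChecker()
    checker.feed(text)
    checker.close()
    # Anything left on the stack was opened but never closed.
    return checker.balanced and not checker.stack


print(tags_balanced("<b>bold <i>both</i></b>"))  # True
print(tags_balanced("<b>never closed"))          # False
print(tags_balanced("<b><i>crossed</b></i>"))    # False
```

A stricter version could stop at the first mismatch and report which tag caused it, so the user can be told what to fix.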