The trick here is not to try to remove all malicious code, but to focus on allowing only legit text, or whatever it is you're collecting.
What I usually do is simply define the characters I consider legit for the entry in question. For example, for a comment box or something similar:
"A-Z", "a-z", "0-9", ",."
The next step is to identify ways those characters could be used against your server. It also depends on whether you insert the data into a database; that adds some complexity to the problem, but if you can keep your set of legit characters small, it shouldn't be too hard to cover the basics.
So your code could simply parse the input, keeping only the legit characters and possibly removing any patterns you've identified as potential problems.
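Here's a rough sketch of that idea in Python, just to make it concrete. The function name `sanitize`, the extra space character in the allowed set, and the "runs of dots" pattern are all my own assumptions for illustration; your own character set and patterns will depend on what you're collecting.

```python
import re

# Anything NOT in the allowed set gets stripped.
# Assumption: the set from the example above, plus a space so that
# ordinary comment text survives.
DISALLOWED = re.compile(r"[^A-Za-z0-9,. ]")

# Hypothetical "problem patterns" built only from allowed characters.
# Here: runs of two or more dots are collapsed to a single dot.
PROBLEM_PATTERNS = [
    (re.compile(r"\.{2,}"), "."),
]


def sanitize(text: str) -> str:
    """Keep only allowed characters, then clean up known problem patterns."""
    cleaned = DISALLOWED.sub("", text)
    for pattern, replacement in PROBLEM_PATTERNS:
        cleaned = pattern.sub(replacement, cleaned)
    return cleaned
```

With this, something like `sanitize("Hello, world.<script>alert(1)</script>")` comes back with the angle brackets, slashes and parentheses stripped, leaving only letters, digits and the allowed punctuation.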
One little note: since new exploits are discovered over time, I keep all validation in a single file (which can serve different purposes) in a central location, so I can update it pretty quickly.
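As a sketch of what such a central file might look like, assuming Python: one module holds a table of rules keyed by purpose, and everything else calls into it. The purpose names (`"comment"`, `"username"`) and the function name `clean` are made up for the example.

```python
import re

# One central table of rules, keyed by purpose. Each regex matches the
# characters to strip. Updating a rule here updates every caller.
# The purposes and character sets below are hypothetical examples.
RULES = {
    "comment": re.compile(r"[^A-Za-z0-9,. ]"),
    "username": re.compile(r"[^A-Za-z0-9]"),
}


def clean(text: str, purpose: str) -> str:
    """Strip every character not allowed for the given purpose."""
    try:
        disallowed = RULES[purpose]
    except KeyError:
        # Fail loudly rather than silently skipping validation.
        raise ValueError(f"unknown validation purpose: {purpose!r}") from None
    return disallowed.sub("", text)
```

The point of failing on an unknown purpose is that a typo in the caller never results in unvalidated input slipping through.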