PHP regex question - matching spaces, special characters

         

mnicholas

5:42 am on Sep 19, 2003 (gmt 0)

10+ Year Member



Hello,

Part of a PHP forum script I'm writing includes a profanity filter. What I've got works fine, but I'd like to make it a little harder to get around (that way by the time people have figured out how to get past it, whatever they have written is not an easily-readable curse, and the moderators will probably have kicked them anyways).

What I currently use is something like:


$curses = array ("/shoot/i", "/darn/i", "/shucks/i", "/doodoo/i");
$filtered_string = preg_replace ($curses, "$%&!", $string);

The problem is that while darn and darnit will get filtered-out, things like d a r n or s-h-o-o-t, S.h**U**C*k.S etc. will still get through.

So, I think I need to define a character class with stuff like !, @, #, *, ... plus whitespace, something like

[ !#$%*&()]

or
[\s!#$%*&()]

and then place that between every letter in the words in my curses array, to be matched zero or more times using the * metacharacter... so my code should be something like:


$curses = array ("/s[!#$%*&()]*h[!#$%*&()]*o[!#$%*&()]*o[!#$%*&()]*t/i".....);
$filtered_string = preg_replace ($curses, "$%&!", $string);

That makes a lot of sense to me...if only I really knew what I were doing ;)....could anyone tell me what is wrong with my code, or if they have a better solution to the problem?

Thanks very much in advance for your help.

incywincy

6:55 am on Sep 19, 2003 (gmt 0)

10+ Year Member



i'm sure that there are better qualified people to answer this but i believe that the metacharacter \W matches any non-word character in a regular expression, i.e. anything other than a letter, a digit or an underscore.
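A quick way to see the difference, as a minimal sketch: the hand-written class from the first post contains no whitespace, dash or dot, so it cannot match any of the three obfuscated examples given there, while \W can. The underscore counts as a word character, so it has to be added to the class separately. The names $narrow, $wide and $samples are made up for this illustration.

```php
<?php
// Class from the original post: no whitespace, "-" or "." in it.
$narrow = '/s[!#$%*&()]*h[!#$%*&()]*o[!#$%*&()]*o[!#$%*&()]*t/i';

// Same word with [_\W]* between the letters: any run of characters
// that are not letters or digits (underscore added explicitly).
$wide = '/s[_\W]*h[_\W]*o[_\W]*o[_\W]*t/i';

$samples = array('shoot', 's-h-o-o-t', 's h o o t', 'S.h**O**O*t');

foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 otherwise.
    $hits_narrow = preg_match($narrow, $sample);
    $hits_wide   = preg_match($wide, $sample);
    echo "$sample: narrow=$hits_narrow wide=$hits_wide\n";
}
```

Only the plain spelling matches the narrow pattern; all four samples match the wide one.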

mnicholas

8:47 am on Sep 19, 2003 (gmt 0)

10+ Year Member



Thanks - that was what I needed....still not sure why it was not working before, but this is much better anyways.

In case anyone is interested, here's my function for filtering out curses. It is obviously a little longer than it has to be, but the way I wrote it makes it easy to insert or remove words quickly, without thinking about regular expressions. Just add elements to the array with each letter separated by a dash. If you enter a word that is part of other valid words, make sure you put a space before and after it. For example, arse is contained in the word parse, so to prevent the conversion parse --> p#%$#@!, enter it in $curses as " a-r-s-e " and NOT "a-r-s-e".


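function profanity_filter ($string){
    // Words to censor, with a dash between every letter.
    $curses = array("d-a-r-n", "j-e-r-k", "s-h-u-c-k-s");

    for ($n=0, $size=count($curses); $n < $size; $n++){
        // Turn each dash into "zero or more non-letter, non-digit characters".
        // str_replace is used because ereg_replace was removed in PHP 7.
        $curses[$n] = '/' . str_replace('-', '[_\W]*', $curses[$n]) . '/i';
    }

    $filtered = preg_replace($curses, '#%$#@!', $string);
    return $filtered;
}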

//sample usage:
$text = "darnit, don't be a jerk";
$filtered_text = profanity_filter($text);
echo "$filtered_text";
//outputs: #%$#@!it, don't be a #%$#@!

Thanks again for the help =)
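One possible refinement of the space-padding trick, offered as a sketch and not as part of the original function: a space typed into an array entry has to match a literal space, so a padded word is missed at the very start or end of the text or next to punctuation, and the surrounding spaces are replaced along with it. The \b word-boundary assertion avoids both problems. The function name profanity_filter_wb and the split into $anywhere and $whole_words are hypothetical, invented for this example.

```php
<?php
// Hypothetical variant: entries in $whole_words only match as complete words.
function profanity_filter_wb($string) {
    $anywhere    = array("d-a-r-n", "j-e-r-k"); // match even inside other words
    $whole_words = array("a-r-s-e");            // match only as a standalone word

    $patterns = array();
    foreach ($anywhere as $word) {
        $patterns[] = '/' . str_replace('-', '[_\W]*', $word) . '/i';
    }
    foreach ($whole_words as $word) {
        // \b asserts a word boundary without consuming any characters,
        // so neighbouring spaces and punctuation are left untouched.
        $patterns[] = '/\b' . str_replace('-', '[_\W]*', $word) . '\b/i';
    }

    return preg_replace($patterns, '#%$#@!', $string);
}

echo profanity_filter_wb("Please parse this, arse."), "\n";
```

With this input, parse is left alone and only the standalone word is replaced, giving "Please parse this, #%$#@!.".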