Profanity filter: matching when using a special character

This is becoming a challenge, and I'm curious whether you guys and gals have any suggestions or feedback.

I'm working on a profanity filter for my message board that replaces bad words with ****. People occasionally try to get around the filter, though, so I'm trying to figure out a way to catch the cases where someone uses a special character in place of a real letter.

For example:

@ss
@$$
$h!+

Or worse:

sÃ·%t (in context the meaning is clear, but replacing the Ã with A just turns it to gibberish)

But I DON'T want to catch, for example:

@gmail
#foo
you're
No!This (no space after the !)
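
One way to handle both lists above, as a minimal sketch rather than a finished filter: split the text on whitespace, map the usual stand-in characters to letters, and only mask a token when the whole token becomes a listed word. The word list, the character map, and the names mask_token and filter_text are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical word list, for illustration only.
my %bad = map { $_ => 1 } qw(ass shit);

sub mask_token {
    my ($token) = @_;
    my $core = lc $token;
    $core =~ s/^[.,;:?"'()]+//;    # drop ordinary punctuation around the word
    $core =~ s/[.,;:?"'()]+$//;
    # Map stand-ins to letters: @->a  $->s  !->i  +->t  0->o  1->l  3->e
    # (tr does not interpolate variables, so $ and @ are literal here.)
    (my $plain = $core) =~ tr/@$!+013/asitole/;
    # Mask only when the WHOLE token is a listed word after mapping.
    return $bad{$plain} ? '*' x length($token) : $token;
}

sub filter_text {
    my ($text) = @_;
    # Split on whitespace but keep the separators so spacing is preserved.
    return join '', map { /^\s+$/ ? $_ : mask_token($_) } split /(\s+)/, $text;
}

print filter_text('what an @ss, mail me @gmail'), "\n";
```

Because the comparison is against the whole token, "@gmail" becomes "agmail" and "No!This" becomes "noithis", neither of which is on the list, so both pass through untouched. The trade-off is that a bad word glued to other text is missed.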

I'm already manually swapping some characters to letters, like so:

%asciiChars = (

# Upside down
'592' =>'a',
'596' =>'c',
'477' =>'e',
'607' =>'f',
'613' =>'h',
'305' =>'i',
'1592' =>'j',
'670' =>'k',
'1503' =>'l',
'623' =>'m',
'633' =>'r',
'647' =>'t',
'652' =>'v',
'653' =>'w',
'654' =>'y',

# Uppercase
'65' =>'A',
'66' =>'B',
'67' =>'C',
'68' =>'D',
'69' =>'E',
'70' =>'F',
'71' =>'G',
'72' =>'H',
'73' =>'I',
'74' =>'J',
'75' =>'K',
'76' =>'L',
'77' =>'M',
'78' =>'N',
'79' =>'O',
'80' =>'P',
'81' =>'Q',
'82' =>'R',
'83' =>'S',
'84' =>'T',
'85' =>'U',
'86' =>'V',
'87' =>'W',
'88' =>'X',
'89' =>'Y',
'90' =>'Z',

# Lowercase
'97' =>'a',
'98' =>'b',
'99' =>'c',
'100' =>'d',
'101' =>'e',
'102' =>'f',
'103' =>'g',
'104' =>'h',
'105' =>'i',
'106' =>'j',
'107' =>'k',
'108' =>'l',
'109' =>'m',
'110' =>'n',
'111' =>'o',
'112' =>'p',
'113' =>'q',
'114' =>'r',
'115' =>'s',
'116' =>'t',
'117' =>'u',
'118' =>'v',
'119' =>'w',
'120' =>'x',
'121' =>'y',
'122' =>'z',

# Special chars
'263' =>'c',
'347' =>'s'
);

foreach my $key (keys %asciiChars) {
    # Build the numeric entity, e.g. "&#592;", and replace it with its letter
    my $mod = '&#' . $key . ';';
    $text =~ s/\Q$mod\E/$asciiChars{$key}/g;
}
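
The A-Z and a-z entries in the table could probably be dropped by decoding numeric entities in a single pass and keeping a table only for the look-alikes. This is a rough sketch under that assumption; entity_to_char and decode_entities_for_filter are hypothetical names, and the table just repeats the non-ASCII code points from the hash above.

```perl
use strict;
use warnings;

# Only the non-ASCII look-alikes need a table (same values as the hash above).
my %lookalike = (
    592 => 'a', 596 => 'c', 477  => 'e', 607 => 'f', 613  => 'h',
    305 => 'i', 1592 => 'j', 670 => 'k', 1503 => 'l', 623 => 'm',
    633 => 'r', 647 => 't', 652  => 'v', 653 => 'w', 654  => 'y',
    263 => 'c', 347 => 's',
);

sub entity_to_char {
    my ($digits) = @_;
    (my $cp = $digits) =~ s/^0+(?=\d)//;    # "&#0115;" and "&#115;" are the same
    return $lookalike{$cp} if exists $lookalike{$cp};
    # Plain ASCII letters simply decode to themselves.
    return chr($cp) if ($cp >= 65 && $cp <= 90) || ($cp >= 97 && $cp <= 122);
    return '&#' . $digits . ';';            # leave every other entity untouched
}

sub decode_entities_for_filter {
    my ($text) = @_;
    $text =~ s/&#(\d+);/entity_to_char($1)/ge;
    return $text;
}

print decode_entities_for_filter('&#65;&#115;&#347;'), "\n";
```

One regex pass replaces the loop over every key, and entities that are not letters or known look-alikes stay exactly as they were typed.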

And I tried this tonight but it threw an error, so I need to play with it a little:

$text =~ s/ï/i/g;
$text =~ s/ö/o/g;
$text =~ s/[š\$]/s/g;   # the $ must be escaped: an unescaped $] is read as a Perl variable and breaks the class
$text =~ s/¥/y/g;
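
For accented letters, a possible alternative to one substitution per character is to decompose the text with the core Unicode::Normalize module and strip the combining marks. The sketch below assumes $text already holds decoded characters rather than raw bytes; fold_accents and the %manual table are made-up names, and symbols that have no decomposition (the yen sign, for example) still need a manual entry.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Characters with no decomposition still need a manual map (assumed list).
my %manual = ( "\x{A5}" => 'y' );    # U+00A5 YEN SIGN

sub fold_accents {
    my ($text) = @_;
    my $folded = NFKD($text);    # e.g. U+00EF becomes "i" plus a combining diaeresis
    $folded =~ s/\p{Mn}//g;      # drop the combining marks
    # Swap any remaining character that has a manual replacement.
    $folded =~ s/(.)/exists $manual{$1} ? $manual{$1} : $1/gse;
    return $folded;
}

print fold_accents("na\x{EF}ve"), "\n";
```

The characters are written as \x{...} escapes so the example does not depend on how the script file itself is encoded.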

Before I keep going down this rabbit hole, trying to find every possible variation and swapping it, can you guys suggest a better way to find when the user is trying to get around the filter?

my @badwordsa = ('#*$!', 'damn', 'piss');
my @badwordsb = ('#*$!', 'asswipe', 'asskisser', 'asskiss', 'kissass', 'kiss ass', 'hell');

# Check personal (alias) name for foul language
foreach $c (@noninsertables) {
    foreach $d (@insertables) {
        if (($name =~ m/\b$c\b/im || s/\W//g) || ($name =~ m/$d/im)) {
            $ec3 = 1;
            $te++;
        }
    }
}

if ($ec3 == 1) {
    print " <li> The Following Bad Words Were Found In The Name Field And Must Be Removed:</li>\n";
} # end alias

foreach $d (@noninsertables) {
    if ($name =~ m/\b$d\b/im) { print " $d"; }
}
foreach $c (@insertables) {
    if ($name =~ m/$c/im) { print " $c "; }
}
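
The two-list idea in that snippet could be sketched more compactly as follows. This is only one reading of it: the assumption is that one list holds words that should match only as whole words and the other holds words flagged anywhere in the name. The list contents, the placement of 'hell' in the whole-word list, and the name bad_words_in are choices made for this example, not taken from the post.

```perl
use strict;
use warnings;

# Assumed split: whole-word terms vs. terms flagged anywhere in the name.
my @whole_word = ('damn', 'piss', 'hell');
my @anywhere   = ('asswipe', 'asskisser', 'asskiss', 'kissass', 'kiss ass');

sub bad_words_in {
    my ($name) = @_;
    my @found;
    # \Q...\E keeps any punctuation in a list entry from acting as regex syntax.
    push @found, grep { $name =~ /\b\Q$_\E\b/i } @whole_word;
    push @found, grep { $name =~ /\Q$_\E/i } @anywhere;
    return @found;
}

my @hits = bad_words_in('Mr Kiss Ass');
print "Found: @hits\n" if @hits;
```

With the word boundaries in place, a name such as "Hello there" is not flagged for 'hell', while 'kiss ass' is still caught inside a longer name.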


