Bad Words Filter Enhanced

How to not filter valid words

     
3:10 am on Dec 2, 2009 (gmt 0)

New User

5+ Year Member

joined:July 4, 2009
posts: 21
votes: 0


Hello,

I am trying to enhance my bad words filter. Currently I am filtering posts using a badwords array,

e.g. $badwords = array("bad","words");

and using str_replace to replace the words with * characters.

This works fine until a post contains words like "bypass" or "grass": if I am filtering the word "ass", I get byp*** or gr***.
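The behaviour described above can be reproduced with a minimal sketch. The sample sentence and the variable names $message, $stars and $filtered are made up for illustration; only the str_replace approach itself comes from the post.

```php
<?php
// str_replace matches raw substrings, so "ass" is also found inside longer words.
$badwords = array("ass");
$message  = "The bypass runs through the grass.";

// Build one replacement per bad word: the same number of asterisks as letters.
$stars = array();
foreach ($badwords as $word) {
    $stars[] = str_repeat("*", strlen($word));
}

$filtered = str_replace($badwords, $stars, $message);
echo $filtered, "\n"; // The byp*** runs through the gr***.
```

Because str_replace has no notion of where a word starts or ends, fixing this needs a pattern-based function such as preg_replace.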

I have tried to use preg_replace, but I am not familiar enough with it to write the proper regex. Any help or thoughts would be much appreciated.

Thanks.

4:04 am on Dec 2, 2009 (gmt 0)

Senior Member from US 

themadscientist (WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member)

joined:Apr 14, 2008
posts:2910
votes: 62


All you really need to do differently with preg_replace for something like this is set a delimiter. Most people use /, but I've started using # because it shows up in far fewer of the patterns I try to match, so I don't have to escape it as often. Then you can use \b, which matches a word boundary: the position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the string.

So, the patterns below match 'WordBoundary'word'WordBoundary'. (Run the test and you'll get it if you don't already.) It's not 'perfect' but will be much better and with a bit of adjusting can be made fairly accurate:

$string='bad words badwords words-bad. words-good wordsgood goodbad bad';
$badwords = array("#\bbad\b#i","#\bwords\b#i");
$cleanString = array("b**","w****");

$cleanedString = preg_replace($badwords,$cleanString,$string);
echo $cleanedString;

It needs to be adjusted a bit to preserve the case of the word(s) replaced, because right now every replacement starts with a lowercase letter no matter how the original was typed. You'll probably need to put () around the first character in the pattern and then use either \\1 or $1 in the replacement in place of the literal first letter. $1 is preferred, but \\1 is sometimes easier to work with. I'll let you play around with it a bit and see if you can get it working to your liking and specific situation.
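One way the capture-group idea might look, as a sketch only: the first letter of each word is wrapped in () and reused through $1, so the replacement keeps whatever case was typed. The sample string and the names $patterns and $replacements are assumptions for illustration.

```php
<?php
// Capture the first letter so the replacement can reuse it exactly as typed.
$string       = 'Bad words, BAD WORDS, badwords';
$patterns     = array('#\b(b)ad\b#i', '#\b(w)ords\b#i');
$replacements = array('$1**', '$1****');

// Single-quoted strings keep $1 literal for preg_replace to interpret.
$cleanedString = preg_replace($patterns, $replacements, $string);
echo $cleanedString, "\n"; // B** w****, B** W****, badwords
```

"badwords" is left alone because there is no word boundary between "bad" and "words".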

5:37 am on Dec 2, 2009 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 22, 2005
posts: 185
votes: 0


TheMadScientist's approach is OK.

I have used this expression to delete certain words:

$city = trim(preg_replace("/&|\bVicinity\b|\bCounty\b|\bCounties\b|\bArea\b|\bCity\b|\bThe\b/i", '', $city));

You can use:

$words = array();
$words[] = 'very';
$words[] = 'bad';
$words[] = 'words';

// Join the words with | (alternation) so that any one of them can match.
$words = '\b' . implode('\b|\b', $words) . '\b';

$string = trim(preg_replace("/$words/i", '*', $string));
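A variation on the single-pattern approach above, offered as a sketch and not as the poster's own code: the words are combined into one alternation group, each word is passed through preg_quote in case it contains regex metacharacters, and preg_replace_callback masks every match with as many asterisks as it has letters. The helper name mask_match and the sample sentence are hypothetical.

```php
<?php
$words = array('very', 'bad', 'words');

// preg_quote escapes regex metacharacters (and the "/" delimiter) in each word.
$quoted = array();
foreach ($words as $word) {
    $quoted[] = preg_quote($word, '/');
}

// One pattern: a word boundary, any one of the words, a word boundary.
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

// Hypothetical helper: one asterisk per character of the matched word.
function mask_match($m) {
    return str_repeat('*', strlen($m[0]));
}

$string = 'Very bad words, but not badly worded.';
$masked = preg_replace_callback($pattern, 'mask_match', $string);
echo $masked, "\n"; // **** *** *****, but not badly worded.
```

A single alternation means one pass over the text, and masking by length keeps the post's layout roughly intact.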

1:51 am on Dec 3, 2009 (gmt 0)

New User

5+ Year Member

joined:July 4, 2009
posts:21
votes: 0


Thank you all for the input! MadScientist, I used your idea. Here's how I did it:

$words = array("bad","words","here");

$badwords = array();

foreach($words as $badword){

// Plain = appends a new element; .= is not needed here.
$badwords[] = "#\b$badword\b#i";

}

$message = preg_replace($badwords, '*censor*', $message);

Note: I used a loop to build the regexes because the list is too long to write out by hand.

Thanks, and if anyone has any better ways, please share!

2:11 am on Dec 3, 2009 (gmt 0)

Senior Member from US 

themadscientist (WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member)

joined:Apr 14, 2008
posts:2910
votes: 62


Implode, Explode maybe?

$words = array("bad","words","here");
$words = '#\b'.implode('\b#i||#\b',$words).'\b#i';
$words = explode('||',$words);

print_r($words);
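Another possible way to build the same pattern array, sketched here with array_map and not taken from the thread: each word is wrapped in \b and the # delimiter directly, with preg_quote added as a precaution in case a word contains regex metacharacters. The variable $patterns and the sample message are assumptions for illustration.

```php
<?php
$words = array("bad", "words", "here");

// Wrap each word in word boundaries; preg_quote escapes metacharacters
// and the "#" delimiter if a word happens to contain them.
$patterns = array_map(function ($word) {
    return '#\b' . preg_quote($word, '#') . '\b#i';
}, $words);

$message = preg_replace($patterns, '*censor*', 'Bad words here, but not wordsmith.');
echo $message, "\n"; // *censor* *censor* *censor*, but not wordsmith.
```

This avoids the join-then-split round trip and does not depend on picking a separator that never appears in a word.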