
PHP Server Side Scripting Forum

    
Bad Words Filter Enhanced
How to not filter valid words
alexelisenko




msg:4035108
 3:10 am on Dec 2, 2009 (gmt 0)

Hello,

I am trying to enhance my bad words filter. Currently I filter posts using a badwords array,

e.g. $badwords = array("bad","words");

and use str_replace to replace the words with * characters.

This works until I hit words like "bypass" or "grass": if I am filtering the word "ass", I get byp*** or gr***.
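
To make the problem concrete, here is a minimal sketch of the substring over-matching described above. The word list and the sample sentence are made up for illustration.

```php
<?php
// Hypothetical word list and input, for illustration only.
$badwords = array("ass");
$text = "Use the bypass near the grass.";

// str_replace matches substrings anywhere, so valid words get mangled.
$filtered = str_replace($badwords, "***", $text);

echo $filtered, "\n"; // Use the byp*** near the gr***.
```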

I have tried preg_replace, but I am not familiar enough with regular expressions to write the proper pattern for this. Any help or thoughts would be much appreciated.

Thanks.

 

TheMadScientist




msg:4035128
 4:04 am on Dec 2, 2009 (gmt 0)

All you really need to do differently with preg_replace for something like this is set a delimiter and use \b, which matches a word 'boundary' on either side of the word you are searching for. (Most people use / as the delimiter, but I've started using # because I have to escape it less often: fewer of the patterns I try to match contain a #.)

So, the patterns below match 'WordBoundary'word'WordBoundary'. (Run the test and you'll get it if you don't already.) It's not 'perfect' but will be much better and with a bit of adjusting can be made fairly accurate:

$string='bad words badwords words-bad. words-good wordsgood goodbad bad';
$badwords = array("#\bbad\b#i","#\bwords\b#i");
$cleanString = array("b**","w****");

$cleanedString = preg_replace($badwords,$cleanString,$string);
echo $cleanedString;

It needs to be adjusted a bit to not change the case of the word(s) replaced, because right now every replacement starts with a lowercase letter, whatever the case of the original... You'll probably need to put () around the first character in the pattern and then use either \\1 or $1 in the replacement in place of the hard-coded first letter. $1 is preferred, but \\1 is sometimes easier to work with. I'll let you play around with it a bit and see if you can get it working to your liking and specific situation.
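
One way the capture-group idea could look, as a rough sketch rather than a finished filter. The sample string is adapted from the test above with some mixed case added, and the number of asterisks per word is still hard-coded.

```php
<?php
// Sample input adapted from the post above, with mixed case added.
$string = 'Bad words BADWORDS Words-bad.';

// Parentheses capture the first letter exactly as it was typed.
$patterns = array('#\b(b)ad\b#i', '#\b(w)ords\b#i');

// $1 re-inserts the captured letter, so its case is preserved.
$replacements = array('$1**', '$1****');

$result = preg_replace($patterns, $replacements, $string);
echo $result, "\n"; // B** w**** BADWORDS W****-b**.
```

Single quotes are used for the replacements so that PHP itself does not try to interpret $1 as a variable.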

NomikOS




msg:4035155
 5:37 am on Dec 2, 2009 (gmt 0)

TheMadScientist is right.

I have used this expression to delete certain words:

$city = trim(preg_replace("/&|\bVicinity\b|\bCounty\b|\bCounties\b|\bArea\b|\bCity\b|\bThe\b/i", '', $city));

You can use:

$words = array();
$words[] = 'very';
$words[] = 'bad';
$words[] = 'words';

// Join the words with | (alternation) so any one of them can match.
$words = '\b' . implode('\b|\b', $words) . '\b';

$string = trim(preg_replace("/$words/i", '*', $string));
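
A variation on the single-pattern approach, sketched under a few assumptions: the function name censorWords is made up, closures need PHP 5.3 or later, and strlen counts bytes, so the asterisk count is only exact for single-byte text. preg_quote keeps words that contain regex characters from breaking the pattern.

```php
<?php
// Hypothetical helper name; not from the thread.
function censorWords(array $words, $text)
{
    // Escape each word so characters such as "." or "/" are treated literally.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    // One pattern of the form \b(?:very|bad|words)\b, case-insensitive.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    // Replace each match with as many asterisks as it has characters.
    return preg_replace_callback($pattern, function ($m) {
        return str_repeat('*', strlen($m[0]));
    }, $text);
}

echo censorWords(array('very', 'bad', 'words'), 'Very bad badge, no words.'), "\n";
// **** *** badge, no *****.
```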

alexelisenko




msg:4035841
 1:51 am on Dec 3, 2009 (gmt 0)

Thank you all for the input! MadScientist, I used your idea. Here's how I did it:

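$words = array("bad","words","here");

$badwords = array();

// Build one case-insensitive, whole-word pattern per bad word.
foreach($words as $badword){

    // preg_quote escapes any regex characters (and the # delimiter) in the word.
    $badwords[] = '#\b' . preg_quote($badword, '#') . '\b#i';

}

$message = preg_replace($badwords, '*censor*', $message);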

Note: I used a loop to build the patterns because the list is too long to write out manually.

Thanks, and if anyone has any better ways, please share!

TheMadScientist




msg:4035852
 2:11 am on Dec 3, 2009 (gmt 0)

Implode, Explode maybe?

$words = array("bad","words","here");
$words = '#\b'.implode('\b#i||#\b',$words).'\b#i';
$words = explode('||',$words);

print_r($words);
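
Another option, offered only as a sketch: array_map can build the same list of patterns without the implode/explode round trip. It assumes PHP 5.3 or later for the closure, and the sample message is invented.

```php
<?php
$words = array("bad", "words", "here");

// Wrap each word in delimiters and word boundaries, one pattern per word.
$patterns = array_map(function ($word) {
    return '#\b' . preg_quote($word, '#') . '\b#i';
}, $words);

print_r($patterns);

// Hypothetical message to show the patterns in use.
$message = preg_replace($patterns, '*censor*', 'Bad words here, badwords there');
echo $message, "\n"; // *censor* *censor* *censor*, badwords there
```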

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved