# Catch non-alpha and HTML tags
$temp[0] = '(?:\W(?!\s\b)|<.+?>)*';
$str = 'foo s z ball';
# also want to match:
# $str = 'foo sz ball';
# $str = 'foo s-z ball';
# $str = 'foo s<br>z ball';
$pattern =
'(?:'.
'[s\$z]'. $temp[0] .
'[s\$z]'. $temp[0] .
')';
$str =~ s/\b$pattern\b/****/;
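print $str;

And this
(?:\W(?!\s\b)
(et cetera). Since \W and \s both match only non-word characters, \s\b can match only where the whitespace is immediately followed by a word character, so (?!\s\b) merely stops \W from matching right before a space that is itself followed by a word character.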
[edited by: phranque at 5:37 am (utc) on Nov 1, 2021]
[edit reason] disable graphic smile faces [/edit]
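To see what that lookahead actually changes, here is a minimal sketch comparing the two filler patterns used in this thread. The helper name filler_match is made up for illustration; it just reports what a filler consumes at the start of a string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text that the
# filler pattern consumes at the start of $text.
sub filler_match {
    my ($filler, $text) = @_;
    return $text =~ /\A($filler)/ ? $1 : undef;
}

my $with_lookahead = '(?:\W(?!\s\b)|<.+?>)*';   # first version in the thread
my $plain          = '(?:<.+?>|\W)*';           # later version in the thread

# '- z': the hyphen sits right before "space + word character", so the
# lookahead rejects it and the filler consumes nothing.
print '[', filler_match($with_lookahead, '- z'), "]\n";   # prints []

# Without the lookahead, the hyphen and the space are both consumed.
print '[', filler_match($plain, '- z'), "]\n";            # prints [- ]

# ' z': a lone space before a word is still consumed, because the
# lookahead inspects what follows the space, not the space itself.
print '[', filler_match($with_lookahead, ' z'), "]\n";    # prints [ ]
```

So the lookahead does not protect the space in front of the next word; it only protects a non-word character that comes before that space.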
# Catch non-alpha and HTML tags
$temp[0] = '(?:<.+?>|\W)*';
$str = 'foo s z ball';
# also want to match:
# $str = 'foo sz ball';
# $str = 'foo s-z ball';
# $str = 'foo s<br>z ball';
$pattern =
'(?:'.
'[s\$z]'. $temp[0] .
'[s\$z]'. $temp[0] .
')';
while (/\b($pattern)\b/xgi) {
($temp = $1) =~ s/\W+$//;
if ($new) { $new .= '|'; }
$new .= quotemeta($temp);
}
$str =~ s/\b(?:$new)\b/****/;
print $str;

# Catch non-alpha and HTML tags
# changing this from $temp to $ph, since I
# use the variable $temp later
$ph[0] = '(?:<.+?>|\W)*';
$str = 'foo s z ball';
# also want to match:
# $str = 'foo sz ball';
# $str = 'foo s-z ball';
# $str = 'foo s<br>z ball';
$pattern =
'(?:'.
'[s\$z]'. $ph[0] .
'[s\$z]'. $ph[0] .
')';
# forgot to say "$str =~" here; my test
# was using $_, but I'd changed it to
# $str for the post so it wouldn't
# be confusing and ended up
# making it MORE confusing! LOL
while ($str =~ /\b($pattern)\b/xgi) {
($temp = $1) =~ s/\W+$//;
if ($new) { $new .= '|'; }
$new .= quotemeta($temp);
}
if ($new) {
$str =~ s/\b(?:$new)\b/****/;
}
print $str;

$str = 'foo #$$ ball';
$pattern =
'(?:'.
'[e#]'. $ph[0] .
'[s\$z]'. $ph[0] .
'[s\$z]'. $ph[0] .
')';

# Catch non-alpha and HTML tags
$ph[0] = '(?:<.+?>|\W)*';
$str = 'foo #$$ ball';
$pattern =
'(?:'.
'[e#]' .
'[s\$z]'.
'[s\$z]'.
')';
$str =~
# match if it's surrounded with \b, \s, or
# common punctuation
s{(\b|^|[\s.,;:'"]|\$)(?:$pattern)(\b|[\s.,;:'"]|$)}
{$1****$2}gi;
# then move on to test again if there's a \W in there
$pattern =
'(?:'.
'[s\$z]'. $ph[0] .
'[s\$z]'. $ph[0] .
')';
while ($str =~ /\b($pattern)\b/xgi) {
($temp = $1) =~ s/\W+$//;
if ($new) { $new .= '|'; }
$new .= quotemeta($temp);
}
if ($new) {
$str =~ s/\b(?:$new)\b/****/;
}
print $str;
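For anyone wondering why the first pass above captures the surrounding whitespace or punctuation instead of using \b: a word boundary needs a word character on one side, so it never matches between a space and '#'. A minimal sketch of the idea follows. The helper name mask_symbol_word is hypothetical, and it is a simplified variation on the substitution above (a lookahead on the right-hand side instead of a second capture), not the exact code from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: masks $pattern when it is delimited by start/end of
# string, whitespace, or common punctuation, without relying on \b.
sub mask_symbol_word {
    my ($text, $pattern) = @_;
    # Capture the left delimiter and put it back; only look ahead at the
    # right delimiter so it stays available for the next match.
    $text =~ s{(^|[\s.,;:'"])(?:$pattern)(?=[\s.,;:'"]|\z)}{$1****}gi;
    return $text;
}

my $pattern = '[e#][s\$z][s\$z]';
my $str     = 'foo #$$ ball';

# \b fails here: ' ' and '#' are both non-word characters.
print 'with \b: ', ($str =~ /\b$pattern\b/ ? 'match' : 'no match'), "\n";

# Delimiter-based matching finds it.
print mask_symbol_word($str, $pattern), "\n";   # prints foo **** ball
```

The trade-off is that you have to list the delimiters you care about yourself; anything not in that class (a parenthesis, say) will not count as a word edge in this sketch.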
EXCEPT for when they use non-alphanumeric symbols

Yes, you might need to replace \W with [^\w$@] et cetera, retaining the non-alphanumerics that are most popularly substituted so you can use them as part of your word-specific tests.
otoh, I wouldn’t bother screening for locutions like #$$ because, heck, they’re already self-censored.
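As a rough illustration of the \W versus [^\w$@] point, here is a sketch applied to the trailing-cleanup step from the loop earlier in the thread (the s/\W+$// line). The helper name strip_trailing and the sample string are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical helper: strips trailing characters matching $class from a
# matched string, like the s/\W+$// cleanup step earlier in the thread.
sub strip_trailing {
    my ($match, $class) = @_;
    (my $clean = $match) =~ s/(?:$class)+\z//;
    return $clean;
}

# With plain \W, the substituted symbols are stripped along with the space.
print strip_trailing('a$$ ', '\W'), "\n";        # prints a

# With [^\w$@], '$' and '@' are kept, so they can still be tested for
# as stand-ins for letters.
print strip_trailing('a$$ ', '[^\w$@]'), "\n";   # prints a$$
```

Which symbols to exempt is a judgment call; '$' and '@' are just the two named above.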