Help with Preg replace

Stripping all punctuation and stand-alone whole numbers

kenchix1

10:31 am on Jan 29, 2010 (gmt 0)


I only need the words in a message, so I strip all non-alphanumeric characters. I also need to remove whole numbers that stand alone. For example, given:

Hi ABC100! My score on FB-101 is 3,100. Marker is 700. My seat number is 5 at 1000 row.

The script should return:

[i]Hi ABC100 My score on FB-101 is Marker is My seat number is at row[i]

My code is here:


function stripAll($stext)
{
    // Removes every character that is not a letter, digit, or whitespace
    $clean = preg_replace('/[^A-Za-z0-9\s]*/', '', $stext);
    return $clean;
}

I tried inserting several patterns but still can't get it to work. It either removes all the numbers or none at all.

Thanks in advance for the help.

kenchix1

10:33 am on Jan 29, 2010 (gmt 0)


Sorry, this should be the result :

Hi ABC100 My score on FB-101 is Marker is My seat number is at row

(the "[i]" are not included)

rocknbil

7:46 pm on Jan 29, 2010 (gmt 0)


That's way harder than it looks!

You'd think you could just use a word boundary to keep FB-101 intact, but the hyphen is a non-word character, so \b matches right before the 101. Anyway, this is a bit convoluted but works.


<?php
header("content-type:text/html");
$str = 'Hi ABC100! My score on FB-101 is 3,100.
Marker is 700. My seat number is 5 at 1000 row.';
echo "<p>$str</p>";
$str = stripAll($str);
echo "<p><strong>Stripped:</strong> $str</p>";

function stripAll($stext) {
    if (!$stext) { return ''; }
    // First alternative: drop anything that is not a letter, digit, whitespace, or hyphen.
    // Second alternative: drop a number (optional commas and decimals) plus the separator before it.
    $stext = preg_replace('/[^a-z\d\s\-]+|\b[^\w\-]+\d+,*\d*\.*\d*\b/i', '', $stext);
    return $stext;
}
?>

Be sure to change the | to an actual logical-or pipe. The convolutedness in the second half should support decimal numbers too, but that part is untested.

EDIT: WOO HOO! Brett let us have our pipes back! :-P
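
To see the word-boundary behaviour mentioned above in isolation, here is a minimal sketch. The sample string and variable names are made up for illustration. It shows that \b lets the 101 in FB-101 match as a stand-alone number, while a lookaround that also counts the hyphen as part of the token does not.

```php
<?php
$sample = 'ABC100 FB-101 700';

// \b sits between a word character [A-Za-z0-9_] and anything else,
// so the hyphen in "FB-101" creates a boundary right before "101".
preg_match_all('/\b\d+\b/', $sample, $matches);
print_r($matches[0]); // contains 101 and 700, but not the 100 in ABC100

// Lookarounds that treat "-" like a word character skip "101" as well.
preg_match_all('/(?<![\w\-])\d+(?![\w\-])/', $sample, $strict);
print_r($strict[0]); // contains only 700
```

The lookaround form is only a building block: it would still see 3,100 as two separate numbers, so commas and decimals need their own handling.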

kenchix1

3:08 am on Jan 30, 2010 (gmt 0)


Thank you very much for the help.

It works, but with a few problems. I noticed the following:

Message Input:
Memo REF 09-501: Out of 100 people, ABC100 got 10th place on 100 days, 15.7 average.

Output:
Memo REF-501 Out of people ABC100 got 10th place on days 157 average

Message Input:
10 Memo REF 09-501: Out of 100 people, ABC100 got 10th place on 100 days, 15.7 average.

Output:
10 Memo REF-501 Out of people ABC100 got 10th place on days 157 average

Expected output:
Memo REF 09-501 Out of people ABC100 got 10th place on days average

Thanks for the time.
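
For messages with no fixed format, one alternative to a single regex is to split the message into tokens and decide per token. The sketch below is not from the thread's pattern: the function name stripAllTokens is hypothetical, and it assumes that a token counts as a "whole number" if only digits remain after its punctuation is stripped (so 3,100 and 15.7 are dropped, while 09-501 and 10th are kept).

```php
<?php
// Hypothetical helper: keeps words and mixed tokens, drops stand-alone numbers.
function stripAllTokens($text)
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $kept = [];
    foreach ($tokens as $token) {
        // Remove everything except letters, digits, and hyphens.
        $token = preg_replace('/[^a-z\d\-]+/i', '', $token);
        // Trim hyphens left dangling at either end of the token.
        $token = trim($token, '-');
        // Skip tokens that are now empty or consist of digits only.
        if ($token === '' || preg_match('/^\d+$/', $token)) {
            continue;
        }
        $kept[] = $token;
    }
    return implode(' ', $kept);
}

echo stripAllTokens('10 Memo REF 09-501: Out of 100 people, ABC100 got 10th place on 100 days, 15.7 average.');
```

Because the result is rebuilt with implode, it has no leading, trailing, or doubled spaces.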

rocknbil

8:00 pm on Jan 30, 2010 (gmt 0)


Well, you changed the rules. :-P One needs to know all the possible parameters of an input to build a regex that works in all cases. Don't have time at the moment, others will be by, or I'll return to it later - or you could play with the building blocks there and sort one out.

kenchix1

6:27 am on Feb 5, 2010 (gmt 0)


Sorry for not being so clear with the rules. The message varies and doesn't really have any format.

I used the pattern above, and to remove the numeric figures I split the string into an array, looped through it, and checked each item with is_numeric.

$stext=preg_replace('/[^a-z\d\s\-]+|\b[^\w\-]+\d+,*\d*\.*\d*\b/i','',$stext);

I noticed that with the pattern above, a date formatted with "-" instead of "/" is ignored by preg_replace, so I added another pattern to remove dates (e.g. 12-31-2009 or 2009-12-31).

Added: [\d*\-\d*\-\d*]

The pattern is now:
$clean=preg_replace('/[^a-z\d\-]+[\d*\-\d*\-\d*]+|\b[^\w\-]+\d+,*\d*\.*\d*\b/i',',',$stext);

Please correct me if I'm doing it the wrong way.
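
A note on the added date pattern: square brackets create a character class, so [\d*\-\d*\-\d*] matches any single digit, asterisk, or hyphen rather than a digits-hyphen-digits-hyphen-digits sequence. A minimal sketch of removing hyphenated dates in a separate first pass follows. It assumes dates are always three digit groups joined by hyphens, and the function name stripDates is hypothetical.

```php
<?php
// Hypothetical helper: removes tokens such as 12-31-2009 or 2009-12-31.
function stripDates($text)
{
    // Three digit groups joined by hyphens, not glued to other
    // word characters or hyphens on either side.
    $pattern = '/(?<![\w\-])\d{1,4}-\d{1,2}-\d{1,4}(?![\w\-])/';
    return preg_replace($pattern, '', $text);
}

// The two dates are removed; the reference number 09-501 is left alone.
echo stripDates('Due 12-31-2009 or 2009-12-31, see REF 09-501');
```

Running this before the punctuation pass keeps the two rules separate, which is easier to debug than one long alternation. The surrounding spaces are left in place, so the result may contain doubled spaces.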

Thanks to rocknbil.

kenchix1

11:39 am on Feb 5, 2010 (gmt 0)


Also, I noticed that the first item in the array is always blank ([0] = "") whenever I explode the string.
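
explode() returns an empty first element whenever the string begins with the delimiter, for example a leading space or comma left behind by the replacement. Whether that is what happens here is an assumption, since the exploded string is not shown. A minimal sketch with a made-up sample string:

```php
<?php
$clean = ' Hi ABC100 My score';   // note the leading space

// explode() keeps the empty string that precedes the first delimiter.
$parts = explode(' ', $clean);    // $parts[0] is ''

// Option 1: trim the string before splitting.
$trimmed = explode(' ', trim($clean));

// Option 2: split on runs of whitespace and drop empty pieces.
$words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Option 2 also copes with doubled spaces in the middle of the string, which trim() alone does not.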