Forum Moderators: coopster


preg fails to match

         

gergoe

2:18 pm on Jul 10, 2004 (gmt 0)

10+ Year Member



hi,

I've been fighting with preg_replace for a while now: it either fails to match anything or matches only the first alternative in the regular expression. I've tried changing or setting almost all the modifiers, with no luck. Can anyone tell me what the hell I've done wrong?


$patterns = '/^((1stalternative|2ndalternative|3rdalternative|4thalternative).*)$/iD';
$replacements = '<bold>\\1</bold>';
$output = preg_replace($patterns,$replacements,$input);

$input is an ordinary string variable with about 10kbytes of text in it, something like this:

blablablabla
[i]2ndalternative[/i]:
blablablabla
blablablabla
[i]4thalternative[/i] something
blablablabla
blablablabla
[i]2NDALTERNATIVE[/i]:
blablablabla
blablablabla

...and what I have in the $output at the end is just the same as the input, none of the matches are enclosed with the <bold/> element...

coopster

3:22 pm on Jul 10, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



$patterns = '/(1stalternative|2ndalternative|3rdalternative|4thalternative).*/i';

Don't forget to replace the pipe symbols: the forum may display them as broken bars (¦), which must be typed as real pipes (|) in your code.

gergoe

3:58 pm on Jul 10, 2004 (gmt 0)

10+ Year Member



But if I do it like that and there is text after one of the alternatives, that text will not be enclosed by the <bold/> tag (only the text I included in the regex pattern is). Additionally, it will replace all occurrences regardless of whether they are at the start of a line or not (it should match only if a line starts with one of the alternatives, and the match should contain the whole line). I've changed it a bit, and it looks like this now:

$patterns = '/(1stalternative|2ndalternative|3rdalternative|4thalternative)/i';
$replacements = '<bold>\\0</bold>';

But once I add the BOL (^) anchor, it fails to match unless one of the alternatives is the first word in the input. Why does it not match on the other lines?

coopster

4:12 pm on Jul 10, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Ah, it wasn't as clear at first exactly what you are trying to accomplish. I think I understand now, but for clarification, you want it to end up looking as follows:
blablablabla
<bold>2ndalternative:</bold>
blablablabla
blablablabla
<bold>4thalternative something</bold>
blablablabla
blablablabla
<bold>2NDALTERNATIVE:</bold>
blablablabla
bla1stAlternativeblablabla

I modified the last line to show that we DO NOT want to bold an alternative when it occurs in the middle of a line like this, correct? Also, any text after the "alternative", as in the 4thalternative something line, should be bolded as shown, correct?

gergoe

4:23 pm on Jul 10, 2004 (gmt 0)

10+ Year Member



Exactly;

In the meantime I've checked the pattern modifiers and noticed that I misunderstood the meaning of the m modifier, which is exactly what I need right now. By default the input string is treated as a single line: neither the BOL nor the EOL anchor matches within the text, only at the beginning and the end of the text, regardless of how many newline characters the text contains. But if I add the m modifier, both anchors match properly at line boundaries.

So thanks for your help anyway, it seems to be working well.
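
For anyone reading along, the effect of the m modifier described above can be checked with a minimal sketch like the one below. The sample text and the variable names ($alternatives, $withoutM, $withM) are made up for illustration; only the preg_match_all() call and the i and m modifiers come from PHP itself.

```php
<?php
// Hypothetical sample text; the first line does NOT start with an alternative.
$input = "blablablabla\n2ndalternative:\nblablablabla\n4thalternative something\n";

$alternatives = '1stalternative|2ndalternative|3rdalternative|4thalternative';

// Without m: ^ can only match at the very start of the whole subject string.
$withoutM = preg_match_all('/^(' . $alternatives . ')/i', $input, $m1);

// With m: ^ can also match immediately after each "\n" inside the subject.
$withM = preg_match_all('/^(' . $alternatives . ')/im', $input, $m2);

echo "without m: $withoutM, with m: $withM\n";
```

With this input, the first call finds no match because the subject starts with "blablablabla", while the second call finds the two lines that begin with an alternative.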

coopster

4:24 pm on Jul 10, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Exactly. You nailed it.
$patterns = "/^((1stalternative|2ndalternative|3rdalternative|4thalternative).*$)/im";
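
As a rough end-to-end check, the sketch below runs this pattern over a shortened version of the sample text from earlier in the thread. The sample lines are placeholders and the input is assumed to use plain \n line endings; the <bold> replacement is the one gergoe posted, written with $1 instead of \\1.

```php
<?php
// Shortened, hypothetical sample input with "\n" line endings.
$input = "blablablabla\n"
       . "2ndalternative:\n"
       . "blablablabla\n"
       . "4thalternative something\n"
       . "bla1stAlternativeblablabla\n";

// i: case-insensitive; m: ^ and $ match at every line boundary.
$patterns = "/^((1stalternative|2ndalternative|3rdalternative|4thalternative).*$)/im";
$replacements = '<bold>$1</bold>';

$output = preg_replace($patterns, $replacements, $input);
echo $output;
```

Only the lines that start with one of the alternatives are wrapped, and the wrap covers the whole line; the last line is left alone because the alternative appears in the middle of it.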

gergoe

5:02 pm on Jul 10, 2004 (gmt 0)

10+ Year Member



Almost got it working; there's only one thing I can't understand. Whenever a match is replaced, the newline characters are included in the match, like this:
Input:

match1
blabla
match2
blabla

Pattern:
'/^(match.*)$/m'

Replacement:
'<smthg>\\1</smthg>'

...and this will be the result:
<smthg>match1
</smthg>
blabla
<smthg>match2
</smthg>
blabla

Actually, an extra newline seems to be generated by the preg_replace call, but I just can't understand how or why. None of the anchors is enclosed in parentheses, and the . should not match newlines by default. So what's the reason?

gergoe

1:29 am on Jul 11, 2004 (gmt 0)

10+ Year Member



Replying to myself;
the problem seems to be connected to the fact that the input string is coming from a Microsoft SQL Server database, so lines are terminated by the \r\n pair, not just \n, and the preg functions do not care about the \r. They actually treat it as an ordinary character. For example, the ^.+$ pattern matches on the "\r\n" string with the result of \r...
Is there any way to change this behavior? Or is this the reason why it does not apply the ^ and $ anchors per line by default (so that it need not bother with the different kinds of line termination)? Could anyone give me some explanation of this?
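
The behavior described here can be reproduced with a small sketch. It assumes the PCRE library behind PHP treats only \n as the newline, which is the usual default but can differ between builds; the input string is a made-up stand-in for the database text.

```php
<?php
// Hypothetical input with Windows-style "\r\n" line endings.
$input = "match1\r\nblabla\r\nmatch2\r\nblabla\r\n";

// $ stops in front of "\n", and . is allowed to match "\r",
// so the carriage return ends up inside the captured group.
preg_match_all('/^(match.*)$/m', $input, $matches);

foreach ($matches[1] as $line) {
    // json_encode makes the otherwise invisible "\r" visible.
    echo json_encode($line), "\n";
}
```

Because the captured text ends with \r, a closing tag appended after it lands behind the carriage return, which many viewers render as an extra line break.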

coopster

9:43 pm on Jul 13, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm not quite clear what you are asking... do you mean you want the (optional) carriage return to be treated as part of the end of line when using PCRE_MULTILINE? Won't happen, at least not through any modifiers. The PCRE docs specifically state newline (\n). You would have to get rid of them yourself with a regex or a string replacement function. Or, if you think it may be in the data as an optional end-of-line terminator as described, alter your regex...
"/^(match[^\r])/m"

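The first option mentioned above, getting rid of the carriage returns before matching, might look like the following sketch. The input string and the $normalized variable are illustrative assumptions; str_replace() and preg_replace() are standard PHP functions.

```php
<?php
// Hypothetical input with "\r\n" line endings, as it might come from the database.
$input = "match1\r\nblabla\r\nmatch2\r\nblabla\r\n";

// Convert "\r\n" first, then any remaining lone "\r", to "\n".
$normalized = str_replace(["\r\n", "\r"], "\n", $input);

// With only "\n" left, the original multiline pattern behaves as expected.
$output = preg_replace('/^(match.*)$/m', '<smthg>$1</smthg>', $normalized);
echo $output;
```

The trade-off is that the output now uses \n line endings throughout, so this only fits when the original \r\n pairs do not need to be preserved.
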
gergoe

10:22 pm on Jul 13, 2004 (gmt 0)

10+ Year Member



Not such good news; this makes the preg functions quite useless when the line breaks in the input stream are not *nix-like... I've already worked my way around this problem; one of the solutions was the [\r\n]+ pattern, which does the job nicely, but only with the m modifier. It's a bit of a shame that they did not take this into account; I can hardly believe that I'm the only one on this planet who wants to use line breaks other than \n.

Anyway, thanks for your help

coopster

10:51 pm on Jul 13, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You aren't the only one, we all do. Simply modify your regular expression to accommodate optional characters.
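
One possible way to accommodate the optional carriage return in the pattern itself is sketched below. This is only an illustration, not necessarily the exact expression meant here: the capture uses [^\r\n]* so it stops before either line-ending character, and a lookahead allows an optional \r in front of the end of line. The mixed-ending input is made up for the example.

```php
<?php
// Hypothetical input that mixes "\r\n" and "\n" line endings.
$input = "match1\r\nblabla\nmatch2\nblabla\r\n";

// [^\r\n]* never swallows a line-ending character;
// (?=\r?$) checks for an optional "\r" before the end of line without consuming it.
$pattern = '/^(match[^\r\n]*)(?=\r?$)/m';

$output = preg_replace($pattern, '<smthg>$1</smthg>', $input);
echo $output;
```

Since the lookahead consumes nothing, the original line endings are left untouched and only the visible text of each matching line is wrapped.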