Parsing double newlines into <p> tags using PHP

How to run a full Perl-style regular expression in PHP?

         

frappyjohn

4:45 am on Dec 30, 2002 (gmt 0)


In an earlier thread here in July, now closed for replies ( [webmasterworld.com...] ), there was a great discussion on parsing raw textual input to recognize paragraphs and place p tags around them.

Unfortunately, the suggested solution uses Perl, and I'm stuck here using PHP (and I'm a novice at it, at that!)

I see that PHP has a set of functions for using Perl-style regular expressions, but I don't see any that would support the more sophisticated features of the suggested Perl regex, which is:

$desc =~ s{
# parens are for grouping only
  (?:
# match either at start
    ^|
# or two or more \r\n (client was Windows)
    (?:\x0d\x0a){2,}|
# or two or more \n (client was *nix)
    \x0a{2,}|
# or two or more \r (client was MacOS9-)
    \x0d{2,}
  )
# capture the paragraph
  (.*?)
# look ahead for either end of string or two or more \n
# or two or more \r\n, but don't match what we see!
  (?=
# parens are for grouping only
    (?:
      (\x0d\x0a){2,}|\x0d{2,}|\x0a{2,}|$
    )
  )
# replace with
}{
  <p>$1</p>
}sgx;

Is there a way I can execute such a regex in PHP?

The situation is as follows (I would think it is fairly common): I want a user to be able to edit the content of a web page online. I could let users embed the <p> tags themselves, but I feel it would be much more user-friendly to allow conventional double spacing between paragraphs and replace it with </p><p> for them.

TIA

--Frappyjohn

dingman

4:50 am on Dec 30, 2002 (gmt 0)


Hit www.php.net and look at the Perl-compatible regular expression functions.

s/// => preg_replace()

I have a solution that works for more or less this task. Posting the code wouldn't help you learn, but I can assure you that it's possible.

<added>Some of the complexity is taken out by special handling of \n, if I remember right.</added>
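As a rough illustration of the s/// => preg_replace() mapping mentioned above (a sketch only, not dingman's code; the sample string and variable names are made up): preg_replace() replaces every match by default, so Perl's /g flag has no counterpart, and it returns the new string instead of modifying the variable in place.

```php
<?php
// Perl:  $text =~ s/(\w+)@example\.com/<$1>/g;
// PHP:   preg_replace() replaces all matches by default (no "g" flag needed)
//        and returns the result rather than editing $text in place.
$text   = 'Mail bob@example.com or amy@example.com';   // hypothetical sample input
$result = preg_replace('/(\w+)@example\.com/', '<$1>', $text);
echo $result, "\n";   // Mail <bob> or <amy>
```

Single quotes are used here so that PHP leaves the backslashes and the $1 reference alone and passes them straight through to the regex engine.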

dingman

5:10 am on Dec 30, 2002 (gmt 0)


OK, just looked at my source code. I lamed out and just replaced every occurrence of \n with <br />, which makes it even simpler - no lookaheads required. It does look like you can get away with just matching on \n, though: I tried it with Windows and then checked the results in an editor that would have shown me any \r's still hanging around. I'm afraid I don't have a Mac available to test with, though.
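A minimal sketch of the approach described here (not dingman's actual source; the function name newlines_to_br and the sample input are made up for illustration). It folds \r\n and bare \r into \n first, so it does not depend on which line endings the browser happens to send:

```php
<?php
// Hypothetical helper: turn every line break into <br />.
function newlines_to_br(string $input): string
{
    // Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    // "\r\n" must come first in the list so it is not counted as two breaks.
    $normalised = str_replace(["\r\n", "\r"], "\n", $input);

    // Keep the newline after the tag so the HTML source stays readable.
    return str_replace("\n", "<br />\n", $normalised);
}

echo newlines_to_br("line one\r\nline two\rline three\nline four");
```

PHP's built-in nl2br() does a similar job, so the helper above is mainly useful when you also want the line endings normalised.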

andreasfriedrich

9:47 am on Dec 30, 2002 (gmt 0)


As dingman said, take a look at the Perl-compatible regular expression syntax PHP provides. The simplest route is to remove the comments and whitespace from the expression and put it on one line (PHP's PCRE functions also accept the x modifier if you would rather keep the commented layout). That's about all you need to change.

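<?php
// Same expression on one line, written with solid pipes.
// The trailing "s" keeps Perl's /s behaviour, so "." also matches a single
// newline inside a paragraph. preg_replace() already replaces every match,
// so Perl's /g has no counterpart here.
echo preg_replace(
"{(?:^|(?:\x0d\x0a){2,}|\x0a{2,}|\x0d{2,})(.+?)(?=(?:(\x0d\x0a){2,}|\x0d{2,}|\x0a{2,}|$))}s",
"<p>$1</p>",
"Aaron\n\nAaron\n\n");
?>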

Note the + instead of the * where we capture the paragraph. * captures zero or more characters. The + will match one or more characters. This way there will be no empty p element at the end of the example string.
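For comparison, here is a sketch of the same expression kept in its commented, multi-line form by using the x modifier together with s. The function name paragraphs_to_html and the sample input are made up for illustration, and this is one possible layout rather than the code from the earlier thread.

```php
<?php
// Hypothetical helper; the name and the sample input are illustrative only.
function paragraphs_to_html(string $text): string
{
    // Single quotes: PHP passes \x0d, \x0a and $1 through to PCRE untouched.
    $pattern = '{
        (?:                      # grouping only
            ^                    # start of the string
          | (?:\x0d\x0a){2,}     # or two or more CR LF pairs (Windows)
          | \x0a{2,}             # or two or more LF (*nix)
          | \x0d{2,}             # or two or more CR (classic Mac OS)
        )
        (.+?)                    # capture the paragraph, at least one character
        (?=                      # look ahead without consuming
            (?:\x0d\x0a){2,} | \x0d{2,} | \x0a{2,} | $
        )
    }sx';                        // s: dot matches newlines, x: whitespace and comments allowed

    return preg_replace($pattern, '<p>$1</p>', $text);
}

echo paragraphs_to_html("First line\nstill first\n\nSecond paragraph");
// <p>First line
// still first</p><p>Second paragraph</p>
```

Trailing blank lines are left in place after the last paragraph, so it may be worth calling trim() on the input first if that matters.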

Andreas

Note: The WebmasterWorld posting software deletes spaces preceding the exclamation point "!" character. It also replaces a solid vertical pipe symbol with a broken vertical pipe "¦" symbol. Both of these changes will need to be undone in any code you copy from WebmasterWorld. Make sure to include a space preceding the "!" in mod_rewrite code, and always replace "¦" with a solid vertical pipe.