Many scripts take form input and encode or decode particular characters for formatting and display purposes. For example, a line break (\n) in a block of text may be converted to a <br> tag, and two line breaks might create a <p> tag. In my experience, lots of scripts generate tons of <p> tags but no CLOSING paragraph tags (WebmasterWorld included). I have some that do the same thing. It's not a horrendous issue, but it does bug me. Wouldn't there be some regex or way of parsing that would not only convert double line breaks to a <p>, but also CLOSE a block of text with a </p> tag (if that block was followed by two line breaks or similar)? Just as rules can strip away leading or trailing blank spaces, couldn't the rules be reversed to apply the closing paragraph tag to double line breaks followed by text? Or is this the coder's Mobius strip? Just wondering...
I think you're on the right track re: using a regex or similar. Perhaps locate all instances of \n\n and plug in </p><p> which would work for everything in the middle. Test for beginning and end and add the appropriate <p> or </p> in those locations.
<added> of course this is an off-the-cuff response and I'm thinking in terms of PHP</added>
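A minimal PHP sketch of that off-the-cuff idea, in case it helps. The function name nl2p is made up for illustration, and normalising line endings and trimming first are my own assumptions rather than anything stated above:

```php
<?php
// Hypothetical helper (the name is not from this thread).
function nl2p(string $text): string
{
    // Assumption: treat \r\n and \r the same as \n.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    // Trim so leading/trailing blank lines do not produce empty paragraphs.
    $text = trim($text);
    if ($text === '') {
        return '';
    }
    // Every run of two or more newlines in the middle becomes </p><p>.
    $text = preg_replace('/\n{2,}/', '</p><p>', $text);
    // Add the opening tag at the beginning and the closing tag at the end.
    return '<p>' . $text . '</p>';
}

echo nl2p("First\n\nSecond\n\n"); // <p>First</p><p>Second</p>
```

Single line breaks inside a paragraph are left alone here; converting those to <br> would be a separate step.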
Have a look at this post [webmasterworld.com] for a regular expression in Perl.
While the RE in this post will add another p element when the string ends with two or more newlines, the following RE will not. Notice the (.+?) instead of the (.*?), i.e. we want one or more characters instead of zero or more.
$desc = "Aaron\n\nAaron\n\nAaron\n\n";
$desc =~ s{(?:^¦(?:\x0d\x0a){2,}¦\x0a{2,}¦\x0d{2,})(.+?)(?=(?:(\x0d\x0a){2,}¦\x0d{2,}¦\x0a{2,}¦$))}{<p>$1</p>}sg;
print $desc;
You could also use something like this (PHP):
$data = preg_replace(array("'\[h1\]'",
"'\[/h1\]'",
"'\[h2\]'",
"'\[/h2\]'",
"'\[sig\]'",
"'\[/sig\]'",
"'\[url=([^]]+)\]'e",
"'\[/url\]'",
), array("\n<h3>",
"</h3>\n",
"\n<h4>",
"</h4>\n",
"\n<p class=\"sig\">",
"</p>\n",
"_get_name_from_id(\\1);",
"</a>",
), $data);
$s = '';
foreach(preg_split("/\n+¦(\r\n)+/", $data) as $token) {
    if(strlen($token) == 0) continue;
    if(preg_match("/^<.*>$/", $token)) { $s .= $token; continue; }
    $s .= "<p>$token</p>";
}
This replaces certain codes like [h1], [sig], [url] with their HTML equivalents, then splits $data on one or more newlines and adds the p tags around the tokens when appropriate.
Or you could use the Perl regular expression in PHP with preg_replace.
<?php
$new = "Aaron\n\nAaron\n\nAaron\n\n";
$new = preg_replace("/(?:^¦(?:\x0d\x0a){2,}¦\x0a{2,}¦\x0d{2,})(.+?)(?=(?:(\x0d\x0a){2,}¦\x0d{2,}¦\x0a{2,}¦$))/s","<p>$1</p>",$new);
echo $new;
?>
Replace the broken pipe characters with the real vertical bar character. Additionally you need to remove the leading dots from the RE in the other post.
Hope this helps.
Andreas
If it is some problem that I came across myself then I usually have some code in either Perl or PHP sitting around somewhere. It's mostly a question of finding it.
I have been using the split and join approach in both Perl and PHP for a long time.
The RE is actually based on dingman's post [webmasterworld.com] in Newby help with MySQL / CGI / HTML - Newlines missing when parsing database data [webmasterworld.com].
If I find the problem interesting and I don't have any code, I usually make it up on the fly and store it in my back pocket for future use in my own code and future reference in these fora. ;)
I do hope that your question wasn't entirely rhetorical. That would be quite embarrassing. But then I would still have the excuse that English isn't my native language, so I couldn't be expected to recognize all the little hints and quirks.
Andreas
I usually do this with a small loop that splits the text into paragraphs, manipulates each paragraph separately and then stitches it all together again. In Perl, with the initial text in $input, leaving the output in $output:
my $output = '';
for (split(/\n\n+/, $input)) {
    s/\n/<BR>/gs;
    $output .= "<P>$_</P>";
}
This is trivial, but it does allow you to do all sorts of other substitutions on a paragraph-by-paragraph basis. It also makes it very easy to insert the </P> tags.
René.
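For anyone on PHP, the same split, process and stitch loop could look roughly like the sketch below. The function name paragraphs_to_html is hypothetical, and the line-ending normalisation and the skipping of empty pieces are my assumptions, not part of the Perl above:

```php
<?php
// Hypothetical function name; the per-paragraph step is only an example.
function paragraphs_to_html(string $input): string
{
    // Assumption: treat \r\n and \r the same as \n.
    $input = str_replace(array("\r\n", "\r"), "\n", $input);
    $output = '';
    // Split on runs of two or more newlines and skip empty pieces.
    foreach (preg_split('/\n{2,}/', $input, -1, PREG_SPLIT_NO_EMPTY) as $para) {
        // Per-paragraph substitutions go here; this one turns single newlines into <br>.
        $para = str_replace("\n", '<br>', $para);
        $output .= '<p>' . $para . '</p>';
    }
    return $output;
}

echo paragraphs_to_html("one\ntwo\n\nthree\n\n"); // <p>one<br>two</p><p>three</p>
```

Any other per-paragraph substitution would slot in next to the str_replace call.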
andreasfriedrich... When you're on holiday do you frequent Perl Golf courses :)