formatting p's and br's - Perl Server Side CGI Scripting forum at WebmasterWorld - WebmasterWorld

Forum Moderators: coopster & phranque

formatting p's and br's

encode and decode characters

idiotgirl

12:04 pm on Nov 7, 2002 (gmt 0)

Here's something that probably isn't first and foremost on serious coders' minds, which would explain why it interests me:

Many scripts take form input and encode and decode particular characters for formatting and display purposes. For example, a line break (\n) in a block of text may be converted to a <br> tag. Two line breaks might create a <p> tag.

Okay, so from my experience, I see lots of scripts that generate tons of <p> tags, but no CLOSING paragraph tags (WebmasterWorld included). I have some that do the same thing. It's not a horrendous issue, but it does bug me.

Wouldn't there be some regex or way of parsing that would not only convert double line breaks to a <p>, but also CLOSE a block of text with a </p> tag (if that block was followed by two line breaks, say)? Like applying the rules to strip away leading or trailing blank spaces, couldn't the rules be reversed to apply the closing paragraph tag to double line breaks followed by text? Or is this the coder's Möbius strip?

Just wondering...

lorax

1:16 pm on Nov 7, 2002 (gmt 0)

It's on my list of things to research or write! I agree we should have something like that. I want it for a dynamic site that needs to be at least XHTML Transitional compliant.

I think you're on the right track re: using a regex or similar. Perhaps locate all instances of \n\n and plug in </p><p> which would work for everything in the middle. Test for beginning and end and add the appropriate <p> or </p> in those locations.

<added> of course this is an off-the-cuff response and I'm thinking in terms of PHP</added>
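
A minimal Perl sketch of the idea above, offered as an illustration rather than a finished routine. The helper name wrap_paragraphs is hypothetical, and turning the remaining single line breaks into <br> is an assumption borrowed from the first post:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a block of text in matching <p>...</p> pairs.
sub wrap_paragraphs {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;          # normalise CRLF and bare CR to LF
    $text =~ s/\A\n+|\n+\z//g;      # drop line breaks at the very beginning and end
    return '' if $text eq '';       # nothing left to wrap
    $text =~ s{\n{2,}}{</p><p>}g;   # every paragraph break in the middle
    $text =~ s{\n}{<br>}g;          # assumption: single breaks become <br>
    return "<p>$text</p>";          # opening and closing tag at the two ends
}

print wrap_paragraphs("one\ntwo\n\nthree\n\n"), "\n";
```

Stripping the leading and trailing line breaks first is what keeps the "test for beginning and end" step down to a single wrap at the end.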

andreasfriedrich

1:58 pm on Nov 7, 2002 (gmt 0)

Perl

Have a look at this post [webmasterworld.com] for a regular expression in Perl.

While the RE in that post will add another p element when the string ends with two or more newlines, the following RE will not. Notice the (.+?) instead of the (.*?), i.e. we want one or more characters instead of zero or more.

$desc = "Aaron\n\nAaron\n\nAaron\n\n";
$desc =~ s{(?:^|(?:\x0d\x0a){2,}|\x0a{2,}|\x0d{2,})(.+?)(?=(?:(\x0d\x0a){2,}|\x0d{2,}|\x0a{2,}|$))}{<p>$1</p>}sg;
print $desc;
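
To see the difference between (.+?) and (.*?) side by side, here is a small sketch. It restructures the pattern above into a reusable piece and is not the exact RE from the linked post; the names $break and mark_paragraphs are made up for this illustration:

```perl
use strict;
use warnings;

# A paragraph break: a run of two or more line breaks (CRLF, LF or CR).
my $break = qr/(?:\x0d\x0a){2,}|\x0a{2,}|\x0d{2,}/;

# Hypothetical helper: $body is the capture group to try, '(.+?)' or '(.*?)'.
sub mark_paragraphs {
    my ($text, $body) = @_;
    # Text between (start or break) and (break or end) becomes one p element.
    $text =~ s{(?:^|$break)$body(?=$break|$)}{<p>$1</p>}sg;
    return $text;
}

my $desc = "Aaron\n\nAaron\n\nAaron\n\n";
print mark_paragraphs($desc, '(.+?)'), "\n";   # one or more characters
print mark_paragraphs($desc, '(.*?)'), "\n";   # zero or more characters
```

With (.*?) the trailing pair of newlines is allowed to match with an empty body, which is where the extra empty p element comes from; with (.+?) those trailing newlines are simply left in place.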

PHP

You could also use something like this (PHP):

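// Map the simple bracket codes to their HTML equivalents.
$data = preg_replace(array("'\[h1\]'", 
 "'\[/h1\]'", 
 "'\[h2\]'", 
 "'\[/h2\]'", 
 "'\[sig\]'", 
 "'\[/sig\]'", 
 "'\[/url\]'", 
 ), array("\n<h3>", 
 "</h3>\n", 
 "\n<h4>", 
 "</h4>\n", 
 "\n<p class=\"sig\">", 
 "</p>\n", 
 "</a>", 
 ), $data); 
// [url=...] was originally handled with the /e modifier, which PHP 7 removed;
// a callback does the same job. _get_name_from_id() is defined elsewhere.
$data = preg_replace_callback("'\[url=([^]]+)\]'", 
 function ($m) { return _get_name_from_id($m[1]); }, $data); 
$s = ''; // collects the finished markup
foreach(preg_split("/\n+|(\r\n)+/", $data) as $token) { 
 if(strlen($token) == 0) continue; // skip empty tokens
 if(preg_match("/^<.*>$/", $token)) { $s .= $token; continue; } // already a tag line
 $s .= "<p>$token</p>"; 
}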

This replaces certain codes like [h1], [sig], [url] with their HTML equivalent and then splits $data on one or more newlines and then adds the p tags around the tokens when appropriate.
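
For readers working in Perl, the split-and-wrap half of that snippet might look roughly like this. It is a sketch only: the helper name wrap_tokens is hypothetical, and the bracket-code replacement step is left out:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop above: split on runs of line breaks,
# pass through tokens that are already a complete tag line, wrap the rest.
sub wrap_tokens {
    my ($data) = @_;
    my $out = '';
    for my $token (split /(?:\r\n|\n)+/, $data) {
        next if $token eq '';            # skip empty tokens
        if ($token =~ /^<.*>$/) {        # already markup, e.g. <h3>Title</h3>
            $out .= $token;
            next;
        }
        $out .= "<p>$token</p>";         # plain text gets an open and close tag
    }
    return $out;
}

print wrap_tokens("\n<h3>Title</h3>\nFirst paragraph\n\nSecond paragraph\n"), "\n";
```

Because every token is wrapped as a unit, the closing tag can never be forgotten, which is the main appeal of splitting over a single substitution.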

Or you could use the Perl regular expression in PHP with preg_replace.

<?php 
$new = "Aaron\n\nAaron\n\nAaron\n\n"; 
$new = preg_replace("/(?:^|(?:\x0d\x0a){2,}|\x0a{2,}|\x0d{2,})(.+?)(?=(?:(\x0d\x0a){2,}|\x0d{2,}|\x0a{2,}|$))/s","<p>$1</p>",$new); 
echo $new; 
?>

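If the alternation characters in the regular expressions above show up as broken pipe characters (¦) or other stray symbols, replace them with the real vertical bar character (|). Additionally, you need to remove the leading dots from the RE in the other post.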

Hope this helps.

Andreas

lorax

2:29 pm on Nov 7, 2002 (gmt 0)

Andreas,
Do you, like, have a ton of code snippets in your back pocket or do you write them on the fly?! :)

andreasfriedrich

2:57 pm on Nov 7, 2002 (gmt 0)

Hi lorax,

If it is some problem that I came across myself, then I usually have some code in either Perl or PHP sitting around somewhere. It's mostly a question of finding it.

I have been using the split and join approach in both Perl and PHP for a long time.

The RE is actually based on dingman's post [webmasterworld.com] in Newby help with MySQL / CGI / HTML - Newlines missing when parsing database data [webmasterworld.com].

If I find the problem interesting and I don't have any code, I usually make it up on the fly and store it in my back pocket for future use in my own code and future reference in these fora. ;)

I do hope that your question wasn't entirely rhetorical. That would be quite embarrassing. But then I would still have the excuse that English isn't my native language, so I couldn't be expected to recognize all the little hints and quirks.

Andreas

seindal

3:14 pm on Nov 7, 2002 (gmt 0)

Hi,

I usually do this with a small loop that splits the text into paragraphs, manipulates each paragraph separately and then stitches it all together again. In Perl, with the initial text in $input, leaving the output in $output:


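 my $output = ''; 
 for (split /\n{2,}/, $input) {   # one chunk per paragraph
  s/\n/<BR>/g;                    # single line breaks inside a paragraph
  $output .= "<P>$_</P>";         # wrap the paragraph, closing tag included
 }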

This is trivial, but it does allow you to do all sorts of other substitutions on a paragraph by paragraph basis. Also it is very easy to insert the </P> tags.
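
As one illustration of a per-paragraph substitution, the loop could escape HTML special characters before the tags are added. This is a sketch under assumptions: the helper name format_text, the escaping step and the lowercase tags are not part of the snippet above:

```perl
use strict;
use warnings;

# Hypothetical helper: same split-and-loop shape, with an escaping step added.
sub format_text {
    my ($input) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    my $output = '';
    for my $para (split /\n{2,}/, $input) {
        next unless $para =~ /\S/;           # ignore empty or whitespace-only chunks
        $para =~ s/([&<>])/$entity{$1}/g;    # escape user text before adding our tags
        $para =~ s/\n/<br>/g;                # single line breaks inside a paragraph
        $output .= "<p>$para</p>\n";
    }
    return $output;
}

print format_text("a < b\nc\n\nd & e");
```

The escaping has to run before the <br> and <p> tags are inserted, otherwise the tags themselves would be escaped too.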

René.

lorax

4:21 pm on Nov 7, 2002 (gmt 0)

Andreas,
No, not entirely rhetorical. It comes from a combination of amazement and envy! :)

idiotgirl

12:21 am on Nov 8, 2002 (gmt 0)

Well, this is certainly enough to do a little testing. andreasfriedrich - that's one impressive line of code! seindal's is similar to my current "solution". Thank you both. I'm glad to hear I haven't been the only person dwelling on such trivial things as paragraph tags!

andreasfriedrich... When you're on holiday, do you frequent Perl Golf courses? :)