PHP Server Side Scripting Forum

    
Regex to remove <br> between brackets (< and >)
is it possible?
cosmoyoda




msg:4005413
 3:51 am on Oct 12, 2009 (gmt 0)

Hello,

I've been trying to find a solution for this for literally hours today. I came up with a custom wordwrap function that would wrap text with HTML code without breaking the tags.

My problem, however, is trying to get any <br> tags inside < and > out of the way. Here is my string:

<p>In May 1993, Jackson's fifth studio album <i><a href="/wiki/Janet."<br>title="Janet.">janet.</a></i></p>

What the result should be is this:

<p>In May 1993, Jackson's fifth studio album <i><a href="/wiki/Janet." title="Janet.">janet.</a></i></p>

All that I am trying to do is to get rid of the <br> inside the brackets - essentially, cleaning up my HTML code.

Can this be accomplished using RegEx?
Thanks for the help. Really appreciate it!

 

pinterface




msg:4005927
 11:44 pm on Oct 12, 2009 (gmt 0)

To quote the great jwz,
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
;)

While there's probably a way to do what you want with regular expressions, let's start with the original problem: why are you inserting <br>s into existing HTML in the first place? What problem are you trying to solve for which you think this is the right solution? There may be an easier way!

killer7




msg:4005941
 12:05 am on Oct 13, 2009 (gmt 0)

While I'm sure someone out there will say there is a more efficient way of doing this, here is the easiest one to code.

$newString = str_replace("<br>", " ", $string);

I'm not sure what it is you're trying to accomplish, but I think that will solve it without using the resources of regex.

cosmoyoda




msg:4005944
 12:26 am on Oct 13, 2009 (gmt 0)

killer7, I know how to use str_replace, but that's not what I am trying to accomplish here. Remember, I created this function to wordwrap any HTML-formatted string without breaking the tags. I don't want to remove <br> tags altogether, simply those that are inside < and > brackets.

rocknbil




msg:4006503
 8:26 pm on Oct 13, 2009 (gmt 0)

@pinterface, LOL . . . . appended, " unless you are any good at regexps."

I don't want to remove <br> tags altogether, simply those that are inside < and > brackets.

Regex out your sentence: begins with < followed by any character and contains <br> (or <br/> for you XHTML fans) followed by any character and ends with a >. Save the < and > and any other characters between.

Try


<?php
header("Content-type: text/html");
$text = '<p>In May 1993, Jackson\'s fifth studio
album <i><a href="/wiki/Janet."<br>title="Janet.">janet.</a></i></p>';
echo "Original, html: $text <br>";
$html4 = preg_replace('/(.*<.*)<br\s*\/*>([^>]*>.*)/i',"$1 $2",$text);
echo "Regular HTML $html4 <br>";
echo "View the code: " . htmlentities($html4) . "<br>";
$text = '<p>In May 1993, Jackson\'s fifth studio
album <i><a href="/wiki/Janet."<br/>title="Janet.">janet.</a></i></p>';
echo "Original, xhtml: $text <br>";
$xhtml = preg_replace('/(.*<.*)<br\s*\/*>([^>]*>.*)/i',"$1 $2",$text);
echo "XHTML $xhtml <br>";
echo "View the code: " . htmlentities($xhtml) . "<br>";
$text = '<p>In May 1993, Jackson\'s fifth studio
album <i><a href="/wiki/Janet."<br />title="Janet.">janet.</a></i></p>';
echo "oops, space before my /: $text <br>";
$xhtml = preg_replace('/(.*<.*)<br\s*\/*>([^>]*>.*)/i',"$1 $2",$text);
echo "XHTML with space in br: $xhtml <br>";
echo "View the code: " . htmlentities($xhtml) . "<br>";
?>

Note that the \/* accommodates <br>, <br/>, and <br />; the \s* accommodates a space after br if present; and the i modifier makes it case-insensitive (although if you're doing XHTML, <BR/> is worthy of a ruler across your coding hand). The same regexp is used on all three. Also note the space between $1 and $2 so it doesn't break the tag.

I have to agree with some of the comments though, if you're putting a <br> inside another tag in the first place it's invalid HTML and you should try a different approach. But this answers the question, I think . . .

pinterface




msg:4006679
 1:33 am on Oct 14, 2009 (gmt 0)

I think you may have proven jwz's point, rocknbil!

$tests = array(
    '<p>Happy little<br>Narwhal.</p>',
    '<p<br>>Happy little Narwhal.</p>',
    '<p foo=bar<br>quux=baz<br>>Happy little Narwhal.</p>',
    '<p foo=bar<br>>Happy little<br>Narwhal.</p><p>I like<br>cheese.</p>');
$rocknbil = array();
foreach ($tests as $test)
    $rocknbil[$test] = preg_replace('/(.*<.*)<br\\s*\\/*>([^>]*>.*)/i','$1 $2',$test);
print_r($rocknbil);

Array 
(
[<p>Happy little<br>Narwhal.</p>] =>
- <p>Happy little Narwhal.</p>
[<p<br>>Happy little Narwhal.</p>] =>
+ <p >Happy little Narwhal.</p>
[<p foo=bar<br>quux=baz<br>>Happy little Narwhal.</p>] =>
- <p foo=bar<br>quux=baz >Happy little Narwhal.</p>
[<p foo=bar<br>>Happy little<br>Narwhal.</p><p>I like<br>cheese.</p>] =>
- <p foo=bar<br>>Happy little<br>Narwhal.</p><p>I like cheese.</p>
)

Well, one out of four ain't bad. ;)

s!(<[^<>]*)<br>!$1 !g (to use Perl syntax) solves three of four (or all four if you loop until the output matches the input), but I strongly suspect the wrong problem is being solved.
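
For PHP readers, the Perl one-liner above might translate to something like the sketch below. The function name strip_br_in_tags is made up for illustration, and like the Perl version it only handles a bare <br>. It repeats the substitution until a pass changes nothing, which is the "loop until the output matches the input" idea.

```php
<?php
// Hypothetical helper name; not part of PHP or any library.
function strip_br_in_tags(string $html): string
{
    do {
        $before = $html;
        // An opening "<", a run of non-bracket characters, then a stray <br>:
        // keep the run and swap the <br> for a single space.
        $html = preg_replace('/(<[^<>]*)<br>/', '$1 ', $before);
    } while ($html !== $before); // stop once a pass changes nothing
    return $html;
}

$tests = array(
    '<p>Happy little<br>Narwhal.</p>',
    '<p<br>>Happy little Narwhal.</p>',
    '<p foo=bar<br>quux=baz<br>>Happy little Narwhal.</p>',
    '<p foo=bar<br>>Happy little<br>Narwhal.</p><p>I like<br>cheese.</p>');
foreach ($tests as $test) {
    echo htmlentities(strip_br_in_tags($test)), "\n";
}
```

Each pass removes at most one <br> per tag, so a tag with several stray breaks needs several passes; the loop takes care of that.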

rocknbil




msg:4007030
 6:06 pm on Oct 14, 2009 (gmt 0)

Agreed, the <br> is invalid where it is. The why of this could make it all go away.

It can still be done. Just needs a bit more tweaking. :-)

coopster




msg:4015978
 12:09 am on Oct 30, 2009 (gmt 0)

Rather than use a regex you may want to try the tidy extension [php.net]. It may do the job for you.

TheMadScientist




msg:4021127
 12:54 am on Nov 8, 2009 (gmt 0)

Sometimes I wake up in the middle of the night and think 'I know the answer to that question' and I'm prompted to check the post, then see if I really do know the answer the next day...


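$original = array(
    '<p>Happy little<br>Narwhal.</p>',
    '<p<br>>Happy little Narwhal.</p>',
    '<p foo=bar<br>quux=baz<br>>Happy little Narwhal.</p>',
    '<p foo=bar<br>>Happy little<br>Narwhal.</p><p>I like<br>cheese.</p>');
$solution = array();

$loop = count($original);

for ($i = 0; $i < $loop; $i++) {
    // Swap every <br> for a placeholder that contains no angle brackets.
    $solution[$i] = str_replace('<br>', '1br1', $original[$i]);
    // Each pass drops the last placeholder inside a tag; repeat until none remain.
    while (preg_match('/<[^>]*1br1[^>]*>/i', $solution[$i])) {
        // The trailing space keeps neighbouring attributes from running together.
        $solution[$i] = preg_replace('/<([^>]*)1br1/i', '<$1 ', $solution[$i]);
    }
    // Placeholders left in the visible text become <br> again.
    $solution[$i] = str_replace('1br1', '<br>', $solution[$i]);
}

print_r($original);
print_r($solution);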

Sometimes I really do.

The short version is: change the <br> tags to a placeholder so you can find them within the HTML tags. 1br1 could be anything unique, e.g. *br* or [ThisIsABreak], and the placeholder could be used when the breaks are first inserted, which eliminates the need for the first str_replace.

brotherhood of LAN




msg:4021194
 11:11 am on Nov 8, 2009 (gmt 0)

If your text input is from the web, remember that all kinds of unexpected things can happen.

Just for reference, <br> tags can have attributes.

/<br[^>]*>/i
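
As a rough illustration of how that attribute-tolerant pattern could be applied to the original question, here is a single-pass sketch using preg_replace_callback. The function name strip_any_br_in_tags is hypothetical. The \b after br is my addition so that a tag such as <broken> is not mistaken for a break, and the sketch assumes attribute values contain no literal < or > characters.

```php
<?php
// Hypothetical helper name; not part of PHP or any library.
function strip_any_br_in_tags(string $html): string
{
    // A tag whose body contains one or more <br ...> variants before its closing ">".
    $tagWithBreaks = '/<[^<>]*(?:<br\b[^<>]*>[^<>]*)+>/i';

    return preg_replace_callback(
        $tagWithBreaks,
        function (array $m): string {
            // Inside the matched tag, turn every break variant into a space.
            return preg_replace('/<br\b[^<>]*>/i', ' ', $m[0]);
        },
        $html
    );
}

echo htmlentities(strip_any_br_in_tags(
    '<a href="/wiki/Janet."<br />title="Janet.">janet.</a>'
)), "\n";
```

Breaks that sit in the visible text, including ones that carry attributes, are left alone because the outer pattern only matches when a break appears before the enclosing tag has closed.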

For any deeper HTML parsing, I'd suggest using the PHP DOM extension [php.net]. Parsing HTML can be a nightmare, particularly with regex.

TheMadScientist




msg:4021422
 1:01 am on Nov 9, 2009 (gmt 0)

Good point... I was just trying to answer the original question, because I don't know all the details and really don't need to. The OP wanted to remove the breaks they put in from within the < >, and I figured it was likely they didn't randomly insert <BR> <br /> <BR/> <br> <Br title='MakinItToughToReplaceThisBR'>, etc.

I actually think what they are trying to do will be tougher than they think, because if you have the function() set to put the <br> every N characters and you remove them from 'non-visible' strings (IOW within the <tags>) it changes the count of visible characters between the <br> tags in the text on the page, so your wrap looks wrong.

There could be something I'm missing, but I think it's the issue I would run into next if I were writing the same code. Anyway, my solution is what they asked for, because it's what they were trying to do, and there are some things it's good to learn by trying...

I know where I think the next 'obstacle' will be, because I've run into something similar in some coding project at some time before and see the issues a bit more easily now. They may not have, and it might be good for them to see it. Or they may be doing it for some totally different reason than I would, and it works in a way I don't see, so the code solves the issue.

(To get it right, IMO, you basically have to count the text within the tags, and then base your <br> insertion on visible text, which is a bit more difficult than removing <br>s from within tags. I think if I was writing the function I would probably pay attention to the <p> tags... For some reason I think they might be important in this setting.)
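
One way to read that suggestion is sketched below: split the markup into tags and text, copy tags through untouched, and count only visible characters when deciding where a break goes, so a <br> never lands inside a tag in the first place. Everything here is an assumption for illustration: the function name wrap_visible_text, its parameters, the choice to treat <br> and <p> as starting a new line, and the use of strlen (which counts bytes, so multibyte text would need something else).

```php
<?php
// Hypothetical helper: inserts $break between words so that no line of
// visible text runs past $width characters. Tags have zero width.
function wrap_visible_text(string $html, int $width = 75, string $break = '<br>'): string
{
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
    $out = '';
    $col = 0;         // visible characters on the current line
    $atSpace = true;  // a break is only allowed right after whitespace

    foreach (preg_split('/(<[^>]*>)/', $html, -1, $flags) as $token) {
        if (preg_match('/^<[^>]*>$/', $token)) {
            $out .= $token;                          // tag: copied verbatim
            if (preg_match('/^<\/?(?:br|p)\b/i', $token)) {
                $col = 0;                            // <br> and <p> start a new line
                $atSpace = true;
            }
            continue;
        }
        foreach (preg_split('/(\s+)/', $token, -1, $flags) as $piece) {
            if (preg_match('/^\s/', $piece)) {
                $out .= $piece;
                $col += 1;                           // a whitespace run renders as one space
                $atSpace = true;
                continue;
            }
            $len = strlen($piece);                   // bytes; assumes single-byte text
            if ($atSpace && $col > 0 && $col + $len > $width) {
                $out .= $break;                      // word would overflow: break before it
                $col = 0;
            }
            $out .= $piece;
            $col += $len;
            $atSpace = false;
        }
    }
    return $out;
}

echo htmlentities(wrap_visible_text(
    '<p>one two <a href="/x" title="y">three</a> four</p>', 8
)), "\n";
```

The space before an inserted break is kept rather than replaced, which leaves a trailing space in the markup but keeps the bookkeeping simple. Words longer than the width are not split.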
