Forum Moderators: coopster


preg_match close <li> tags

regex help

         

jamie

1:04 pm on Dec 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi,

one of the cms bits i am using is htmlarea which doesn't close <li> tags except the last one of the list. e.g.

<ul>
<li> hallo james
<li> hallo james
<li> hallo james </li>
</ul>

but trying to close these is proving difficult.

so far i have three different preg_replace calls, which work, but there must be a simpler way? ;-)

the pattern i have been working on is:

$str = '<LI><A HREF=PAGE.HTML CLASS=BROWN>hallo</a>
<LI>hallo james

</li>

</ul>';

// close every <li> that is followed by another <li>
// (lazy .*? so each item is matched on its own, not the whole list at once)
$pattern = "/<li>(.*?)(?=<li>)/is";
$replacement = "<li>$1</li>\n";
$str = preg_replace($pattern, $replacement, $str);

// close the last <li> before </ul>
$pattern = "/<li>(.*?)(?=<\/ul>)/is";
$replacement = "<li>$1</li>\n";
$str = preg_replace($pattern, $replacement, $str);

// remove double </li>, with or without whitespace in between
$pattern = "/<\/li>[\n\r ]*<\/li>/is";
$replacement = "</li>\n";
$str = preg_replace($pattern, $replacement, $str);

much obliged for any help! :-)

[edited by: jatar_k at 12:05 am (utc) on Dec. 9, 2003]
[edit reason] disabled smiles [/edit]

coopster

6:07 pm on Dec 8, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Personally, I think it would be easier to upgrade your version of htmlarea -- if that is at all possible. I have the latest version and it closes the tags on each iteration of a list item nicely. I don't know which version change it was that made that update. You will want to be careful if you take this approach though -- test, test, test.

If this is not an option, then a regular expression is probably going to be your best bet. Let us know -- coopster

jamie

8:48 pm on Dec 8, 2003 (gmt 0)


hi coopster,

i have got html area configured with several of my admin scripts, and after looking at the forums and the procedure to upgrade from htmlarea2 to 3, i think i'll stick with it - 'if it ain't broke, don't fix it'. ;-)

the preg_replace is actually only being used in the admin section, which is updated very rarely, so even if i have to make three preg_replace calls, the cpu / time overhead is negligible.

i am sure the preg_replace is not too hard, it ought to be a matter of using the pipe between <li>|</ul> - something like

(?=<li>|<\/ul>)

but in spite of endless reading of assertions [us2.php.net], i am still in the dark.
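
For what it's worth, the alternation-in-a-lookahead idea can be made to work in a single pass. What follows is only a minimal sketch, assuming flat (non-nested) lists that each end with </ul>; the function name close_list_items is made up for illustration and comes from neither htmlarea nor this thread.

```php
<?php
// Hypothetical helper name, not from htmlarea or this thread.
// Assumes flat (non-nested) lists that each end with </ul>.
function close_list_items(string $html): string
{
    // <li\b([^>]*)>    opening tag; any attributes are kept in $1
    // (.*?)            item text, as short as possible, in $2
    // (?:</li>)?       swallow an existing closing tag so it is not doubled
    // \s*              swallow trailing whitespace
    // (?=<li\b|</ul>)  stop just before the next item or the end of the list
    $pattern = '~<li\b([^>]*)>(.*?)(?:</li>)?\s*(?=<li\b|</ul>)~is';
    return preg_replace($pattern, '<li$1>$2</li>' . "\n", $html);
}

$list = "<ul>\n<li> hallo james\n<li> hallo james\n<li> hallo james </li>\n</ul>";
echo close_list_items($list);
```

Because an existing </li> is consumed and re-emitted, running the function on an already closed list should leave it unchanged, so no separate "remove double </li>" step is needed in this sketch.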

i think i ought to try again after a bit of sleep ;-)

cheers

[edited by: jatar_k at 12:05 am (utc) on Dec. 9, 2003]
[edit reason] disabled smiles [/edit]

coopster

12:02 am on Dec 9, 2003 (gmt 0)


I used a look behind assertion since we are looking for something that isn't there. What we do know is there is an <li> element with no closing tag, except on the last one. This works, but it assumes there is always a newline before any <li> element tag:

$pattern = "/(?<!<\/li>|<ul>)\n<li>/is";
$replacement = "</li>\n<li>";
$str = preg_replace($pattern, $replacement, $str);

Breaking it down:
It searches for any <li> tag (case-insensitive) preceded by a newline.
When it finds one, it then does a comparison to make sure it isn't the first <li> in the list, nor already preceded by a closing </li> tag. At this point we can assume we have an <li> tag that has not been closed and we'll replace the <li> tag with a closing </li> followed by a newline followed by the opening tag we first found <li>.
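
One thing to watch: the lookbehind only sees the characters immediately before the \n, so with Windows line endings (\r\n) or trailing spaces it never finds </li> or <ul> and a stray </li> gets inserted. A hedged sketch of one way around that, under the same assumptions (flat list, last item already closed): capture whatever precedes the <li>, including optional whitespace, and decide in a callback. The name close_open_items is hypothetical.

```php
<?php
// Hypothetical helper name; a sketch, not a drop-in replacement.
// Assumes a flat list whose last item is already closed.
function close_open_items(string $html): string
{
    // Group 1 (optional): a closing </li> or an opening <ul ...> tag.
    // Then any whitespace, then the start of an <li> tag.
    $pattern = '~(</li>|<ul\b[^>]*>)?\s*<li\b~i';

    return preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            // First item of the list, or previous item already closed.
            return $m[0];
        }
        // Previous item was left open: close it before the whitespace.
        return '</li>' . $m[0];
    }, $html);
}

echo close_open_items("<ul>\r\n<li> one\r\n<li> two\r\n<li> three </li>\r\n</ul>");
```

The callback is used instead of a lookbehind because each lookbehind alternative has traditionally had to be fixed-length in PCRE, which rules out "optional whitespace" inside it.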

Somebody has to have a better solution than this though...

[edited by: jatar_k at 12:04 am (utc) on Dec. 9, 2003]
[edit reason] disabled smiles [/edit]

jamie

8:23 am on Dec 9, 2003 (gmt 0)


hi coopster,

thanks for the input. but as you say it doesn't match all: e.g.

<ul>
<lI>

becomes

<ul>
</li>
<li>

nevermind, maybe it isn't that easy after all. i've now spent most of the weekend on this and other regexes, have learnt LOADS and time to move on - otherwise nothing gets done!

;-) cheers

<added> yours does work better than mine though, i just add another one now to remove the </li> if it comes straight after a <ul>.... getting there ;-)

brotherhood of LAN

9:22 am on Dec 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



maybe a bit more long-winded than required, but this should work for the example you give

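$stuff = "<ul> <li> hallo james <li> hallo james <li> hallo james </li> </ul> ";
// split on every opening or closing li tag
$stuff = preg_split("'</?li>'ims", $stuff);
$count = count($stuff);
// re-wrap everything between the first piece (<ul>) and the last piece (</ul>)
for ($i = 1; $i < $count - 1; $i++) {
    $stuff[$i] = '<li>' . $stuff[$i] . '</li>';
}
echo $stuff = implode("", $stuff);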

jamie

10:28 am on Dec 9, 2003 (gmt 0)


hi BOL,

just looking at it, i thought that you may have cracked it, but after playing with it a bit, it unfort. doesn't work with all combos.

it can only handle the exact list format you specify in $stuff. when i give it a list with no closing tags, the (count - 1) means i get a list item too many - then we get back to checking to see whether there is a </li>, etc, etc

obviously trickier than it looks ;-)

cheers

brotherhood of LAN

10:51 am on Dec 9, 2003 (gmt 0)


OK another try ;)

$stuff = "<ul> <li> hallo james <li> hallo james <li> hallo james </li> </ul>";
// split on every tag and drop the empty pieces
$splitstuff = preg_split("'</?[^>]+>'", trim($stuff), -1, PREG_SPLIT_NO_EMPTY);
$stuff = '';
foreach ($splitstuff as $stuf) {
    $stuf = trim($stuf);
    // skip pieces that were only whitespace between tags
    if ($stuf) {
        $stuff .= '<li>' . $stuf . '</li>';
    }
}
echo $stuff = '<ul>' . $stuff . '</ul>';

That's basically splitting your list (per tag) into an array, then adding <ul>$stuff</ul> at the end. As long as there is an opening/ending tag for each item in the list it should work. Worth a shot anyway.

jamie

5:11 pm on Dec 11, 2003 (gmt 0)


hi BOL,

been away from desk for a while with flu :-(

i have had to swap the </?[^>]+> for </?[li|ul]+>, otherwise it fails when it hits a normal tag such as <a href>, but it now seems to work quite well
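
One caveat on that pattern: square brackets make a character class, so [li|ul]+ matches any run of the characters l, i, u and |, which also lets something like <ill> through. A grouped alternation is probably closer to the intent. A minimal sketch of the split-and-rewrap approach with that change, assuming the input is exactly one flat list with nothing around it; rebuild_list is a hypothetical name.

```php
<?php
// Hypothetical helper, adapted from the split-and-rewrap idea in this thread.
// Assumes $html holds exactly one flat <ul> list and nothing else.
function rebuild_list(string $html): string
{
    // Split only on <ul>, </ul>, <li>, </li> (attributes allowed);
    // other tags such as <a href=...> stay inside the item text.
    $parts = preg_split('~</?(?:ul|li)\b[^>]*>~i', trim($html), -1, PREG_SPLIT_NO_EMPTY);

    $items = '';
    foreach ($parts as $part) {
        $part = trim($part);
        // Skip pieces that were only whitespace between tags.
        if ($part !== '') {
            $items .= '<li>' . $part . '</li>';
        }
    }
    return '<ul>' . $items . '</ul>';
}

echo rebuild_list('<ul> <li> <a href=page.html>hallo</a> <li> hallo james </li> </ul>');
```

The explicit `!== ''` check is a small design choice: a plain `if ($part)` would also drop an item whose text is just "0".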

as ever tmtowtdi ;-)

cheers for the input!