
Forum Moderators: coopster & jatar k


Regex: Removing empty tags

Remove empty tags, regex, php, regular expressions

   
1:42 am on Mar 10, 2005 (gmt 0)

10+ Year Member



Hi everyone, I am in need of yet more regex help.

What I want to do is remove from my page any tags that are empty or contain only whitespace. The regex that I have created is the following:

<[^/|^!|^input|^br|^img|^meta|^hr][^>]*>[\s]*<*/[^>]*>

Basically it says: select any tag that is not an end tag, a comment, an input, a break, an image, a meta tag or a horizontal rule, that contains only whitespace, together with its closing tag.

This pattern works within Dreamweaver but does not work within PHP. An earlier version of this regex is:

<[^/][^>]*>[\s] *<*</[^>]*>

This works in PHP, but only where the tags have no characters at all between them; in Dreamweaver, on the other hand, it works where there are more than two spaces.

If someone could point out where I am going wrong, that would be great. Also, if somebody could offer some ideas on how I could select a tag that is never closed, or that runs into another open tag of the same type, e.g.:

<strong>sdfsfdssfd<strong>sfdfs</strong>

would find the first strong because of the improper nesting. Consequently, I would like to do the same on the other side.

Anyway, I can figure out how to do it with what I know of regexes; however, it would select all the text up to the next opening tag, i.e. <strong>sdfsfdssfd, when what I want is simply <strong>.

Thanks

3:00 am on Mar 10, 2005 (gmt 0)

10+ Year Member



Removing empty and whitespace-only tags is fairly straightforward. I've just modified your regex a bit:

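<?php
$html = "<a></a><b>non-empty</b>";

// Remove any element whose content is empty or whitespace only.
function removeEmptyTags($html_replace)
{
    // An opening tag (no slash in it), optional whitespace, then a closing tag.
    $pattern = "/<[^\/>]*>([\s]?)*<\/[^>]*>/";
    return preg_replace($pattern, '', $html_replace);
}

// Usage:
echo removeEmptyTags($html);
// Prints '<b>non-empty</b>'
?>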

The duplicate tags problem is a little more complex: your nested tags may or may not be of different types, and they might overlap. Maybe someone else has a regex for this?

5:11 am on Mar 10, 2005 (gmt 0)

10+ Year Member



Hi Ikonic, thanks for that, it works a treat.

However, if you look at the example that I gave, my first regex (the one that did not work in PHP) ignored any tags that do not require a closing tag, i.e. comments, inputs, breaks, images etc. I modified your regex to:

/<[^!|^input|^br|^img|^meta|^hr|^\/>]*>([\s]?)*<\/[^>]*>/

However, it does not seem to work. Any suggestions?

Thanks

Ryan

7:16 pm on Apr 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

Please first review my post here: [webmasterworld.com...] about the usage of [].

So what you really need to do is convert this:

/<[^!|input|br|img|meta|hr|\/>]*>([\s]?)*<\/[^>]*>/

to

/<(?!input|br|img|meta|hr|\/)[^>]*>\s*<\/[^>]*>/
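To see the difference in practice, here is a minimal sketch that runs both patterns over the same string, with a real pipe character as the alternation operator. The variable names and the sample HTML are only illustrative. The point is that [^...] excludes single characters rather than whole words, so almost every tag name trips over one of the listed letters, while (?!...) rejects whole tag names at the start of the tag.

```php
<?php
// Character-class attempt from the thread: the class excludes the individual
// characters ! | ^ i n p u t b r m g e a h / and >, not the words themselves.
$charClass = '/<[^!|^input|^br|^img|^meta|^hr|^\/>]*>([\s]?)*<\/[^>]*>/';

// Negative lookahead: the tag may not start with any of the listed names.
$lookahead = '/<(?!input|br|img|meta|hr|\/)[^>]*>\s*<\/[^>]*>/';

// Illustrative input: one whitespace-only tag, one non-empty tag, one <br>.
$html = '<p> </p><b>text</b><br>';

// The character class cannot get past the "p" in <p>, so nothing is removed.
$withCharClass = preg_replace($charClass, '', $html);

// The lookahead version removes <p> </p> and leaves <br> alone.
$withLookahead = preg_replace($lookahead, '', $html);

echo $withCharClass, "\n"; // <p> </p><b>text</b><br>
echo $withLookahead, "\n"; // <b>text</b><br>
```

Note that the lookahead only checks how the tag name begins, so a tag whose name merely starts with "br" or "hr" would also be skipped.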

Nesting and overlapping tags are not a problem you can solve perfectly with regexes. What you need is a quasi-parser, i.e. use a regex to split the text into tags and text, and then use a recursive function or a stack to match up all the pairs. Allowing overlapping is not easy either, and it is one of the reasons it has been so difficult to build a browser that both complies with standards and is kind to shoddy HTML.
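A rough sketch of the stack idea, assuming PHP 7 or later. The function name findUnclosedTags, the list of void tags and the returned array shape are all made up for illustration, and it ignores awkward cases such as tags inside comments or scripts. It tokenizes the tags with a regex, pushes opening tags on a stack, and pops on each closing tag; anything skipped over or left on the stack was never closed.

```php
<?php
// Hypothetical helper: returns the opening tags that are never closed.
function findUnclosedTags(string $html): array
{
    // Tags that never take a closing tag are not tracked (assumed list).
    $void = ['br', 'hr', 'img', 'input', 'meta'];

    // Tokenize: every opening or closing tag, with its byte offset.
    preg_match_all('/<(\/?)([a-z][a-z0-9]*)[^>]*>/i', $html, $tags,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];
    $unclosed = [];
    foreach ($tags as $tag) {
        $name = strtolower($tag[2][0]);
        if (in_array($name, $void, true)) {
            continue;
        }
        if ($tag[1][0] === '') {
            // Opening tag: remember its name, text and position.
            $stack[] = ['name' => $name, 'tag' => $tag[0][0], 'offset' => $tag[0][1]];
            continue;
        }
        // Closing tag: search down the stack for the nearest matching opener.
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i]['name'] === $name) {
                break;
            }
        }
        if ($i < 0) {
            continue; // stray closing tag, ignored in this sketch
        }
        // Everything opened after the match was left open (overlap).
        $unclosed = array_merge($unclosed, array_splice($stack, $i + 1));
        array_pop($stack); // discard the matched opener
    }
    // Whatever is still on the stack was never closed at all.
    return array_merge($unclosed, $stack);
}

// Usage with the example from the first post:
foreach (findUnclosedTags('<strong>sdfsfdssfd<strong>sfdfs</strong>') as $t) {
    echo $t['tag'], ' at offset ', $t['offset'], "\n"; // <strong> at offset 0
}
```

Because each entry carries the offset of the tag itself, you can select or remove just the <strong> and leave the text after it alone.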

SN