
PHP Server Side Scripting Forum

    
Regex: Removing empty tags
Remove empty tags, regex, php, regular expressions
RyanM




msg:1272537
 1:42 am on Mar 10, 2005 (gmt 0)

Hi everyone, I am in need of yet more regex help.

What I want to do is remove from my page any empty tags, or tags that contain only whitespace. The regex I have created is the following:

<[^/|^!|^input|^br|^img|^meta|^hr][^>]*>[\s]*<*/[^>]*>

Basically it says: select any tag that is not an end tag, a comment, an input, a line break, an image, a meta tag or a horizontal rule, and that contains only whitespace, together with its closing tag. This regex works within Dreamweaver but does not work within PHP. An earlier version of this regex is:

<[^/][^>]*>[\s] *<*</[^>]*>

This works in PHP, but only where there is nothing at all between the tags; in Dreamweaver, on the other hand, it also works where there are more than two spaces.

If someone could point out where I am going wrong, that would be great. Also, could somebody offer some ideas on how I could select a tag that is never closed, or that runs into another open tag of the same type, e.g.:

<strong>sdfsfdssfd<strong>sfdfs</strong>

This should find the first <strong>, because the nesting is improper, and I would like to do the same on the other side, for closing tags.

I can figure out how to do this with what I know of regexes, but my attempt selects all the text up to the next opening tag, i.e. <strong>sdfsfdssfd, when all I want is <strong>.
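One way to match only the <strong> itself, rather than the text after it, is a lookahead: it checks what follows the tag without including it in the match. A minimal sketch, assuming PHP 7.1 or later; the function name findUnclosedStrong is hypothetical, and it only handles a single tag type in flat markup, so it is no substitute for real parsing.

```php
<?php
// Hypothetical helper: find <strong> tags that are followed by another
// <strong> before any </strong>. The (?=...) lookahead is zero-width,
// so each match is only the opening tag, not the text after it.
function findUnclosedStrong(string $html): array
{
    // (?:(?!</?strong>).)* consumes text but refuses to step over any
    // <strong> or </strong>; the lookahead then requires a <strong>.
    $pattern = '~<strong>(?=(?:(?!</?strong>).)*<strong>)~is';
    preg_match_all($pattern, $html, $m, PREG_OFFSET_CAPTURE);
    return $m[0]; // each entry: [matched text, byte offset]
}

$html = '<strong>sdfsfdssfd<strong>sfdfs</strong>';
foreach (findUnclosedStrong($html) as [$tag, $offset]) {
    echo "$tag at offset $offset\n";
}
// prints: <strong> at offset 0
```

The second <strong> is not reported because a </strong> follows it before any other <strong>.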

Thanks

 

ironik




msg:1272538
 3:00 am on Mar 10, 2005 (gmt 0)

Removing whitespace-only and empty tags is fairly straightforward; I've just modified your regex a bit:

<?php
$html = "<a></a><b>non-empty</b>";
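// Remove any opening tag followed only by optional whitespace and then
// a closing tag, e.g. <a></a> or <p>  </p>.
function removeEmptyTags($html_replace)
{
$pattern = '/<[^\/>]*>\s*<\/[^>]*>/';
return preg_replace($pattern, '', $html_replace);
}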

// Usage:
echo removeEmptyTags($html);
// Returns '<b>non-empty</b>'
?>
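One preg_replace pass only removes the innermost empty pair, so <div><p> </p></div> becomes <div></div>. A minimal sketch (the function name removeEmptyTagsRepeatedly is hypothetical; assumes PHP 7 or later) that repeats the replacement until nothing changes, using preg_replace's optional by-reference $count argument. Like the pattern above, it does not skip void tags, so <br></p> would still be removed.

```php
<?php
// Hypothetical helper: repeat the empty-tag replacement until a pass
// removes nothing, so parents emptied by an earlier pass go too.
function removeEmptyTagsRepeatedly(string $html): string
{
    // Opening tag, optional whitespace, any closing tag.
    $pattern = '/<[^\/>]*>\s*<\/[^>]*>/';
    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);
    return $html;
}

echo removeEmptyTagsRepeatedly('<div><p> </p></div><b>non-empty</b>'), "\n";
// prints: <b>non-empty</b>
```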

The duplicate-tag problem is a little more complex, as your nested tags may or may not be of different types, and they might overlap. Maybe someone else has a regex for this?

RyanM




msg:1272539
 5:11 am on Mar 10, 2005 (gmt 0)

Hi ironik, thanks for that, it works a treat.

However, if you look at my first example, that regex (the one that did not work in PHP) ignored any tags that do not require a closing tag, i.e. comments, inputs, line breaks, images, etc. I modified your regex to:

/<[^!|^input|^br|^img|^meta|^hr|^\/>]*>([\s]?)*<\/[^>]*>/

however it does not seem to work. Any suggestions?

Thanks

Ryan

killroy




msg:1272540
 7:16 pm on Apr 3, 2005 (gmt 0)

Hi,

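Please first review my post here: [webmasterworld.com...] about the usage of []. In short, [...] is a character class: it matches exactly one character from the listed set (or, with a leading ^, one character not in it), so [^!|^input|^br] means "one character that is not !, |, ^, i, n, p, u, t, b or r", not "anything except these words".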
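A quick check of that point, as a sketch assuming PHP 7 or later: Ryan's pattern is copied unchanged, and a few empty tags are tested against it. Because the class excludes single letters, <p></p> (p is excluded) and <div></div> (i is excluded) are not matched, while <ol></ol> is, since neither o nor l is in the set.

```php
<?php
// Ryan's pattern, copied unchanged. Everything between [ and ] is ONE
// character class, so it excludes the single characters
// ! | ^ i n p u t b r m g e a h / > rather than the words input, br, etc.
$pattern = '/<[^!|^input|^br|^img|^meta|^hr|^\/>]*>([\s]?)*<\/[^>]*>/';

foreach (['<p></p>', '<div></div>', '<ol></ol>'] as $html) {
    $matched = preg_match($pattern, $html) === 1;
    echo $html, ' => ', $matched ? 'matched' : 'not matched', "\n";
}
// Expected output:
// <p></p> => not matched
// <div></div> => not matched
// <ol></ol> => matched
```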

So what you really need to do is convert this:

/<[^!|^input|^br|^img|^meta|^hr|^\/>]*>([\s]?)*<\/[^>]*>/

to

/<(?!input|br|img|meta|hr|\/)[^>]*>\s*<\/[^>]*>/
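The (?!...) negative lookahead rejects tags whose name starts with one of those words without consuming anything, so the rest of the pattern works as before. A minimal sketch built on that pattern, with two assumed changes: ! is added to the lookahead so comments are skipped (Ryan's version excluded them, the converted one does not), and the /i flag makes <BR> count as well. The function name removeEmptyTagsSafely is hypothetical; assumes PHP 7 or later.

```php
<?php
// Hypothetical helper based on the lookahead pattern above, plus "!"
// (skip comments) and /i (case-insensitive tag names), both assumptions.
function removeEmptyTagsSafely(string $html): string
{
    $pattern = '/<(?!input|br|img|meta|hr|!|\/)[^>]*>\s*<\/[^>]*>/i';
    return preg_replace($pattern, '', $html);
}

$html = '<p>  </p><div><br></div><b>x</b>';
echo removeEmptyTagsSafely($html), "\n";
// prints: <div><br></div><b>x</b>

// For contrast, the simpler pattern treats <br></div> as an empty pair:
echo preg_replace('/<[^\/>]*>\s*<\/[^>]*>/', '', $html), "\n";
// prints: <div><b>x</b>
```

Excluding void tags matters because they never have a closing tag, so without the lookahead a <br> followed by a parent's closing tag is removed, and the markup breaks.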

Nesting and overlapping tags are not a problem you can solve perfectly using regexes. What you need is a quasi-parser: use a regex to split the text into tags and text, then use a recursive function or a stack to match up all the pairs. Handling overlapping tags is also not easy, and it is one of the reasons it has been so difficult to build a browser that both complies with standards and is forgiving of shoddy HTML.
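A minimal sketch of that quasi-parser, assuming PHP 7.1 or later: preg_split with PREG_SPLIT_DELIM_CAPTURE splits the HTML into tags and text, and a stack matches each closing tag to the nearest open tag of the same name. The function name findUnbalancedTags, the void-tag list (just the tags named in this thread), and the choice to report problems rather than repair them are assumptions; it also assumes no > inside attribute values or comments.

```php
<?php
// Hypothetical quasi-parser: split into tags and text, then use a stack
// to pair opening and closing tags. Returns [description, offset] pairs.
function findUnbalancedTags(string $html): array
{
    $void = ['input', 'br', 'img', 'meta', 'hr']; // assumed list
    $parts = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
    $stack = [];    // open tags waiting for a close: [name, offset]
    $problems = [];

    foreach ($parts as [$part, $offset]) {
        if (!preg_match('~^<(/?)([a-z][a-z0-9]*)~i', $part, $m)) {
            continue; // plain text, comment or doctype
        }
        $isClose = $m[1] === '/';
        $name = strtolower($m[2]);
        if (in_array($name, $void, true) || substr($part, -2) === '/>') {
            continue; // void or self-closing tags never need a close
        }
        if (!$isClose) {
            $stack[] = [$name, $offset];
            continue;
        }
        // Closing tag: search down the stack for a matching opener.
        $i = count($stack) - 1;
        while ($i >= 0 && $stack[$i][0] !== $name) {
            $i--;
        }
        if ($i < 0) {
            $problems[] = ["stray </$name>", $offset];
            continue;
        }
        // Anything opened after the matching tag was never closed.
        foreach (array_splice($stack, $i + 1) as [$openName, $openOffset]) {
            $problems[] = ["unclosed <$openName>", $openOffset];
        }
        array_pop($stack); // remove the matched opener
    }
    // Whatever is still open at the end was never closed.
    foreach ($stack as [$openName, $openOffset]) {
        $problems[] = ["unclosed <$openName>", $openOffset];
    }
    return $problems;
}

foreach (findUnbalancedTags('<strong>sdfsfdssfd<strong>sfdfs</strong>') as [$what, $at]) {
    echo "$what at offset $at\n";
}
// prints: unclosed <strong> at offset 0
```

Matching a closing tag to the nearest opener of the same name is one simple policy for overlapping tags; a browser-grade parser applies much more elaborate recovery rules.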

SN

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved