What I want to do is remove any empty tags, or tags with only white space within them, from my page. The regex that I have created is the following:
Basically it says: select any tag that is not an end tag, a comment, an input, a break, an image, a meta tag or a horizontal rule, followed by nothing but spaces, followed by its closing tag. This works in Dreamweaver but does not work in PHP. An earlier version of this regex is:
This one works in PHP, but only where there are no characters between the tags; in Dreamweaver, on the other hand, it also works where there are more than two spaces.
If someone could point out where I am going wrong, that would be great. Also, could somebody offer some ideas on how I could select a tag that is never closed, or that runs into another open tag of the same type, i.e.:
would find the first <strong> because the nesting is improper. Likewise, I would like to do the same on the other side.
With what I know of regexes I can work out a way to do it, but it selects all the text up to the next opening tag, i.e. <strong>sdfsfdssfd, when what I want is simply <strong>.
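One way to stop the match from swallowing the text is a lookahead: it checks what follows the tag without consuming it, so the match itself is only the tag. The PHP sketch below flags an opening <strong> whose next strong-related tag is another opening <strong>. It is a heuristic, assuming <strong> is never legitimately nested inside another <strong>, and it only covers the opening side; the sample string is made up for illustration.

```php
<?php
// Flag an opening <strong> whose next <strong> or </strong> tag is another
// opening <strong>, i.e. the first one was never closed.
// (?=...) is a lookahead: it inspects the following text but does not consume
// it, so each match is just the "<strong>" tag itself.
// (?:(?!<\/?strong\b).)* advances one character at a time but refuses to step
// onto any <strong> or </strong> tag. The "s" flag lets "." cross newlines.
$pattern = '/<strong\b[^>]*>(?=(?:(?!<\/?strong\b).)*<strong\b)/is';

$html = '<strong>abc<strong>def</strong> <strong>ok</strong>';
preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$tag, $offset]) {
    echo "$tag at offset $offset\n"; // prints "<strong> at offset 0"
}
```

Only the first <strong> is reported; the other two are each followed by a </strong> before any further <strong>, so the lookahead fails for them.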
$html = "<a></a><b>non-empty</b>";
// A tag with no '/' in it, optional whitespace, then any closing tag
$pattern = '/<[^\/>]*>\s*<\/[^>]*>/';
echo preg_replace($pattern, '', $html);
// Prints '<b>non-empty</b>'
The duplicate-tags problem is more complex, because nested tags may or may not be of different types, and they might overlap. Maybe someone else has a regex for this?
However, if you look at the example I gave, my first regex (the one that did not work in PHP) ignored any tags that do not require a closing tag, i.e. comments, inputs, breaks, images, etc. I modified your regex to:
however, it does not seem to work. Any suggestions?
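A possible approach to excluding those tags, sketched in PHP: a negative lookahead skips the listed void elements, and a backreference (\1) forces the closing tag to carry the same name as the opening tag. The backreference alone already rejects comments (the name must start with a letter) and cases like "<br> </p>", so the lookahead is belt and braces. The helper name strip_empty_tags is hypothetical, and the sketch assumes no ">" appears inside attribute values.

```php
<?php
// Hypothetical helper: remove element pairs that are empty or contain only
// whitespace, e.g. <p>  </p>, while leaving void tags and comments alone.
function strip_empty_tags(string $html): string
{
    // (?!(?:input|br|img|meta|hr)\b)  skip the void elements listed in the thread
    // ([a-z][a-z0-9]*)\b[^>]*>        opening tag; group 1 captures its name
    // \s*<\/\1\s*>                    only whitespace, then the SAME closing tag
    $pattern = '/<(?!(?:input|br|img|meta|hr)\b)([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/i';

    // Removing <b></b> from <p><b></b></p> leaves <p></p>, which is now empty
    // as well, so repeat until a pass removes nothing ($count is set by reference).
    do {
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

echo strip_empty_tags('<p> <b>  </b> </p><i>keep</i>'), "\n"; // prints <i>keep</i>
```

The loop is a design choice: a single preg_replace pass only removes the innermost empty pair, so nested empty elements need several passes.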
Please first review my post here: [webmasterworld.com...] about the usage of .
So what you really need to do is convert this:
As for nesting and overlapping tags, this is not a problem you can solve perfectly with regexes. What you need is a quasi-parser: use a regex to split the text into tags and text, then use a recursive function or a stack to match up all the pairs. Allowing overlapping tags is not easy either, and it is one of the reasons it has been so difficult to get a browser that both complies with standards and is kind to shoddy HTML.
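The "regex to tokenise, stack to match" idea can be sketched in PHP as follows. This is a minimal illustration, not a real parser: it assumes plain tag syntax (no comments, scripts or ">" inside attribute values), and both the function name find_unbalanced_tags and the void-element list are hypothetical choices made for the example.

```php
<?php
// Hypothetical helper: report unclosed opening tags and stray closing tags.
function find_unbalanced_tags(string $html): array
{
    // Tags that never take a closing tag (illustrative list, not exhaustive).
    $void = ['area', 'base', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'param'];

    // The regex only tokenises: one match per tag, group 1 is '/' for a
    // closing tag (or ''), group 2 is the tag name.
    preg_match_all('/<(\/?)([a-z][a-z0-9]*)\b[^>]*>/i', $html, $tags,
                   PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];    // open tags still waiting for a close: [name, offset]
    $problems = []; // [description, offset]

    foreach ($tags as $tag) {
        [$text, $offset] = $tag[0];
        $isClose = $tag[1][0] === '/';
        $name = strtolower($tag[2][0]);

        if (in_array($name, $void, true) || substr($text, -2) === '/>') {
            continue; // void or self-closing: nothing to match up
        }
        if (!$isClose) {
            $stack[] = [$name, $offset];
            continue;
        }
        // Closing tag: find the nearest open tag with the same name.
        $i = count($stack) - 1;
        while ($i >= 0 && $stack[$i][0] !== $name) {
            $i--;
        }
        if ($i < 0) {
            $problems[] = ["stray </$name>", $offset];
            continue;
        }
        // Anything opened after that tag was never closed (improper nesting).
        for ($j = count($stack) - 1; $j > $i; $j--) {
            $problems[] = ["unclosed <{$stack[$j][0]}>", $stack[$j][1]];
        }
        array_splice($stack, $i); // pop the matched tag and everything above it
    }
    // Whatever is left on the stack at the end was never closed.
    foreach ($stack as [$name, $offset]) {
        $problems[] = ["unclosed <$name>", $offset];
    }
    return $problems;
}
```

For example, find_unbalanced_tags('<p><strong>a<strong>b</strong></p></em>') reports the first <strong> (offset 3) as unclosed and the </em> (offset 34) as stray. When a closing tag does not match the top of the stack, this sketch treats every tag above its match as unclosed; overlapping tags would need a different recovery rule.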