I wish to match and remove all html tags except a few used for basic text formatting:
b, i, strong, em, p, font, br, div, span, acronym
The following regex matches the opening and closing forms of the tags I listed (including the XHTML form <br />):
'</?(b|strong|i|em|p|font|div|span|acronym|br(?/)?)(?=>)[^<>]*>'
But I want to match all tags that are not those tags! I've looked at regex references and there doesn't seem to be a simple !NOT function. There's the [^...] thing, but that doesn't seem to be what I need.
E.g. something like:
'</?[^((b)|(strong)|(i)|(em)|(p)|(font)|(div)|(span)|(acronym)|(br(?/)?))(?=>)][^<>]*>'
I also thought this might work:
'</?((b|strong|i|em|p|font|div|span|acronym|br(?/)?)(?=>)){0}[^<>]*>'
...but it didn't.
If anyone can point me in the right direction, I'd be so happy. Then my brain could rest.
Thank you!
Prem
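The usual way to express "not one of these words" in a PCRE pattern is a negative lookahead, (?!...), rather than a [^...] character class, which only negates single characters. The following is a minimal sketch of that idea, not a vetted sanitizer. The function name strip_disallowed_tags is hypothetical, and the tag list is the one from the post above. Note that allowed tags keep whatever attributes they carry.

```php
<?php
// Hypothetical helper: removes every tag whose name is NOT in the allowed list.
function strip_disallowed_tags(string $html): string
{
    // Tag names to keep, taken from the list in the post above.
    $allowed = 'b|strong|i|em|p|font|br|div|span|acronym';

    // (?!...) is a negative lookahead: the match fails if an allowed name
    // (optionally preceded by "/") follows the "<".
    // \b stops "b" from also excusing "body", or "p" from excusing "pre".
    $pattern = '~<(?!/?(?:' . $allowed . ')\b)[^<>]*>~i';

    return preg_replace($pattern, '', $html);
}

echo strip_disallowed_tags('<p>Hello <b>world</b><script>alert(1)</script> <body>x</body><br /></p>'), "\n";
```

The optional slash sits inside the lookahead on purpose. Written as '</?(?!...)', the engine can backtrack, give up the slash, and then match a closing tag such as </b> anyway.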
I think I did actually see that function, but overlooked the "allowable tags" option, and so didn't use it.
Any comments on whether the tags I have selected to allow are the best and only ones to permit formatted text from a user, but no nasties?
Prem.
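Assuming the function referred to above is PHP's strip_tags(), the "allowable tags" option is its second parameter: a string of opening tags to keep. A small sketch with the tag list from the first post (the sample input is made up for illustration):

```php
<?php
// Tags to keep, written as opening tags; <br> also covers the XHTML form <br />.
$allowed = '<b><i><strong><em><p><font><br><div><span><acronym>';

// Illustrative input: one allowed tag carries an event attribute.
$input = '<p onclick="evil()">Hi <b>there</b><script>alert(1)</script><br /></p>';

$clean = strip_tags($input, $allowed);

echo $clean, "\n";
```

strip_tags() removes the disallowed tags but keeps the text between them, and it does not touch attributes on the tags it lets through, so the onclick in this input survives.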
I guess I could strip out onclick and other events with preg_replace, but are there other security implications with the method I plan to use?
Why is it that WebMasterWorld.com and other sites completely disable HTML tags in users' posts, in favour of proprietary font formatting tags, rather than the method I'm taking to allow simple, harmless tags?
Is it just so that text in posts will appear as text (such as I intended in the first paragraph of this reply) and not be processed as HTML?
Thanks for any thoughts on this,
Prem.
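Stripping event attributes with preg_replace, as suggested above, might look something like the sketch below. The function name strip_event_attributes is hypothetical. It only removes on* attributes inside tags; it assumes attribute values contain no ">" character, and it does nothing about other vectors such as style attributes or javascript: URLs, so treat it as an illustration rather than a complete filter.

```php
<?php
// Hypothetical helper: removes on* attributes (onclick, onmouseover, ...) from tags.
function strip_event_attributes(string $html): string
{
    // Whitespace, then on<name>=, then a double-quoted, single-quoted or bare value.
    $attr = '~\s+on[a-z]+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)~i';

    // Apply the attribute pattern only inside opening tags, so ordinary
    // text such as "on sale = now" is left alone.
    return preg_replace_callback(
        '~<[a-z][^<>]*>~i',
        function (array $m) use ($attr): string {
            return preg_replace($attr, '', $m[0]);
        },
        $html
    );
}

echo strip_event_attributes('<p onclick="evil()" class="x">on sale = now</p>'), "\n";
```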
Then strip out all the HTML.
Then, as the last step, change all the STARTBOLD strings, etc. back into the correct HTML tags.
e.g. replace("STARTBOLD", "<b>")
/Webdevjim
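The placeholder approach described above could be sketched in PHP roughly as follows. The first step is not shown in the post, so it is assumed here to be the reverse of the last one (allowed tags swapped for plain-text tokens). The function name keep_basic_formatting and every token other than STARTBOLD are made up for illustration.

```php
<?php
// Hypothetical helper sketching the three steps; only two tag pairs are shown.
function keep_basic_formatting(string $html): string
{
    $tokens = [
        '<b>'  => 'STARTBOLD',
        '</b>' => 'ENDBOLD',
        '<i>'  => 'STARTITALIC',
        '</i>' => 'ENDITALIC',
    ];

    // Step 1 (assumed): swap the allowed tags for plain-text tokens.
    $text = str_ireplace(array_keys($tokens), array_values($tokens), $html);

    // Step 2: strip out all the remaining HTML.
    $text = strip_tags($text);

    // Step 3: change the tokens back into the correct HTML tags.
    return str_replace(array_values($tokens), array_keys($tokens), $text);
}

echo keep_basic_formatting('<b>bold</b> <script>bad()</script> <a href="#">link</a>'), "\n";
```

Because step 1 matches exact strings only, a tag such as <b onclick="..."> is not protected and gets stripped in step 2, although its closing </b> would still be restored. A user who types STARTBOLD literally would also get a <b> tag, so less guessable tokens may be preferable.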