how to negate strings in regular expressions

like to match (anything) but not (meat) or (fish)

         

the_nerd

2:07 pm on Mar 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

if this is trivial, I plead guilty, but I have searched for an hour now and didn't find anything.

The caret ^ can be used to negate single characters if used inside square brackets, but outside them it seems to stand for "start of line".

I'd like to match all HTML tags with the exception of em, div class="cite", li, and a couple more.

This one finds all applet, a, and table tags, opening and closing. Now, there must be some way to say "I don't want anything that's inside the round brackets"?

<[/]?(applet|a|table)[^>]*>

Thanks for your help,

nerd

mcibor

2:57 pm on Mar 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's hard...
Here's a starting point for how to do it:

[wiki.tcl.tk...]
but I think it would be better to find these occurrences first, convert them to something else (e.g. to [a], [applet]), run the regex, and then convert them back to their previous form.
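A minimal sketch of that convert, strip, convert-back idea in PHP could look like this. The function name strip_all_but() and the list of tags to keep are made up for illustration, and it assumes the text does not already contain literal square-bracket sequences such as [em].

```php
<?php
// Sketch of the "convert, strip, convert back" idea.
// strip_all_but() and the $keep list are hypothetical names for illustration.
function strip_all_but(string $html, array $keep): string
{
    // Build an alternation such as "em|li" from the tag names to keep.
    $names = implode('|', array_map('preg_quote', $keep));

    // 1. Turn the tags to keep into placeholders: <em> becomes [em], </li> becomes [/li].
    $html = preg_replace('~<(/?(?:' . $names . ')\b[^>]*)>~i', '[$1]', $html);

    // 2. Remove every tag that is still written with angle brackets.
    $html = preg_replace('~<[^>]*>~', '', $html);

    // 3. Turn the placeholders back into real tags.
    return preg_replace('~\[(/?(?:' . $names . ')\b[^\]]*)\]~i', '<$1>', $html);
}

echo strip_all_but('<div><em>hi</em> <b>x</b></div>', ['em', 'li']);
// prints: <em>hi</em> x
```

The \b after the tag names keeps "em" from also protecting longer names such as "embed". If users can type [em] themselves, a less likely placeholder would be safer.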

Hope this helps
Michal

joelgreen

3:32 pm on Mar 13, 2007 (gmt 0)

10+ Year Member



Yes, that is hard. You could find all tags instead and then loop through them and unset() all the unneeded ones.
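Roughly, and only as a sketch: collect every tag with preg_match_all(), then unset() the entries whose name is on the exclusion list. The helper name tags_except() and the example exclusion list are assumptions, not something from this thread.

```php
<?php
// Sketch: find all tags, then drop the ones we do not want to touch.
// tags_except() is a hypothetical helper name.
function tags_except(string $html, array $skip): array
{
    // Each match holds the full tag in [0] and the tag name in [1].
    preg_match_all('~</?([a-z][a-z0-9]*)\b[^>]*>~i', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $i => $m) {
        if (in_array(strtolower($m[1]), $skip, true)) {
            unset($matches[$i]); // this tag is on the exclusion list
        }
    }

    // Return just the full tag strings, reindexed from 0.
    return array_column($matches, 0);
}

print_r(tags_except('<p>One <em>two</em> <a href="#">three</a></p>', ['em', 'li']));
```

The returned list can then be passed to str_replace() to remove exactly those tags from the text.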

the_nerd

9:35 am on Mar 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mcibor & joelgreen

Thanks for your input. I got myself a list of all HTML tags and zap everything but the 4 or 5 that I'd like to keep. As long as the posts are rather small, I hope it won't bring down the box ...

But I'm still interested in the reason for this missing feature (negating strings): is it just too "expensive" to have it, or is it something that would be more than a "finite state automaton" can handle? (just curious)

nerd.

mcibor

10:16 am on Mar 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To tell the truth, I don't know why regex doesn't allow this, but it may have something to do with the fact that it operates on bytes, not whole strings...
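A side note on the question: PCRE, the engine behind PHP's preg_* functions, does support negative lookahead, written (?!...), which covers this case. In theory it is also within reach of a finite automaton, since regular languages are closed under complement. The sketch below assumes the tags to keep are em and li; strip_except() is a hypothetical name, and the div class="cite" case is not handled, because a closing </div> carries no class to test.

```php
<?php
// Sketch only: strip_except() is a hypothetical helper name.
function strip_except(string $html, array $keep): string
{
    // Build an alternation such as "em|li" from the tag names to keep.
    $names = implode('|', array_map('preg_quote', $keep));

    // (?!(?:em|li)\b) is a negative lookahead: the match continues only if
    // the tag name is NOT one of the kept names. The \b stops "em" from
    // also protecting longer names such as "embed".
    $pattern = '~</?(?!(?:' . $names . ')\b)[a-z][a-z0-9]*\b[^>]*>~i';

    return preg_replace($pattern, '', $html);
}

echo strip_except('<div><em>a</em><embed><li>b</li></div>', ['em', 'li']);
// prints: <em>a</em><li>b</li>
```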

eelixduppy

10:57 am on Mar 21, 2007 (gmt 0)



If you cannot figure out how to do this by other means, there is always the strip_tags [php.net] function, which is pretty good at removing tags.
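As a small sketch: strip_tags() takes an optional second argument listing the tags to leave in place, so "everything except em and li" can be written directly. The wrapper name keep_some_tags() and the choice of em and li are just examples.

```php
<?php
// Sketch: keep_some_tags() is a hypothetical wrapper around strip_tags().
function keep_some_tags(string $html): string
{
    // The second argument lists the tags that should survive.
    return strip_tags($html, '<em><li>');
}

echo keep_some_tags('<div><em>a</em><b>b</b><li>c</li></div>');
// prints: <em>a</em>b<li>c</li>
```

Note that strip_tags() leaves the attributes of allowed tags untouched, so it cannot tell div class="cite" apart from any other div.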

Also remember, preg_replace can take an array of patterns and an array of replacements. So maybe implementing some sort of BBcode would provide a neater solution. That way you can have a simple regex to remove all tags, and then just replace the BBcode tags with their respective replacements. Just an idea :)
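Something along these lines, as a rough sketch. The function name bbcode_to_html() and the two BBcode tags are assumptions; a real BBcode set would need more patterns.

```php
<?php
// Sketch: bbcode_to_html() is a hypothetical name; [em] and [li] are example codes.
function bbcode_to_html(string $text): string
{
    // A simple regex that removes every real HTML tag.
    $text = preg_replace('~<[^>]*>~', '', $text);

    // Parallel arrays: the pattern at index n uses the replacement at index n.
    $patterns     = ['~\[(/?)em\]~i', '~\[(/?)li\]~i'];
    $replacements = ['<${1}em>', '<${1}li>'];

    return preg_replace($patterns, $replacements, $text);
}

echo bbcode_to_html('<b>Hi</b> [em]there[/em]');
// prints: Hi <em>there</em>
```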

Good luck!