I'm currently writing a content management system and I'm trying to make my functions as tight as possible.
One of the parts of the CMS is the ability for the user to write articles/journal entries. Now, I have written all the functions but can't get my head around this:
I want to accept text from a form and replace all quotation marks (") in the text with the correct entity encoding. However, I don't want to replace the quotation marks within anchor tags or image tags, just the text itself.
For example:
$text = preg_replace('!\"(.+)\"!U', '&quot;$1&quot;', $text);
This Perl-compatible regex I'm using simply replaces all quotation marks with the necessary entity and then uses backreferences to "put the text" back in between the entities. However, it matches all the quotation marks in the anchor and image tags as well.
I could go on forever describing this, but I'm sure someone has the gist by now. If you could help me and save me some time, that'd be great.
I'm sure someone out there must dedicate all their time to regex :P
I apologise in advance if this has been covered before.
A lookahead assertion is what you need.
$output = preg_replace('/(\")(?=[^>]*(<|$))/', '&quot;', $text);
In English: Translate a quote to '&quot;' if you find a "<" following it before you find a ">", or if you find the end of the string before you find ">".
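To see the lookahead at work, here is a minimal sketch on a made-up input. The sample string in $text is purely hypothetical, and the pattern is the same one as above with the redundant capture group around the quote dropped.

```php
<?php
// Hypothetical sample input: quotes appear in the prose and inside a tag.
$text = 'He said "hi" and linked <a href="page.html">a "quoted" page</a>.';

// Replace a quote only if the text after it reaches "<" (or the end of
// the string) without first passing a ">", i.e. the quote is outside a tag.
$output = preg_replace('/"(?=[^>]*(<|$))/', '&quot;', $text);

echo $output, "\n";
// prints: He said &quot;hi&quot; and linked <a href="page.html">a &quot;quoted&quot; page</a>.
```

The quotes around the href value are left alone because the next angle bracket after each of them is a ">", so the lookahead fails there.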
This pattern isn't exactly bulletproof -- it will choke on valid (but weird) HTML that has quoted angle brackets inside HTML tags, such as a "<" inside an attribute value. (But who does that?)
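For the curious, a small sketch of that edge case. The $weird string is an invented example, not something from a real page; it puts a "<" inside a title attribute so the lookahead is fooled into treating the first attribute quote as ordinary text.

```php
<?php
// Same lookahead pattern as in the reply above.
$pattern = '/"(?=[^>]*(<|$))/';

// Hypothetical edge case: a "<" inside a quoted attribute value.
$weird = '<a title="a < b" href="x.html">link</a>';

$result = preg_replace($pattern, '&quot;', $weird);

echo $result, "\n";
// prints: <a title=&quot;a < b" href="x.html">link</a>
```

Only the opening quote of the title attribute is hit, because it is the one quote with a "<" ahead of it before any ">". If your input can contain markup like this, a regex alone is probably not enough and the tags would need to be parsed properly.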