E.g., it should find and replace John Doe in:
"John Doe was born on..."
but not find/replace when John Doe is inside any tag, for example:
<img src="/jd.jpg" alt="John Doe at Webmasterworld" />
Thanks for any suggestions.
I'd start by extracting the content between the <body> tags, then replace every HTML tag with a sequentially numbered marker, storing the tags along the way so they can be restored later by swapping the markers back. (You might have to special-case <style></style> and <script></script> sections to avoid altering them.) That makes global changes to the document content via regexp simple(r), because the regexp can no longer match anything inside a tag. Once that's done, you restore the tags you removed in the first pass and re-insert the result into the <body></body>.
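Here is a rough sketch of that marker approach in Python, in case it helps. The thread doesn't say which language is in use, so the language is my assumption, and the function name `replace_outside_tags`, the NUL-delimited marker format and the tag regexp are all made up for illustration. It assumes the page contains no NUL characters and that no attribute value contains a literal ">".

```python
import re

# One alternative per thing we want to hide from the search:
# whole <script>/<style> blocks, HTML comments, then any single tag.
TAG_RE = re.compile(
    r"<(script|style)\b.*?</\1\s*>|<!--.*?-->|<[^>]+>",
    re.IGNORECASE | re.DOTALL,
)
BODY_RE = re.compile(
    r"(<body\b[^>]*>)(.*)(</body\s*>)", re.IGNORECASE | re.DOTALL
)
# Hypothetical marker format: NUL + index + NUL (assumes no NULs in the page).
MARKER_RE = re.compile(r"\x00(\d+)\x00")


def replace_outside_tags(html, pattern, replacement):
    """Apply re.sub(pattern, replacement) to body text only, never to tags."""
    body = BODY_RE.search(html)
    # Fall back to the whole string if there is no <body> element.
    start, end = body.span(2) if body else (0, len(html))
    content = html[start:end]

    stored = []  # removed tags, in order of appearance

    def stash(match):
        stored.append(match.group(0))
        return "\x00%d\x00" % (len(stored) - 1)

    masked = TAG_RE.sub(stash, content)                # pass 1: tags -> markers
    changed = re.sub(pattern, replacement, masked)     # the actual find/replace
    restored = MARKER_RE.sub(                          # pass 2: markers -> tags
        lambda m: stored[int(m.group(1))], changed
    )
    return html[:start] + restored + html[end:]


page = (
    "<html><head><title>John Doe</title></head><body>"
    "<p>John Doe was born on...</p>"
    '<img src="/jd.jpg" alt="John Doe at Webmasterworld" />'
    "</body></html>"
)
result = replace_outside_tags(page, r"John Doe", "Jane Roe")
```

With the sample page, the paragraph text changes while the alt attribute and the <title> (which sits outside the <body>) are left alone. Two caveats: a search pattern that can match digits or NUL would corrupt the markers, and a name split across tags (John <b>Doe</b>) won't be found because a marker sits in the middle of it. For anything beyond a quick job, a real HTML parser is probably the safer route.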