Forum Moderators: coopster

Regex needed to match post-HTML output text

Strings both wrapped in tags and not wrapped in tags

         

hRook

6:33 pm on Nov 1, 2008 (gmt 0)



Hello everyone. This is my first post here at WebmasterWorld. This site has been a big resource to me, and I've finally reached a point where I need to ask my own brain-busting question.

I'm in need of a regex pattern. What I need it to do is match post-HTML output text (the text left over once the markup is set aside) so I can insert spaces into long words. This means I want to match individual words, whether or not they are wrapped in tags. Let me give a tangible example. Say you have this string:

Hello there <a href="http://website/" title="website">gentlemen</a> how are <b>you today?</b>

I would like the expression to match "Hello", "there", "gentlemen", "how", "are", "you" and "today?". All the ones I've tried have also matched tag attributes and so on (like 'href="http://website/"'). I'm wondering if it's as easy as somehow negating an accurate HTML tag-matching expression.

Thanks if you can help!

[edited by: hRook at 6:39 pm (utc) on Nov. 1, 2008]

IanKelley

12:43 am on Nov 2, 2008 (gmt 0)



It would definitely be possible to write a regex that matches words while ignoring tags, but it would be easier (and probably more efficient) to just strip the tags out (see strip_tags()) and then explode() the result on spaces.
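
For illustration, here is a minimal sketch of the strip-then-split idea, using the sample string from the question. The function name visible_words() is made up for this example, and preg_split() on runs of whitespace stands in for a plain explode(' ', ...) so that double spaces and line breaks don't produce empty entries; treat it as one possible approach, not a tested solution for every kind of markup.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the words
// left in a string once its HTML tags have been stripped.
function visible_words(string $html): array
{
    // strip_tags() drops the tags and their attributes, keeping the text between them.
    $text = strip_tags($html);

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces
    // that leading, trailing, or repeated whitespace would otherwise create.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure; fall back to an empty list.
    return $words ?: [];
}

$html = 'Hello there <a href="http://website/" title="website">gentlemen</a> how are <b>you today?</b>';

// Prints the seven words from the question, one per array element.
print_r(visible_words($html));
```

Note that this only gives you the words themselves. The markup is discarded along the way, so if the spaced-out words have to go back into the original HTML, this alone won't be enough.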