lucy24 - 9:42 am on Oct 5, 2012 (gmt 0)
it still needs to skip HTML tags
Skip, or stop? In the original post, it sounded as if you wanted the function to simply stop short when it meets an HTML tag. Are you looking for something to ignore the tags instead? For example, if one word of the series is <i>italicized</i>, should it carry on as if the markup weren't there?
HTML tag package: </?[^>]+>
It's easier to match if you have nice clean HTML, with tags opening right before words and closing right after them. (Also safer, for arcane compliant-user-agent reasons.)
word package:
(?:<[^>]+>)*([\w'-]+)(?:</[^>]+>)*,?\s+
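A rough sketch of the word package in action, assuming JavaScript (the language isn't stated in the thread) and a made-up sample string. In a regex literal the slash in the closing-tag part has to be written \/ so it doesn't end the literal. Note that the pattern wants whitespace after every word, so the last word of a string only matches if something follows it.

```javascript
// Word package from above, written as a regex literal with the global flag.
// The "/" in the closing-tag part is escaped as "\/" because this is a literal.
const wordPackage = /(?:<[^>]+>)*([\w'-]+)(?:<\/[^>]+>)*,?\s+/g;

// Hypothetical sample text. The trailing space matters: the pattern
// requires whitespace after each word, including the last one.
const sample = "the <i>quick</i> brown fox, jumps ";

const words = [];
let m;
while ((m = wordPackage.exec(sample)) !== null) {
  words.push(m[1]); // capture group 1 is the bare word, tags left out
}

console.log(words); // [ 'the', 'quick', 'brown', 'fox', 'jumps' ]
```

The tags are matched but not captured, so the italicized word comes out the same as its plain neighbours.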
Do you need to capture the series of words, or simply find them? Your example above looks as if you're looking for up to two words before and up to three words after, but not necessarily capturing anything. (Technically yes, but they look more like grouping parentheses.)
Both html tags and bbcode? Urk. HTML is easy because < and > don't have meaning in RegEx. Well, hardly ever. But bbcode is nasty, because [ and ] do have meaning and both need escaping.
(?:\[[^\]]+\])*
And then if you're using a constructor-type function you have to double all your backslashes.
(?:\[[^\]]+\]|<[^>]+>)*([\w'-]+)(?:\[[^\]]+\]|<[^>]+>)*,?\s+
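To illustrate the backslash doubling, here is the combined pattern both ways, again assuming JavaScript and its RegExp constructor. The sample text is invented for the purpose. Treat it as a sketch, not a drop-in function.

```javascript
// Combined html/bbcode word package as a regex literal: single backslashes.
const literal = /(?:\[[^\]]+\]|<[^>]+>)*([\w'-]+)(?:\[[^\]]+\]|<[^>]+>)*,?\s+/g;

// The same pattern passed to the constructor as a string. Every backslash
// is doubled, because the string literal eats one level of escaping
// before the regex engine ever sees the pattern.
const built = new RegExp(
  "(?:\\[[^\\]]+\\]|<[^>]+>)*([\\w'-]+)(?:\\[[^\\]]+\\]|<[^>]+>)*,?\\s+",
  "g"
);

// Both forms should compile to the same pattern source.
const samePattern = literal.source === built.source;

// Hypothetical sample mixing bbcode and html, with a trailing space so
// the last word is followed by the whitespace the pattern requires.
const sample = "[b]bold[/b] and <i>italic</i> words ";
const found = [...sample.matchAll(built)].map((m) => m[1]);

console.log(samePattern);
console.log(found);
```

If you forget the doubling, "\w" in a string is just "w" and "\s" is just "s", and the pattern quietly matches the wrong thing instead of throwing an error.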
\w is good, because it covers letters, numerals and the lowline in one go. No harm in including numerals, and your text probably doesn't have many lowlines _ in it.
I suppose it's no use asking why a close-parenthesis by itself turns into a wink if it's between a > and a *