Find x words before and after keyword


lucy24 - 8:36 pm on Oct 5, 2012 (gmt 0)


This version

(?:\[[^\]]+\]|<[^>]+>)*([\w'-]+)(?:\[[^\]]+\]|<[^>]+>)*,?\s+

is a single word, with extra doodads to skip over HTML or PHP/BB tags. The word itself is the ([\w'-]+) in the middle. Put in lots of 'em -- one in place of each \w+ or [\w'-]+ -- to make the complete package. If you leave out the tags option and fiddle with the original regex you're back at:

\b((?:[\w-]+(?:'[\w]+)?,?\s){0,2})KEYWORDHERE((?:\s[\w-]+(?:'[\w]+)?,?){0,8})

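But I think you want at least {1,2} and {1,8} in place of {0,2} and {0,8}. Otherwise you could come back with a bare "php" and no context at all. Many RegEx dialects will accept a simple {,2}, but you'll need to double-check whether the implied first number is 0 or 1. PCRE, which is what PHP's preg functions use, has traditionally treated {,2} as literal text rather than a quantifier, so spelling out both numbers is the safe choice.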
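A quick way to check the difference between a minimum of 0 and a minimum of 1, sketched in PHP. The function find_context() and its $min parameter are hypothetical names, not anything from the thread, and the keyword is run through preg_quote() on the assumption that it is plain text rather than a pattern.

```php
<?php
// Hypothetical helper (not from the thread): wraps the pattern above and
// lets the caller choose the minimum word count on each side.
function find_context(string $text, string $keyword, int $min = 1): ?array
{
    $word    = '[\w-]+(?:\'[\w]+)?,?';   // one word, optional contraction and comma
    $pattern = '/\b((?:' . $word . '\s){' . $min . ',2})'
             . preg_quote($keyword, '/')
             . '((?:\s' . $word . '){' . $min . ',8})/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;                     // no match (or a regex error)
    }
    return ['before' => trim($m[1]), 'after' => trim($m[2])];
}

var_dump(find_context('php', 'php', 0));  // matches, but both sides are empty
var_dump(find_context('php', 'php', 1));  // NULL: a bare keyword no longer matches
print_r(find_context("I don't think php is hard, really", 'php'));
// before: "don't think"   after: "is hard, really"
```

With $min set to 0 the bare keyword comes back with two empty strings, which is the case the post warns about.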

If you don't (haha) want to allow for contractions, simply leave out each
(?:'[\w]+)?
element. Similarly you can leave off all occurrences of
,?
if you're sure you don't want to continue across commas.
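If you'd rather toggle those two pieces than edit the pattern by hand, something like this would do it. This is only a sketch: word_fragment() and its two flags are made-up names for illustration, and it builds just the per-word part of the regex.

```php
<?php
// Hypothetical builder: assembles the per-word fragment with the
// optional pieces switched on or off.
function word_fragment(bool $contractions = true, bool $commas = true): string
{
    $fragment = '[\w-]+';
    if ($contractions) {
        $fragment .= '(?:\'[\w]+)?';   // allow don't, it's, ...
    }
    if ($commas) {
        $fragment .= ',?';             // allow a trailing comma
    }
    return $fragment;
}

echo word_fragment(), "\n";              // [\w-]+(?:'[\w]+)?,?
echo word_fragment(false, false), "\n";  // [\w-]+
```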

It's your call on whether you want to allow words to include - for hyphenated words. That's assuming you don't have-- boo! hiss! --em dashes expressed as -- instead of &mdash; or the actual UTF-8 character. (I use &mdash; because I edit in a monospaced font. Also &nbsp;.) If you wanted to be double-safe you could say

(\w+(?:-\w+)*(?:'\w+)?,?\s)

for each word.
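Here is a small PHP comparison of the two ways of handling hyphens. It assumes the leading run is written \w+ so that multi-letter words match, and the sample tokens are invented for the test.

```php
<?php
$loose  = '/^[\w-]+$/';          // any mix of word characters and hyphens
$strict = '/^\w+(?:-\w+)*$/';    // hyphens only between runs of word characters

foreach (['well-known', 'mother-in-law', 'have--boo', '-dash'] as $token) {
    printf("%-14s loose=%d strict=%d\n", $token,
        preg_match($loose, $token), preg_match($strict, $token));
}
// well-known and mother-in-law pass both patterns;
// have--boo and -dash pass only the loose one.
```

The strict form is what keeps a typed -- from being swallowed as part of a word.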

You may also want to change \s to \s+ both to allow for multi-spaces-- since they don't affect the HTML-- and to cover yourself in case of line breaks. Since the Windows line break is two separate characters, \r and \n, some RegEx readers will interpret it as two whitespace characters, though most will pretend it's merely \n. (Mine is bilingual so \r\n is taken as two characters-- and the $ anchor doesn't work in CRLF mode.) That makes the whole package

\b((?:\w+(?:-\w+)*(?:'\w+)?,?\s+){1,2})KEYWORDHERE((?:\s+\w+(?:-\w+)*(?:'\w+)?,?){1,8})
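For what it's worth, this is roughly how the whole package might be dropped into PHP. Treat it as a sketch under a few assumptions: keyword_context() is a hypothetical name, the keyword is plain text (so it goes through preg_quote()), each word is \w+ with an optional hyphenated part, contraction and comma, and words are separated by \s+ only. The sample text has a double space and two Windows line breaks in it to exercise the \s+ change.

```php
<?php
// Hypothetical wrapper, not from the thread.
function keyword_context(string $text, string $keyword): ?array
{
    $word    = '\w+(?:-\w+)*(?:\'\w+)?,?';
    $pattern = '/\b((?:' . $word . '\s+){1,2})'
             . preg_quote($keyword, '/')
             . '((?:\s+' . $word . '){1,8})/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    // Collapse runs of whitespace (including \r\n) to single spaces for display.
    return [
        'before' => trim(preg_replace('/\s+/', ' ', $m[1])),
        'after'  => trim(preg_replace('/\s+/', ' ', $m[2])),
    ];
}

$text = "the  quick\r\nbrown php is a server-side\r\nscripting language, isn't it";
print_r(keyword_context($text, 'php'));
// before: "quick brown"
// after:  "is a server-side scripting language, isn't it"
```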


Thread source: http://www.webmasterworld.com/php/4503028.htm