Preliminary disclaimer: I don't speak PHP. But I come from a background of making e-books, so this problem sounds wonderfully familiar.
So if I would like to find 5 words before the keyword and 3 words after the keyword the result i would like to have is: "You are able to code".
Did your cat eat the following three words?
Can we assume for the sake of discussion that your target text will never contain titles or abbreviations such as "Mr." or "Ph.D." -- or, in the alternative, that you normally write these without a period? For that matter, would your mid-sentence words ever be capitalized at all? It's easier if you can exclude all names. Capital letters would then only occur at the beginning of your utterance, unless you anticipate meeting the single word "I". (Doesn't seem likely in this context.)
or a punctuation mark is found (!.?’”)
Any punctuation mark, or these specific ones? You can't exclude a right single quote, because that will also cut out contractions: "You'll be flying high when you start using PHP to code your pages!" It is the same HTML character whether you write it as a named entity, as a numeric reference, or as a literal ’. And it seems like you should allow at least commas; they're not a major syntactic break. Finally, what about numerals? In your own post you use "5" and "3" as words.
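To make the contraction problem concrete, here is a small Python sketch (Python only because it is what I can test; the names BREAKS_WITH_QUOTE, BREAKS_WITHOUT_QUOTE and segments are mine, not anything from your code). It assumes the text uses a curly apostrophe and simply compares splitting with and without ’ in the break set.

```python
import re

# The break set from the question, including the right single quote (U+2019)
# and the right double quote (U+201D).
BREAKS_WITH_QUOTE = re.compile("[!.?\u2019\u201d]")
# The same set with the right single quote left out.
BREAKS_WITHOUT_QUOTE = re.compile("[!.?\u201d]")

def segments(text, breaks):
    """Split text at every break character and drop empty pieces."""
    return [part.strip() for part in breaks.split(text) if part.strip()]

sentence = "You\u2019ll be flying high when you start using PHP to code your pages!"

print(segments(sentence, BREAKS_WITH_QUOTE))
# ['You', 'll be flying high when you start using PHP to code your pages']
print(segments(sentence, BREAKS_WITHOUT_QUOTE))
# ['You’ll be flying high when you start using PHP to code your pages']
```

With ’ in the break set the contraction is cut in two, which is why I would leave it out unless you can be sure it only ever appears as a closing quote.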
How will your quotation marks be encoded: as &quot; or &#34; (&ldquo; &rdquo; etc) or as literal “ and ” pairs? What encoding are you in? Are you working from the page source or the visible text?
Oi! What's my system language doing at the end of your string? ;)
One word looks like this: \b[A-Z]?[a-z-]+(?:'[a-z]+)?\b
but multiple words will no longer have or need a following \b. Note too that \b is superfluous if you've followed the string with [^a-zA-Z'-]+. Unless you've got mixed-form words with numerals or lowlines in the middle; those can get messy. Instead you'll go to (keeping the sentence-initial option)
Here I've used literal spaces. (See above about not speaking PHP.) Note the exact position of the spaces. The first quantity is 4 rather than 5 because the beginning word is coded differently. If I now decide that my keyword is "the" and ask the text editor to find any and all hits, it supplies me with (capitalization added):
would then only occur at THE beginning of your
my system language doing at THE end of your
Or, if the keyword is "your":
the sake of discussion that YOUR target text will
occur at the beginning of YOUR utterance, unless you
(illustrating the optional comma, which I put in my regex)
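For anyone who wants to try the idea outside a text editor, here is a rough Python sketch of it. It is my own reconstruction under the assumptions discussed above (plain ASCII, lowercase words apart from an optional sentence-initial capital, literal spaces, an optional comma between words), and the names WORD, context_pattern and find_contexts are made up for the example. It asks for exactly the given number of words on each side, so a keyword too close to either end of the text produces no hit.

```python
import re

# One lowercase word, optionally with an apostrophe part (as in "you'll").
WORD = r"[a-z-]+(?:'[a-z]+)?"

def context_pattern(keyword, before=5, after=3):
    """Regex for `before` words, the keyword, then `after` words.

    `before` must be at least 1, because the first word is coded
    separately to allow a sentence-initial capital.
    """
    first = r"\b[A-Z]?" + WORD + r",? "
    rest = r"(?:" + WORD + r",? ){" + str(before - 1) + r"}"
    tail = r"(?:,? " + WORD + r"){" + str(after) + r"}\b"
    return re.compile(first + rest + re.escape(keyword) + tail)

def find_contexts(text, keyword, before=5, after=3):
    """Return every non-overlapping hit as a plain string."""
    pattern = context_pattern(keyword, before, after)
    return [m.group(0) for m in pattern.finditer(text)]

sample = ("Capital letters would then only occur at the beginning "
          "of your utterance, unless you anticipate meeting the single word.")

print(find_contexts(sample, "your"))
# ['occur at the beginning of your utterance, unless you']
print(find_contexts(sample, "the"))
# ['would then only occur at the beginning of your']
```

The second "the" in the sample is not reported because only two words follow it. If you want "up to N words" rather than "exactly N", the fixed counts would have to become ranges such as {0,4}, which I have not tried here.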
All this is of course assuming strictly ASCII text, as you'll get if your source is in modern English. Otherwise you'd have to replace [a-z] with the appropriate variant of \w -- but this gets language-specific. Both programming language and human language.
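As one illustration of what a "variant of \w" can look like, here is a hedged Python 3 sketch. Python's \w is Unicode-aware for ordinary strings but also matches digits and the lowline, so the class [^\W\d_] is a common way to say "any letter". The name unicode_words is mine, and whether PHP behaves the same way depends on its regex flavor and flags, which I can't vouch for.

```python
import re

# Any run of Unicode letters, optionally joined by a straight apostrophe,
# a curly apostrophe (U+2019) or a hyphen, as in "L'été" or "well-known".
UNICODE_WORD = re.compile(r"[^\W\d_]+(?:['\u2019-][^\W\d_]+)*")

def unicode_words(text):
    """Return the letter-only words found in text, in order."""
    return UNICODE_WORD.findall(text)

print(unicode_words("L'été arrive bientôt, señor"))
```

Digits and lowlines are deliberately excluded, so a mixed form such as "file_2" comes back as just "file".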