How to limit text to 3 sentences only?

         

tpb101

3:13 pm on Dec 8, 2014 (gmt 0)

10+ Year Member



I will have a $content variable holding an unlimited amount of text (it can be any number of sentences and paragraphs).

Every sentence in English ends with either a dot, an ellipsis, a question mark, or an exclamation mark, so these could be:

$dot = ".";
$ellypses = "...";
$question = "?";
$exclamation = "!";

What I am trying to achieve is to limit $content to the first three sentences only and display the result on a blog. This means that any combination of these terminators (any instance) would need to be recognized and counted up to three, and the text up to that point then stored in another variable, which could be called $shortened_text (it does not have to be printed on a page; I will be using this with the CyberSeo plugin for WordPress).

Would you be able to suggest how to write code for something like this (or, if that is not possible, which PHP functions I should use)?

Thanks.

aakk9999

3:22 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think you want to search for a space after these delimiters, especially after the full stop, e.g.:

$dot = ". "; 


Could you have multiple exclamation marks that end the sentence? Something like: "Hey, watch out!!!"

Or even multiple question marks???

Or would you not care and take the first one only? (In that case, do not include the space in the search string after these two.)
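
For illustration, a minimal PHP sketch of this first idea: treat any run of ".", "!" or "?" that is followed by whitespace as the end of a sentence, and keep the first three pieces. The function name first_sentences_naive is made up for this example, and the rule is deliberately naive.

```php
<?php
// Naive sketch: a sentence ends at '.', '!' or '?' followed by whitespace.
// Runs such as "!!!", "???" or "..." count once, because the split only
// happens at the whitespace that follows the last mark of the run.
function first_sentences_naive(string $content, int $limit = 3): string
{
    // Split on whitespace that comes directly after a terminator.
    $parts = preg_split('/(?<=[.!?])\s+/', trim($content), -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the first $limit pieces and glue them back together.
    return implode(' ', array_slice($parts, 0, $limit));
}

$content = 'Hey, watch out!!! Really??? Yes. No. Maybe.';
$shortened_text = first_sentences_naive($content);
echo $shortened_text, "\n"; // Hey, watch out!!! Really??? Yes.
```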

tpb101

3:29 pm on Dec 8, 2014 (gmt 0)

10+ Year Member



Yes, that's right. I would need to include things like "!", and "?" too (I am not sure why but it shortens three exclamation marks into one, and two question marks into one). Anything that would make sense / would be needed for this...

Thanks!

LifeinAsia

3:36 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Another problem is false positives like "Mr. Smith" and "Elm St." when you search for:
$dot = ". ";
You also might have sentences that end inside a quote or inside parentheses. Example:
He said, "This is the end." (At least, that's what I thought he said.)

In these cases, searching for ". " wouldn't work.

Hope you didn't think this would be easy. :)
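
To make both problems concrete, here is a small sketch that applies a naive rule (split on whitespace that follows ".", "!" or "?") to the examples above. The variable names are only for illustration.

```php
<?php
// Naive rule: a sentence ends where '.', '!' or '?' is followed by whitespace.
$rule = '/(?<=[.!?])\s+/';

// False positives: "Mr." and "St." look like sentence ends.
$abbrev = preg_split($rule, 'Mr. Smith lives on Elm St. in town. He is out.');
// 4 pieces instead of 2: "Mr.", "Smith lives on Elm St.", "in town.", "He is out."

// Misses: the period is followed by a quote or a parenthesis, not by a space.
$quoted = preg_split($rule, 'He said, "This is the end." (At least, that\'s what I thought he said.)');
// 1 piece instead of 2.

printf("%d pieces, %d piece\n", count($abbrev), count($quoted));
```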

aakk9999

3:48 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Forgot about quotes! Good catch!

Why not just count a number of chars and cut off at the full word after x number of chars, adding three dots?

The previews would be more even, since sentences can be very long or very short.
Rules are easier too :)
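
A minimal sketch of the character-count idea, assuming plain text without HTML. The function name truncate_at_word and the default of 300 are made up for this example. Note that strlen(), strpos() and substr() count bytes; for multibyte text the mb_* equivalents would be the safer choice.

```php
<?php
// Cut at the first space at or after $limit characters, so the last word
// stays whole, then add three dots.
function truncate_at_word(string $content, int $limit = 300): string
{
    $content = trim($content);
    if (strlen($content) <= $limit) {
        return $content; // short enough, nothing to cut
    }

    $cut = strpos($content, ' ', $limit);
    if ($cut === false) {
        return $content; // no later space: the text ends within this word
    }

    // Drop punctuation left at the cut so the result never ends in "....".
    return rtrim(substr($content, 0, $cut), " \t\r\n.,;:!?") . '...';
}

echo truncate_at_word('The quick brown fox jumps over the lazy dog', 12), "\n";
// The quick brown...
```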

LifeinAsia

4:05 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Why not just count a number of chars and cut off at the full word after x number of chars, adding three dots?

That's what I often find to be easier as well. A slight downside is that if you have many previews together, they all end up almost exactly the same length and can look a little boxy.

aakk9999

4:33 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In this case, when calculating length, a randomiser can be added.

So let's say text length in chars = (300 + random number 0-100), then find the first space after this and cut.
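
Roughly, in PHP (a sketch only; truncate_ragged, $base and $spread are hypothetical names, and lengths are counted in bytes):

```php
<?php
// Cut-off length = $base + random number 0..$spread, then cut at the
// first space after that point and add three dots.
function truncate_ragged(string $content, int $base = 300, int $spread = 100): string
{
    $content = trim($content);
    $limit = $base + mt_rand(0, $spread);

    if (strlen($content) <= $limit) {
        return $content; // nothing to cut
    }

    $cut = strpos($content, ' ', $limit);
    if ($cut === false) {
        return $content; // no space after the limit
    }

    return substr($content, 0, $cut) . '...';
}
```

Each call picks a new random length, so the same text can produce a different preview on every page load unless the result is stored.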

lucy24

5:07 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Every sentence in English ends with either a dot, an ellipsis, a question mark, or an exclamation mark

Speaking as someone who used to make e-books: That's not the half of it. (But do you really need to consider ellipses? Not many people use the … single-character version.)

It's more like
(?<![A-Z][a-z]|[A-Z][a-z][a-z])[.!?…]+[)\]'"”’]* 

(You can't see the trailing space.) Most RegEx engines don't allow lookbehinds of variable length, so you have to lay out each one separately. And just watch: someone will post a sentence containing a longer abbreviation such as "Messrs." There are probably a couple more characters that I've forgotten at the moment. And what if you've got annoying clients who insist on posting emoji?

Why is it a certain number of sentences rather than a particular word- or character count?
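
For anyone who wants to try the pattern above in PHP, here is a hedged sketch. It uses the pattern as posted (including the trailing literal space) with the /u modifier so the "…" and curly-quote characters are read as UTF-8. The function name first_sentences_regex is made up. PCRE accepts this lookbehind because each alternative has its own fixed length.

```php
<?php
// Find sentence ends with the pattern above and cut after the third one.
function first_sentences_regex(string $content, int $limit = 3): string
{
    $pattern = '/(?<![A-Z][a-z]|[A-Z][a-z][a-z])[.!?…]+[)\]\'"”’]* /u';

    // Offsets come back in bytes, which matches substr() below.
    $count = preg_match_all($pattern, $content, $m, PREG_OFFSET_CAPTURE);
    if ($count === false || $count < $limit) {
        return trim($content); // fewer than $limit sentence ends found
    }

    [$match, $offset] = $m[0][$limit - 1];
    return rtrim(substr($content, 0, $offset + strlen($match)));
}

$content = 'Mr. Smith went to Elm St. today. He said, "This is the end." '
         . '(At least, I think so.) Nothing more.';
echo first_sentences_regex($content), "\n";
// Mr. Smith went to Elm St. today. He said, "This is the end." (At least, I think so.)
```

The lookbehind also rejects real sentence ends after short capitalized words (a sentence ending in "Bob." would not count), and a final sentence with no space after it is never matched.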

LifeinAsia

5:35 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A few other, possibly minor, issues:
- If a sentence ends a paragraph, there won't be a space after the punctuation. It will be some combo of CR/LF, depending on your system.
- If you allow HTML, a particular sentence could be followed by one or more closing (or even opening) tags.

So I would definitely lean more towards the following:
- strip out HTML tags
- cut off after a specific number of characters (with a randomizer if you want it to look a little more "ragged")
- trim back to the previous space or newline/carriage return
- add "..."
- display
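
The steps above could be sketched like this. The function name make_preview and the $limit / $jitter parameters are assumptions for illustration, and lengths are counted in bytes (the mb_* functions would be the multibyte equivalents).

```php
<?php
function make_preview(string $html, int $limit = 300, int $jitter = 0): string
{
    // 1. Strip out HTML tags.
    $text = trim(strip_tags($html));

    // 2. Pick the cut-off, optionally with a random extra for a ragged look.
    if ($jitter > 0) {
        $limit += mt_rand(0, $jitter);
    }
    if (strlen($text) <= $limit) {
        return $text; // already short enough
    }

    // 3. Trim back to the previous space or newline/carriage return.
    //    One extra character is taken so a word ending exactly at $limit survives.
    $cut = substr($text, 0, $limit + 1);
    if (preg_match('/^(.*)\s/s', $cut, $m)) {
        $cut = $m[1];
    } else {
        $cut = substr($text, 0, $limit); // no whitespace at all: hard cut
    }

    // 4. Add "..." (dropping trailing punctuation so it never becomes "....").
    return rtrim($cut, " \t\r\n.,;:!?") . '...';
}

// 5. Display.
echo make_preview("<p>Hello <b>big</b> world.</p>\n<p>Second paragraph here.</p>", 15), "\n";
// Hello big...
```

One thing to watch: strip_tags() removes tags without inserting anything, so "<p>One.</p><p>Two.</p>" with no whitespace between the paragraphs becomes "One.Two.".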

lucy24

11:03 pm on Dec 8, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If a sentence ends a paragraph, there won't be a space after the punctuation

Oops, forgot that one! In fact my original ebook code was for "missing punctuation at paragraph end". And that's from when I was producing \r\n on a Mac, so I couldn't even use $.

You'd have to replace each literal space in the pattern with
\s

and then use something like
($|</p>)

at the end to allow for users who omit the final punctuation in their no-more-than-three-sentences utterance. And what about users who type something like
Whoops! forgot one item

or
And then … there were none

with mid-sentence punctuation?

:: detour to text editor ::

Honestly, I think character count is the way to go. A full-fledged pattern covering all possibilities would be hellishly complicated. (Yes, I could make something up, because of course I could ;) But is it worth it?)

penders

4:29 pm on Dec 9, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In this case, when calculating length, a randomiser can be added.


The "randomiser" could perhaps be related to the total length of the document? Longer snippets for longer documents...?
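
One possible reading of that idea, as a sketch: take a fixed fraction of the document length and clamp it between a minimum and a maximum. The function name preview_length and all the default numbers are invented for illustration, not recommendations.

```php
<?php
// Preview length grows with the document, within fixed bounds.
function preview_length(int $documentLength, int $min = 200, int $max = 500, float $fraction = 0.1): int
{
    $target = (int) round($documentLength * $fraction);
    return max($min, min($max, $target));
}

echo preview_length(1000), "\n";  // 200 (clamped up to the minimum)
echo preview_length(3000), "\n";  // 300
echo preview_length(20000), "\n"; // 500 (clamped down to the maximum)
```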