
Perl Server Side CGI Scripting Forum

    
Need to truncate a long text string
So it fits just right in a DIV on my web page
runner

10+ Year Member



 
Msg#: 4349 posted 6:00 am on Jan 16, 2006 (gmt 0)

Oh, how to explain this in a concise way...

I wrote a cgi script that displays a news article from a database of articles. My script takes a long news article, truncates it to make a shortened "teaser" article and then puts a link to the full article at the end of the shortened article.

The problem is that I need to figure out some way to code a Perl routine that will truncate the scalar $text to make it shorter AND be sized just right to fit in the allotted space.

I did the first one by hand and hard coded it as substr($text, 0, 755)

However, I can't do every article by hand. It needs to be automated. Although the target area is fixed in size, I can't figure out a way to calculate the shortened string length. BR tags will affect spacing and the text is formatted like a newspaper column so space between words is not fixed. And on top of that I can't truncate in the middle of a word.

It doesn't have to be exact but it should be close. Maybe this just won't work... any ideas?

 

perl_diver

5+ Year Member



 
Msg#: 4349 posted 6:08 am on Jan 16, 2006 (gmt 0)

Your question is not at all clear to me, but you can use the length() function to get the length of a string.

my $length = length($text);

runner

10+ Year Member



 
Msg#: 4349 posted 6:40 am on Jan 16, 2006 (gmt 0)

I'll try to explain it in a different way...

On my main web page I have a fixed-size area where a news article goes. The space isn't big enough for the full article, so I have to truncate the full article and put the smaller article in the allotted space.

The text for the full article is in a scalar defined as $text

The problem is that I need to code a Perl routine that will calculate where to truncate the text string (the article) so the resultant string (truncated article) nearly fills up the allotted space.

Have you ever seen a web site that has a brief news article on the front page? It's just enough to let the user see if they're interested in the subject. They have the option of clicking on the link to take them to the full article.

That's what I'm doing here. I have the full article in $text and I need to figure out a way to programmatically truncate the article so it will fit in the small fixed-size space on the main page.

I have a whole database full of these news articles so I have to have a program determine where to truncate the article.

There are a whole host of issues with doing this and maybe it can't be done. You can't just count bytes (characters) because there are line breaks and also the text is justified like a magazine article. (the text is left AND right justified so the spacing between words is not fixed)

I think this is probably impossible for a computer to do. I might have to hire someone to manually determine where each article should be truncated.

perl_diver

5+ Year Member



 
Msg#: 4349 posted 10:05 pm on Jan 16, 2006 (gmt 0)

It does not sound impossible, but it will probably require some work. Also, you may want to rethink your requirements. Since these are just teasers, you may be trying too hard to format them in a specific way. Some may have a few more or a few fewer words displaying in the allotted space depending on the HTML code in the text, like <br> tags, but is it really that important? There are ways to do what you want, but if it's beyond your programming ability you may need to hire a programmer or provide more detail, like the code you're currently using, some sample text, etc.

rocknbil

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4349 posted 8:44 am on Jan 17, 2006 (gmt 0)

This is pretty easy, but it is not the **real** problem you're going to encounter (see below). This is a hacked-up solution, but it works:

$length = 150; ## so you can store it somewhere and change it

$clip = getClip($text, $length);

sub getClip {

my ($string, $len) = @_;
my ($elipsis, $leader, $snippet, @wds);

$elipsis = ' . . . ';
$snippet = substr($string, 0, $len); # Get a string of desired max length from the large chunk

if (length($string) <= $len) { # If it already fits, great.
$leader = $string . $elipsis;
}
else { # If exceeds length . . .
@wds = split(/\s+/, $snippet); # split it up into words
## The last "word" of $snippet is most likely a chopped word,
## so pop it off the word list
pop(@wds);
# If the last word is any of these, pop it off too;
# it may mildly move the clip toward "common sense".
# The pattern is anchored so that only whole words match.
if (@wds && $wds[$#wds] =~ /^(?:for|is|by|are|the|to|in|from|at|i|a|of)$/i) { pop(@wds); }
## Re-assemble new string.
$leader = join(' ', @wds) . $elipsis;
}

return $leader;

}

Note how it does what you'd expect: if the text exceeds $length, it splits the snippet into words and chops it down to a list of full words only. Additionally, this part

if (@wds && $wds[$#wds] =~ /^(?:for|is|by|are|the|to|in|from|at|i|a|of)$/i) { pop(@wds); }

pops off any article, preposition, etc. that ends up as the last word of the truncated string, hopefully making a little more sense out of it (it has worked for me so far).

Because you're always dumping at least one word from your list, and one more if the word before it is in the list above, the result will always be shorter than $length, so it works well enough to just adjust $length upward accordingly.

But that's not the big problem.

> that will truncate the scalar $text to make it shorter AND be sized just right to fit in the allotted space.

To do this, you're going to have to use a fixed font size. A fixed font size has tons of accessibility and user-unfriendliness issues. If you allow users to manage the pages they browse, as in viewing the text at a larger size (if their eyesight requires it) or overriding your stylesheet with theirs (lots of colorblind people do this), then your teaser is going to violate your carefully designed and allotted "space."

The best thing to do is go ahead and add a teaser, but alter your design so that if the text size changes, the allotted space can change without blowing up the page.

But, the above sub should work out pretty well, and you can add any other last-words to the list to chop off if you need to.
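
For comparison, here is a more compact sketch of the same idea that lets the regex engine find the last word boundary inside the limit. It is only an illustration: the name truncate_words and the sample sentence are made up here, and it skips the "drop trailing articles" step.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return at most $max characters
# of $text, cut at a word boundary, with an ellipsis if anything was removed.
sub truncate_words {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;

    # Greedy match of up to $max characters that must be followed by
    # whitespace or end of string, so the cut never lands inside a word.
    if ($text =~ /^(.{1,$max})(?=\s|\z)/s) {
        my $cut = $1;
        $cut =~ s/\s+\z//;    # drop trailing whitespace before the ellipsis
        return $cut . ' . . .';
    }

    # No whitespace within the first $max characters: fall back to a hard cut.
    return substr($text, 0, $max) . ' . . .';
}

print truncate_words('The quick brown fox jumps over the lazy dog', 18), "\n";
```

Unlike getClip, this version leaves text that already fits untouched and only drops a word when the cut would otherwise land in the middle of it.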

tombola

10+ Year Member



 
Msg#: 4349 posted 10:55 am on Jan 17, 2006 (gmt 0)

It is possible to put a truncated text in a box with fixed dimensions when you use CSS (see Perl snippet below).

For example: text must fit in a box of 400x150 pixels.

To get the text nicely in the box, the height of the box should be a multiple of the line height of the font we use.
In this example, the box has a height of 150 pixels, so we set the line-height of our font to 15px. This means we can display 10 text lines in the box.
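
The arithmetic above can be sketched in Perl. This is just a worked check of the numbers in this post; the function names lines_that_fit and snap_height are invented for the example.

```perl
use strict;
use warnings;

# Number of whole text lines that fit in a box of the given height.
sub lines_that_fit {
    my ($box_height_px, $line_height_px) = @_;
    return int($box_height_px / $line_height_px);
}

# Largest height not exceeding $box_height_px that is a whole multiple
# of the line height, so no line is cut in half by overflow: hidden.
sub snap_height {
    my ($box_height_px, $line_height_px) = @_;
    return lines_that_fit($box_height_px, $line_height_px) * $line_height_px;
}

printf "%d lines fit in a 150px box at 15px line-height\n", lines_that_fit(150, 15);
printf "A 160px box would be trimmed to %dpx\n", snap_height(160, 15);
```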

Here is the CSS (box has a yellow background).
The first div is a wrapper.
The second div is used to put the text " [read full article]" (a link to the full article) in a layer on top of the text.
The third div contains the full text and by hiding the overflow, only 10 lines of text are displayed on the screen.
Just copy and paste this code into an empty HTML document and you'll see how it looks.

<div style="background-color: #ff0; display: block; width: 400px; height: 150px; overflow: hidden; padding: 0; font: 10pt verdana, arial, geneva, sans-serif; line-height: 15px;">
<div style="background-color: #ff0; z-index: 5; padding: 0; width: 150px; position: relative; top: 135px; left: 280px; font: 8pt verdana, arial, geneva, sans-serif; line-height: 15px;">&nbsp;&nbsp;... [<a href="test">read full article</a>] </div>
<div style="position: relative; top: -15px; width: 400px; height: 150px; overflow: hidden; padding: 0; font: 10pt verdana, arial, geneva, sans-serif; line-height: 15px;">

Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text. Here comes your text.

</div>
</div>

Integrated in a Perl script, it looks like this (text is stored in $full_text):


$truncated_text = qq~
<div style="background-color: #ff0; display: block; width: 400px; height: 150px; overflow: hidden; padding: 0; font: 10pt verdana, arial, geneva, sans-serif; line-height: 15px;">
<div style="background-color: #ff0; z-index: 5; padding: 0; width: 150px; position: relative; top: 135px; left: 280px; font: 8pt verdana, arial, geneva, sans-serif; line-height: 15px;">&nbsp;&nbsp;... [<a href="test">read full article</a>] </div>
<div style="position: relative; top: -15px; width: 400px; height: 150px; overflow: hidden; padding: 0; font: 10pt verdana, arial, geneva, sans-serif; line-height: 15px;">$full_text</div></div>~;

I know this is not an elegant solution, but it works. ;-)

rocknbil

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4349 posted 8:20 pm on Jan 17, 2006 (gmt 0)

But that is a fixed font size, is that not correct?

perl_diver

5+ Year Member



 
Msg#: 4349 posted 8:31 pm on Jan 17, 2006 (gmt 0)

rocknbil,

Your solution looks pretty good, but it doesn't take into account the HTML code that might be in the snippet, which the user doesn't want to affect the length of the visible text (if I understand the question).

I think CSS could play a part in this and should be looked into, although I really get annoyed at the cross-browser incompatibility/quirks issues of CSS. The CSS posted might work fine in all the major browsers, though.
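
One rough way to measure only the visible text is to strip the markup before counting. The sketch below is an assumption-laden illustration, not a tested solution: visible_length is a made-up name, and the tag regex is naive (it will be fooled by a ">" inside an attribute value or a comment), so a real HTML parser would be safer for messy input.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the text a reader would actually see,
# ignoring tags and counting each character entity as one character.
sub visible_length {
    my ($html) = @_;
    (my $plain = $html) =~ s/<[^>]*>//g;              # naive: remove anything that looks like a tag
    $plain =~ s/&(?:#\d+|#x[0-9a-fA-F]+|\w+);/ /g;    # &nbsp; &#160; &#xA0; and similar become one character
    return length($plain);
}

my $sample = 'First line.<br>Second&nbsp;line with <b>bold</b> text.';
printf "raw length: %d, visible length: %d\n", length($sample), visible_length($sample);
```

The visible length could then be compared against the limit instead of length($text), while the truncation itself still has to take care not to cut inside a tag.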

balam

10+ Year Member



 
Msg#: 4349 posted 9:18 pm on Jan 17, 2006 (gmt 0)

> the text is left AND right justified so the spacing between words is not fixed

What exactly does this mean?

The impression I get from the statement is that $text looks something like:

This&nbsp;&nbsp;text&nbsp;has&nbsp;&nbsp;&nbsp;been&nbsp;justified.

If that's not what $text contains, how is the justification "hard-coded" into the article? Multiple "space" characters would collapse into a single space when rendered in a browser...

$text does (or could) contain HTML tags, correct?

On the subject of using a fixed font size... With a containing <div> set to "x" pixels by "y" pixels, one would have to set the font size to "z" pixels for the layout to work, I believe. If sizing with ems or as a percentage, I imagine the layout would suffer, since you can only guess what the default font size is in my browser.

As a side note: I abhor the idea of font sizes set in pixels, given that Macs live in a world of 72 ppi, Windows at 96 ppi, my monitor at 102 ppi, and other OSes at other ppi values. So, a font displayed at 12 pixels is 1/6 of an inch on a Mac, 1/8 of an inch on Windows, and 2/17 of an inch on my monitor...

I think I got that right... I'm in some serious need of sleep at the moment. ;)
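
The pixel-to-inch figures above can be checked with a few lines of Perl. The helper name px_to_inches is made up for this example; the ppi values are the ones quoted in the post.

```perl
use strict;
use warnings;

# Physical size in inches of a length given in pixels at a given pixel density.
sub px_to_inches {
    my ($px, $ppi) = @_;
    return $px / $ppi;
}

# 12px text at the three densities mentioned in the post.
for my $ppi (72, 96, 102) {
    printf "12px at %3d ppi = %.4f in\n", $ppi, px_to_inches(12, $ppi);
}
```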

tombola

10+ Year Member



 
Msg#: 4349 posted 10:36 am on Jan 18, 2006 (gmt 0)

> But that is a fixed font size, is that not correct?

No. The font is Verdana. Just copy and paste the code into an HTML page, and you'll see how it looks.

PS. The font size is not set in pixels but in points, only the line-height of the font is in pixels.
This should work in all CSS-compliant browsers.

runner

10+ Year Member



 
Msg#: 4349 posted 3:19 am on Jan 20, 2006 (gmt 0)

What I meant by "the text was left AND right justified" is this "text-align: justify;" is set so it prints like a magazine or newspaper article.

I'm going to try the code listed above this weekend.

runner

10+ Year Member



 
Msg#: 4349 posted 3:32 am on Jan 20, 2006 (gmt 0)

Can I ask a dumb question here? How do all these other web sites print out shortened versions of long articles like I'm trying to do? Is it that most people display the shortened article in a resizable area so text length is unimportant? Or have you actually seen a web site print the shortened article in a consistent, fixed-size area?

The more I think about it, the more I think I am going to change my layout and make the "shortened article" area a resizable area, maybe put it in a <td> tag. Then use the Perl script to estimate the correct article length and dump it in there. That way the font can be resized etc. and I won't have to worry about defining the div in CSS at a specific height where it might chop off part of a line.

It just seems to me this problem has been solved by someone else already!
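
A very rough sketch of the "estimate the correct article length" step might look like this. Everything here is an assumption: estimate_budget is a made-up name, and the average character width of 6 pixels is only a placeholder that would have to be measured for the actual font and size. With justified, proportional text the result is a ballpark figure at best.

```perl
use strict;
use warnings;

# Hypothetical estimate of how many characters fit in an area:
# (characters per line) x (number of lines).
sub estimate_budget {
    my ($width_px, $lines, $avg_char_px) = @_;
    my $chars_per_line = int($width_px / $avg_char_px);
    return $chars_per_line * $lines;
}

# Placeholder numbers: a 400px-wide area, 10 lines, 6px average character width.
my $budget = estimate_budget(400, 10, 6);
print "Truncate to roughly $budget characters\n";
```

The resulting number could then be passed as the length limit to a word-boundary truncation routine, with the resizable area absorbing the error.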

balam

10+ Year Member



 
Msg#: 4349 posted 4:08 am on Jan 20, 2006 (gmt 0)

> Is it that most people display the shortened article in a resizable area so text length is unimportant?

That's the direction I took. The article snippets I use are 256 characters long (including spaces, but not breaks, like a new paragraph), and are displayed in a proportional font. As a result, the snippets take up a varying amount of screen real estate - and I don't worry about it.
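
A fixed-length snippet along these lines could be sketched as follows. This is a guess at the approach, not balam's actual code: fixed_snippet is a made-up name, and treating <br> as a space and backing up to a whole word are assumptions added for the example.

```perl
use strict;
use warnings;

# Hypothetical sketch: flatten paragraph breaks, then keep at most $limit
# characters (256 by default) without ending in the middle of a word.
sub fixed_snippet {
    my ($text, $limit) = @_;
    $limit = 256 unless defined $limit;

    $text =~ s/<br\s*\/?>/ /gi;    # assumption: a <br> tag counts as a space
    $text =~ s/\s+/ /g;            # newlines, tabs and runs of spaces become one space
    $text =~ s/^ | $//g;           # trim the ends

    return $text if length($text) <= $limit;

    my $cut = substr($text, 0, $limit);
    # If the next character is not a space, the cut landed inside a word:
    # back up to the end of the previous whole word.
    $cut =~ s/\s+\S*\z// if substr($text, $limit, 1) ne ' ';
    return $cut;
}

print fixed_snippet("First paragraph of the article.\n\nSecond paragraph follows here.", 40), "\n";
```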

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved