Perl Server Side CGI Scripting Forum

    
Textarea Word Counter Needed
How do you do it.
typomaniac




msg:4078365
 2:35 pm on Feb 11, 2010 (gmt 0)

Hi, I need to be able to count the number of words in a textarea input. Counting the characters is easy, but I need to be able to count words instead of just characters, without using JavaScript.

 

janharders




msg:4078396
 3:19 pm on Feb 11, 2010 (gmt 0)

that's easy enough, too.
my $string = 'hello, my name is Xasghjda.';
my $count = scalar split(/\W/, $string);
print $count;


it's not pretty, not efficient, but very easy. The above code is meant to print 5 btw ...

phranque




msg:4078441
 4:11 pm on Feb 11, 2010 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], typomaniac!

just to clarify, you were looking for something "server side" rather than in the browser, correct?
if so, janharders' solution would certainly suffice, or you could probably do something with a "regular expression" if you needed something more efficient.

hmmm, on second thought that split might create empty array elements between consecutive non-word characters...
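The concern about empty elements can be checked directly. This is a minimal sketch, not part of the original post; fields_single is a hypothetical helper name. It splits the sample string on single non-word characters and prints each field in brackets so that empty ones become visible:

```perl
use strict;
use warnings;

# Split on single non-word characters, as in the first attempt.
# fields_single is a hypothetical helper, used only for illustration.
sub fields_single {
    my ($string) = @_;
    my @fields = split /\W/, $string;
    return @fields;
}

my @fields = fields_single('hello, my name is Xasghjda.');

# Bracket each field so an empty one shows up as [].
print join(' ', map { "[$_]" } @fields), "\n";
print scalar(@fields), "\n";
```

The comma and the space after "hello" are two separate delimiters, so an empty field appears between them and the count comes out as 6 rather than 5. The empty field after the final full stop is dropped because split discards trailing empty fields.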

janharders




msg:4078520
 5:40 pm on Feb 11, 2010 (gmt 0)

you got me, phranque. It didn't when I first wrote it as /[, .]+/ but then I thought "nooo, the other guys will see it and point out that \W would've been much nicer" (also, once you want to do it right, you'll end up adding a lot of word boundaries...), so I changed it and left the + out. I usually don't like to work with \w, because of locales, but \W should work just fine.


my $count = scalar split(/\W+/, $string);

would work better.
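For comparison, here is a small sketch of two ways to get the count; count_by_split and count_by_match are hypothetical names. One caveat with the split version, as far as I can tell: if the text starts with a non-word character (a leading space, say), split returns an empty first field, which inflates the count by one. Matching runs of word characters avoids that.

```perl
use strict;
use warnings;

# Count by splitting on runs of non-word characters.
sub count_by_split {
    my ($string) = @_;
    my @fields = split /\W+/, $string;
    return scalar @fields;
}

# Count by matching runs of word characters instead of splitting.
# The "= () =" idiom forces list context and yields the number of matches.
sub count_by_match {
    my ($string) = @_;
    my $count = () = $string =~ /\w+/g;
    return $count;
}

print count_by_split('hello, my name is Xasghjda.'), "\n";   # 5
print count_by_match('hello, my name is Xasghjda.'), "\n";   # 5
print count_by_split(' hello world'), "\n";                  # 3 (leading empty field)
print count_by_match(' hello world'), "\n";                  # 2
```

Storing the fields in an array first and then taking its size also sidesteps any surprises with split in scalar context on older Perl versions.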

typomaniac




msg:4078940
 11:21 am on Feb 12, 2010 (gmt 0)

Works like a charm. Can't say thanks enough. One thing I noticed though: it doesn't count punctuation or other special characters, even if typed in separately. Would something have to be added to the regex for that?

janharders




msg:4079103
 4:03 pm on Feb 12, 2010 (gmt 0)

What exactly do you mean?
currently, "hello. how are you?" would count as four words because ". " is counted as a single word boundary. if you want it to count punctuation as words in special situations, you'll have to define the circumstances and I'll be happy to help in putting that into the regexp. Think of something challenging ;)

typomaniac




msg:4079552
 9:22 am on Feb 13, 2010 (gmt 0)

I understand what you are saying but what I meant was, if someone typed in something like Hello, how are you? >>>>>)(*)(* )(* ** it still only shows up as four words, meaning someone could type in all kinds of things like special characters and it would not be counted against the limit allowed. I think what I was wanting to do is put handcuffs on malicious users but like I said the char count will still get them. I apologize for going overboard in what I was asking because I can use a character count and still limit user input. You were still more help to me than you'll ever know.
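If the goal is for runs of special characters to count against the limit too, one option is to count whitespace-separated tokens rather than words. This is only a sketch under that assumption; count_tokens is a hypothetical name:

```perl
use strict;
use warnings;

# Count every run of non-whitespace characters as one token.
# split ' ' is the special "awk mode": it splits on any whitespace
# and ignores leading whitespace.
sub count_tokens {
    my ($string) = @_;
    my @tokens = split ' ', $string;
    return scalar @tokens;
}

my $input = 'Hello, how are you? >>>>>)(*)(* )(* **';

# Word-based count: the punctuation runs are treated as delimiters only.
my @words = split /\W+/, $input;
print scalar(@words), "\n";          # 4

# Token-based count: the three punctuation runs are counted as well.
print count_tokens($input), "\n";    # 7
```

With this approach "you?" is still a single token, so ordinary sentences get the same count as before while strings of symbols no longer slip through.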

janharders




msg:4079579
 12:06 pm on Feb 13, 2010 (gmt 0)

Hey, it's a pleasure to help out.
If you defined the word boundaries more strictly, you could have it count differently.

my $string = "hello, my name is Xasghjda. What\nyes sadddd/( /(\$dd";
my $count = scalar split(/[,. ?!\r\n]+/, $string);
print $count;


would count 9 words. To make sure your users don't mess with your design, you might also look at spaces: even if I stay below the character count (let's say 80) and the word count, it might break your design if I just submit 79 x "a" without any spaces or dashes where the browser could break the line.
A check like

if($string =~ m/[^\- \r\n]{20,}/s)

would match any string containing a run of 20 or more chars with no spaces, dashes or line breaks. What you do with those is up to you: either yell at the user or silently insert a space every X characters:
$string = "dasdddddddddddddddddddddddddddddddddddddddddddddddddddd";

while($string =~ m/[^\- \r\n]{20,}/s)
{
    $string =~ s/([^\- \r\n]{19})/$1 /gs;
}

print '"' . $string . '"' . "\n";

=>
"dasdddddddddddddddd ddddddddddddddddddd ddddddddddddddddd"

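A possible refinement, offered as a sketch rather than a replacement for the approach above: a substitution of this kind seems to append a space after any run that is exactly 19 characters long, or an exact multiple of 19. Adding a lookahead so that a space is inserted only when more unbreakable characters follow avoids that, and a single pass is then enough. break_long_runs and its $max parameter are hypothetical names.

```perl
use strict;
use warnings;

# Insert a space after every $max consecutive "unbreakable" characters,
# but only if at least one more unbreakable character follows.
sub break_long_runs {
    my ($string, $max) = @_;
    $max = 19 unless defined $max;
    $string =~ s/([^\- \r\n]{$max})(?=[^\- \r\n])/$1 /g;
    return $string;
}

print '"', break_long_runs('a' x 38), '"', "\n";   # two chunks of 19, no trailing space
print '"', break_long_runs('a' x 20), '"', "\n";   # 19 characters, a space, then 1
print '"', break_long_runs('short words stay as they are'), '"', "\n";
```

After the pass no run is longer than $max characters, so the 20-or-more check no longer matches the result.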
typomaniac




msg:4080367
 10:50 am on Feb 15, 2010 (gmt 0)

Bingo! That hit the nail on the head! You were so quick with answers... what would you recommend as a good reference for learning regex for coming up with solutions like this? I can write enough perl (sometimes, though usually with help from great people like you), but regex is really new to me. Thank you once again.

phranque




msg:4080873
 11:40 pm on Feb 15, 2010 (gmt 0)

there are quite a few regexp references linked in the "Perl Server Side CGI Scripting forum Charter" [webmasterworld.com].

janharders




msg:4081196
 10:48 am on Feb 16, 2010 (gmt 0)

I started with Mastering Regular Expressions (EAN: 9781565922570, ISBN 10: 1565922573), but that just got me the first few miles. the rest was just building up routine. It took me quite some time, but I'm a slow learner...

glad I could help

typomaniac




msg:4081252
 12:43 pm on Feb 16, 2010 (gmt 0)

I was looking at that book myself and hope to get it when I visit the homeland (U.S.) next time I'm there. The shipping would cost too much here (at least double). I have Regex Buddy [regexbuddy.com]; I just got it and haven't had time to attempt making sense of it yet. JG software also has a product called Regex Magic and maybe.......slow learner? I'm not sure which is my first or middle name---typomaniac (master of mistakes) or slow learner. Hopefully I can get ahead enough with things to be able to help others as you've helped me. Thanks so much.

janharders




msg:4081370
 4:45 pm on Feb 16, 2010 (gmt 0)

I looked at regexbuddy some time ago, mostly for debugging regexps, but haven't used it. From my personal experience, in most cases you'll just need the basic functionality, that is character groups, backreferences, quantification, mostly. also, it's very important to know the modifiers (m//#*$! < those!). I've seldom used look-ahead and look-behind, so I still have to check the manual whenever I need them ... if I ever do.
As for typos, I'm sure you're already including "use strict;" in all your scripts? that pretty much ensures that typos won't go unnoticed. Also: check out Regexp::Common [search.cpan.org], it contains many solutions for everyday problems (such as matching valid emails which can be quite a painful thing otherwise).
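To make the basics named above a little more concrete, here is a short sketch using only core Perl. The sample text and variable names are invented for illustration. It shows a character class with a quantifier, a backreference, and the /g, /i and /x modifiers:

```perl
use strict;
use warnings;

my $text = "The the quick brown fox";

# Character class with a quantifier: each run of one or more letters.
# With /g in list context, the match returns every hit.
my @words = $text =~ /[A-Za-z]+/g;

# Backreference: \1 must repeat whatever group 1 captured.
# /i makes the comparison case-insensitive, so "The the" qualifies.
my ($doubled) = $text =~ /\b(\w+)\s+\1\b/i;

# /x lets the pattern contain whitespace and comments for readability.
my $fits = $text =~ /
    \A        # start of string
    .{1,80}   # between 1 and 80 characters, none of them a newline
    \z        # end of string
/x ? 1 : 0;

print scalar(@words), " words, doubled: $doubled, fits: $fits\n";
```

This prints "5 words, doubled: The, fits: 1".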

typomaniac




msg:4081744
 1:24 am on Feb 17, 2010 (gmt 0)

use strict....one of the meanest things ever discovered..lol. Amazing how simply (#)commenting that line out makes life so much easier. It even got "mad" at me as I tried to replace strings with variables in pursuit of building a language file...i.e.,
$lng{'1'}="All Fields Required"; so that to use the script with a different language it would be a simple matter of replacing the value. It didn't like the part with {'1'}. Once the script was running okay I just commented out the use strict line and moved on. As far as Regexp::Common, I looked at that and once again stand in amazement at the lengths people will go to in the pursuit of making life easier for others.

janharders




msg:4082659
 12:55 am on Feb 18, 2010 (gmt 0)

whatever you can do without use strict, you can do with strict. there are exceptions, of course, but that's just the real evil dark voodoo-stuff. I can only recommend keeping it in there ... next time you want to change stuff, fix a bug or extend anything, you'll be lost hunting that typo ;)
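As an illustration for the language-file case: the thread does not quote the actual error, so this is a guess. The usual cause is that the hash was never declared, in which case strict reports that the global symbol "%lng" requires an explicit package name. Declaring it once with "my" is typically all that is needed:

```perl
use strict;
use warnings;

# Declare the hash before using it; this is what strict insists on.
my %lng;

$lng{'1'} = "All Fields Required";

print $lng{'1'}, "\n";
```

The {'1'} subscript itself is fine under strict; only the undeclared variable is a problem.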

btw ... for localization take a look at the Maketext-Family of modules. There's a great article from the perl journal available on cpan [search.cpan.org]. I haven't seen anything that is more flexible and easy to maintain than the maketext-idea.
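A rough sketch of the maketext idea, using the core Locale::Maketext module. The package names (MyApp::L10N and its language subclasses), the message keys and the German wording are all made up for illustration. In a real project each language class would normally live in its own file.

```perl
use strict;
use warnings;

# Base class for this project's language handles (hypothetical name).
package MyApp::L10N;
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

# English lexicon: keys are message ids, values are the displayed text.
package MyApp::L10N::en;
our @ISA = ('MyApp::L10N');
our %Lexicon = (
    'required' => 'All Fields Required',
    'too_long' => 'Please use at most [_1] words.',
);

# German lexicon with the same keys.
package MyApp::L10N::de;
our @ISA = ('MyApp::L10N');
our %Lexicon = (
    'required' => 'Alle Felder sind erforderlich',
    'too_long' => 'Bitte hoechstens [_1] Woerter verwenden.',
);

package main;

# Ask for a German handle; [_1] is replaced by the first extra argument.
my $lh = MyApp::L10N->get_handle('de')
    or die "no language handle available";

print $lh->maketext('required'), "\n";
print $lh->maketext('too_long', 50), "\n";
```

Switching the script to another language then means asking for a different handle instead of editing the strings in place.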
