Forum Moderators: coopster & phranque

Message Too Old, No Replies

'Proper' Function for PERL?

Convert "THIS IS MY TEXT" --> "This is my Text"

         

hiker_jjw

6:33 pm on Nov 25, 2002 (gmt 0)



This is a silly question, but I just can't find anything to help with this problem. I have a friend/client who has several thousand database entries in UPPER CASE who needs the wording changed to Proper Case.

Example:
Convert "THIS IS MY TEXT"
to ----> "This is my Text"

Does anyone know of a "smart" function, library, module for PERL that can change from UPPER CASE to Proper Case? I know about the PERL functions ( ucfirst, uc, lcfirst, lc), but I'm interested in something that is a little more developed. In the example above, it wouldn't make any sense to have "This Is My Text".

I searched over at CPAN, but havn't found anything yet. I know it has to have been done before. I guess I could write my own coding, but why recreate the wheel.

Thanks in Advance!

hiker_jjw

7:44 pm on Nov 25, 2002 (gmt 0)



Just to add to my message. I've done coding such as the following to split and proper the phrase, but I was hoping to find something a little "smarter". Anyone have any ideas?


sub make_proper
{
my ($string) = @_;

@words = split (/\s/, $string);
foreach $word (@words) {
$new_word = ucfirst lc $word;
push (@new_words, $new_word);
}
$new_string = join(" ", @new_words);
return ($new_string);

} # End sub make_proper

jeremy goodrich

7:45 pm on Nov 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you really want to change it to

"This is my Text"

- or -

"This is my text"

Why would you want the last word upper Cased? It's not proper English. :)

Offhand, don't know of anything that will do that - but - if you only want the 2nd one 'This is my text' then the functions you mentioned should work out well for you.

<added>That subroutine looks fine, it seems you did want the 2nd one.

seems to me that would work out okay for you...</added>

hiker_jjw

7:51 pm on Nov 26, 2002 (gmt 0)



Nevermind, I wrote my own...
Here's what I came up with.
Thanks

#!/usr/local/bin/perl
#
# test.cgi
#
$¦ = 1;
print "Content-type: text/html\n\n";

$string = "IS D.H. LAWRENCE'S LADY: (DE'ALOUME E'FRENCHE'S)";
print "\n\n$string\n\n";
$new_string = &make_proper($string);
print "$new_string\n\n";

$string = "THIS IS A TESTING'S VOLUME I.";
print "\n\n$string\n\n";
$new_string = &make_proper($string);
print "$new_string\n\n";

exit;

sub make_proper
{
my ($string) = @_;
my @words = split (/\s+/, lc $string);
my @new_words = ();
my $new_word = "";

foreach my $word (@words) {

# Starts with Non-Alphanum Character
my $starting_non_alphanum = "";
if ($word =~ /^(\W)+(.*)/) {
$starting_non_alphanum = $1;
$word = $2;
}

# Ends with Non-Alphanum Character
my $ending_non_alphanum = "";
if ($word =~ /(.*)(\W)+$/) {
$word = $1;
$ending_non_alphanum = $2;
}

# Contains a Non-Alphanum Character
if ($word =~ /^(\w+)(\W)(\w+)(\W?)(\w?)$/) {
my $p1_word = $1;
my $p2_non_alphanum = $2;
my $p3_word = $3;
my $p4_non_alphanum = $4;
my $p5_letter = $5;

$p1_word = ucfirst $p1_word;
$p5_letter = lc $p5_letter;
if (length $p1_word > 2 && length $p3_word == 1) {
$p3_word = lc $p3_word;
} elsif (length $p1_word == 1 && length $p3_word == 1) {
$p3_word = uc $p3_word;
} else {
$p3_word = ucfirst $p3_word;
}

$new_word = $p1_word . $p2_non_alphanum . $p3_word . $p4_non_alphanum . $p5_letter;

# Other
} else {
$new_word = ucfirst $word;
}

# Recombine the Alphanum Character
$new_word = $starting_non_alphanum . $new_word . $ending_non_alphanum;

push (@new_words, $new_word);
}
my $new_string = join(" ", @new_words);

$new_string =~ s/(\w,?) And (\w)/$1 and $2/g;
$new_string =~ s/(\w,?) Or (\w)/$1 or $2/g;
$new_string =~ s/(\w,?) But (\w)/$1 but $2/g;

$new_string =~ s/(\w) At (\w)/$1 at $2/g;
$new_string =~ s/(\w) In (\w)/$1 in $2/g;
$new_string =~ s/(\w) On (\w)/$1 on $2/g;
$new_string =~ s/(\w) To (\w)/$1 to $2/g;
$new_string =~ s/(\w) From (\w)/$1 from $2/g;

$new_string =~ s/(\w) Is (\w)/$1 is $2/g;
$new_string =~ s/(\w) A (\w)/$1 a $2/g;
$new_string =~ s/(\w) An (\w)/$1 an $2/g;
$new_string =~ s/(\w) Am (\w)/$1 am $2/g;
$new_string =~ s/(\w) For (\w)/$1 for $2/g;
$new_string =~ s/(\w) Of (\w)/$1 of $2/g;
$new_string =~ s/(\w) The (\w)/$1 the $2/g;

if (length $new_string > 60) {
$new_string =~ s/(\w) With (\w)/$1 with $2/g; #?
$new_string =~ s/(\w) That (\w)/$1 that $2/g; #?
}

$new_string = ucfirst $new_string;

return ($new_string);

} # End sub make_proper