I found the strtolower() and ucfirst() functions, which I suspect are going to be my friends here, but ideally what I'd like to do is a bit of basic logic as follows (speed is not an issue, as this will only be called on post, then written to the DB in its rewritten form):-
1. If the post is all caps then lower-case the lot.
2. Capitalise first word at start of each new sentence.
3. Capitalise "i" if it has a space either side of it.
I think those three would do the trick quite well.
Does anyone have any code for this, or is there perhaps an open-source forum which has such a function that I could pinch?
In terms of detecting if it's all uppercase, would one single lc character mess that detection up? Is there a way, for example, to detect if something is 90% uppercase?
Thanks,
TJ
$text = ucwords(strtolower($text));
Uppercasing I appropriately is a bit trickier -- if the "i" is at the beginning of the subject, e.g., "i need help", then simply capping it if it's surrounded by a space on either side wouldn't do it. Or a construction like "script fails-I don't understand." Parens, hyphens -- there are lots of ways that a standalone "i" could end up not being surrounded by whitespace.
Also, you're not accounting for acronyms or other usages that should be capped in some other way -- PHP, MySQL, or iPod, for example.
You could simply display the rewritten subject line back to the person in an editable field, with a note that all-caps subjects aren't allowed, and let them edit it.
Checking for mostly caps should be pretty easy. Use a regexp to count all occurrences of [A-Z], then compare that count to the total character count of the subject. If $allcaps > .9 * $totalchars, do your rewrite bit.
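A rough sketch of that check, in case it helps. The function name is_mostly_caps() and the 0.9 default are just placeholders. One assumption that differs from the description above: the count is compared against the number of letters rather than the total character count, because spaces and punctuation would otherwise drag a short all-caps subject like "I NEED HELP" below 90%.

```php
<?php
// Hypothetical helper: true when more than $threshold of the ASCII letters
// in $text are uppercase. Compares against letters only (an assumption),
// so spaces and punctuation do not dilute the ratio.
function is_mostly_caps(string $text, float $threshold = 0.9): bool
{
    $upper   = preg_match_all('/[A-Z]/', $text);     // number of A-Z characters
    $letters = preg_match_all('/[A-Za-z]/', $text);  // number of letters overall

    if ($letters === 0) {
        return false; // nothing to judge, e.g. "1234" or an empty subject
    }
    return $upper > $threshold * $letters;
}
```

A single lower-case character in a long shouted subject would still trip the check, since the ratio stays above the threshold.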
$text = ucwords(strtolower($text));
Yes, that's the kind of thing I was thinking of, but would that capitalise the first word of every sentence automagically? Or do I first need to break down to the full stops?
Uppercasing I appropriately is a bit trickier
Yes, I agree. Perhaps between us we could thrash out a rule set:-
Capitalise "i" if:-
1. It has a space either side
2. It has a full stop or dash on its left and a space on its right.
3.?
acronyms
e.g. - iPod, generally don't have a space on the right, or if they do, have an alpha char on the left?
It's not going to be possible to get it 100% I'm sure, but to be honest, 90% would do. Bear in mind that non all-caps posts would get left alone, so the only affected ones would be the all caps ones, and a 90% fixed version is going to be a lot better than the all caps in any event.
Checking for mostly caps should be pretty easy. Use a regexp to count all occurrences of [A-Z], then compare that count to the total character count of the subject. If $allcaps > .9 * $totalchars, do your rewrite bit.
Excellent idea....
Larry - that's not such a bad idea actually. You could do it in a little more friendly manner:-
<AutoEdit by JediBot>Please don't post in all caps - thanks</AutoEdit>
TJ
It would be easy enough to explode the string on periods, then run ucfirst() on each array element, then implode it back into one string -- but now you're not accounting for abbreviations that might be used -- if the person enters i.e., e.g., etc., or any other abbreviation, the next letter after each period would be capitalized.
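For illustration, here is roughly what the explode/ucfirst/implode approach looks like, together with the abbreviation problem. The function name naive_sentence_case() is made up, and the whitespace handling is an addition of this sketch: plain ucfirst() does nothing to a piece that starts with a space.

```php
<?php
// Naive sentence capitalisation: lower-case everything, split on full stops,
// capitalise the first non-space character of each piece, then rejoin.
function naive_sentence_case(string $text): string
{
    $parts = explode('.', strtolower($text));
    foreach ($parts as $i => $part) {
        $trimmed = ltrim($part);
        // Whatever whitespace followed the previous full stop.
        $lead = substr($part, 0, strlen($part) - strlen($trimmed));
        $parts[$i] = $lead . ucfirst($trimmed);
    }
    return implode('.', $parts);
}

echo naive_sentence_case('HELLO THERE. THIS IS A TEST.'), "\n";
// Hello there. This is a test.
echo naive_sentence_case('USE A LOOP, E.G. FOREACH.'), "\n";
// Use a loop, e.G. Foreach.
```

The second line shows the weakness: every letter after a period is treated as the start of a sentence.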
If it were me, I think I'd just do the ucfirst(strtolower($text)) on the subject, then re-display it and let the user edit it, with a note about all-caps not being permitted. If you're dead-set on creating an elaborate set of rules to avoid that, it would be an interesting exercise and I'll be happy to participate. But I think it would be faster and easier to push the edited subject back to the user for final editing. (And of course, their edited subject would also have to be run against your rules again.)
If you're dead-set on creating an elaborate set of rules to avoid that, it would be an interesting exercise and I'll be happy to participate.
Not so much dead-set, I just find it quite rewarding coming up with things that work in an automated way like this - even if it only works to 90% accuracy. And I bet it could get honed down quite nicely over a period of time.
So if you're up for it, I am ;-)
I'll do a first draft of a function next day or so and come back and post here - anything you feel you could add to the two rules regarding the "i" above?
Rule set for exploding sentences:-
1. Full stop must be preceded by a letter.
2. Text between full stops must be >5 chars.
3.?
Again, I appreciate true intelligence here is not possible, but 90% would be great....
TJ
Another rule for sentences: Either full stop must be immediately followed by a space, or must not be immediately followed by a comma or another alpha character (as in, "e.g.,"). I'm not sure if that rule would work best with a "must be followed by" or a "must not be followed by." What else might come after a period that doesn't signify the end of a sentence, and would it be easier to define a ruleset for what to include, or what to exclude?
Also, for capitalizing "I", something along the lines of, the i must be followed by either a space or a single apostrophe (I'd, I'll, I'm).
Just got back from getting husband's arm fixed up at the doctor, so I need to get some actual work done now. I'll check back later.
What else might come after a period that doesn't signify the end of a sentence, and would it be easier to define a ruleset for what to include, or what to exclude?
This is perhaps the trickiest one. I would think it's safest to define what does get included?
Also, we have to consider:-
NOT EVERYONE USES A SPACE AFTER A FULLSTOP.SOME PEOPLE WRITE LIKE THIS.
As it stands, the ruleset so far would turn that into:-
Not everyone uses a space after a fullstop.some people write like this.
Which is not entirely bad, and still better than full caps? I can't see a way around that which wouldn't destroy filenames or links, eg:-
somedomain.com/myfile.html
Could get converted into:-
somedomain. Com/myfile. Html
... if you try to parse it with a basic full-stop rule.
With that in mind, although it's a bit hit and miss, I think it's best to assume "<dot><space>" to be an end of sentence full-stop. Worst case scenario is the odd first word capital letter might get lost.
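As a sketch of that "<dot><space>" rule (the function name capitalise_after_dot_space() is hypothetical, and it only looks at full stops, not "!" or "?"):

```php
<?php
// Lower-case the text, capitalise the very first letter, then capitalise
// any letter that follows a full stop plus at least one space.
function capitalise_after_dot_space(string $text): string
{
    $text = ucfirst(strtolower($text));
    return preg_replace_callback(
        '/(\. +)([a-z])/',
        function (array $m): string {
            // $m[1] is the ". " separator, $m[2] the letter to capitalise.
            return $m[1] . strtoupper($m[2]);
        },
        $text
    );
}
```

Because a dot with no space after it is left alone, something like somedomain.com/myfile.html survives intact (apart from being lower-cased), at the cost of missing sentences that were typed without a space after the full stop.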
The other thing to consider would be not parsing anything in between CODE or PRE tags.
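One possible way to leave those blocks alone, assuming the tags are plain HTML-style <code> and <pre> (if the forum uses a different markup the pattern would need changing). The function name transform_outside_code() is made up for this sketch.

```php
<?php
// Apply $transform only to the text outside <code>...</code> and
// <pre>...</pre> blocks; the blocks themselves are passed through untouched.
function transform_outside_code(string $html, callable $transform): string
{
    // With PREG_SPLIT_DELIM_CAPTURE and a single capture group, the result
    // alternates: ordinary text, protected block, ordinary text, ...
    $pieces = preg_split(
        '/(<code\b[^>]*>.*?<\/code>|<pre\b[^>]*>.*?<\/pre>)/is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) { // even indexes are ordinary text
            $pieces[$i] = $transform($piece);
        }
    }
    return implode('', $pieces);
}
```

For example, transform_outside_code($post, 'strtolower') would lower-case the prose while keeping any code sample exactly as typed.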
i must be followed by either a space or a single apostrophe (I'd, I'll, I'm).
Good point.
TJ
$pattern = '/(\W|^)i(\W|$)/';
$replace = '$1I$2';
Perhaps better still would be simply
$pattern = '/\bi\b/';
$replace = 'I';
It would get some cases wrong, like
iPod begins with a lowercase "i" => "I"
It's spelled "w-e-i-r-d" => w-e-I-r-d
If you try it on some real text, I'd be curious to know.
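Here is a small runnable version of the \b pattern for anyone who wants to try it on real text. The wrapper name capitalise_standalone_i() is just for this sketch.

```php
<?php
// Uppercase every lowercase "i" that has a word boundary on both sides.
// Apostrophes and hyphens count as boundaries, so "i'll" and "w-e-i-r-d"
// are both affected.
function capitalise_standalone_i(string $text): string
{
    return preg_replace('/\bi\b/', 'I', $text);
}

echo capitalise_standalone_i("i think i'll try it - i'm curious"), "\n";
// I think I'll try it - I'm curious
echo capitalise_standalone_i('it is spelled w-e-i-r-d on my ipod'), "\n";
// it is spelled w-e-I-r-d on my ipod
```

The second line shows the trade-off: "ipod" is left alone, but the spelled-out "i" gets capitalised.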
I NEED ATTENTION. pLEASE READ MY POST!
Checking for >90% wouldn't be too difficult either. Reformatting it server-side would be a headache though and may require ongoing tweaking. Ask yourself if you want to commit to this project indefinitely!
By the way, I NEED ATTENTION. pLEASE READ MY POST!
Ask yourself if you want to commit to this project indefinitely!
Could I better spend my time in terms of the end result achieved?
Definitely ;-)
But this is as much about my honing some php/regex skills as anything else - I'm sure I'll learn a lot in the exercise.
Whether or not a useful and usable function actually comes out of it is secondary...
TJ
You'll waste more time making it than you will editing it.
I think I'll learn a lot from making it. I'm not a PHP coder, so this is something that I can play with, learn from, and it's at least more useful to me than working through an example in a book which I can't use. I haven't used regex from PHP yet.
have fun 'programming for all eventualities'
I think getting it 90% there is enough, and should be quite easy?
TJ
E.g. MACHENDRY becomes Machendry
We then convert all Mach to MacH using a table of rules and end up with MacHendry.
However, this conversion would not work for MACHINERY so we have a second set of rules that convert MacHinery back to Machinery. This table also contains exceptions such as BMW.
Pete
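The two-table approach described above could be sketched along these lines. The function name fix_name_case() and the array layout are assumptions, the table entries are only the examples mentioned in the post, and the "Mach" rule is applied at the start of the name only, which is a guess at the intent.

```php
<?php
// Hypothetical two-pass name fixer: lower-case and capitalise, apply the
// prefix rules, then undo any known bad results via an exception table.
function fix_name_case(string $name): string
{
    $prefixRules = ['Mach' => 'MacH'];                // first pass
    $exceptions  = [
        'MacHinery' => 'Machinery',                   // undo a wrong conversion
        'Bmw'       => 'BMW',                         // known all-caps name
    ];

    $result = ucfirst(strtolower($name));
    foreach ($prefixRules as $from => $to) {
        if (strpos($result, $from) === 0) {           // rule applies at the start only
            $result = $to . substr($result, strlen($from));
        }
    }
    return $exceptions[$result] ?? $result;
}
```

With these tables, MACHENDRY comes out as MacHendry, MACHINERY as Machinery, and BMW stays BMW.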