

Text Formatting

Formatting bodies of text to be structurally correct

Sekka

4:36 pm on Oct 21, 2008 (gmt 0)

10+ Year Member



Hi,

For a website I have been working on, I created a filter that examines text entered by a user and fixes any structural problems with it.

This task included:

* Trying to fix spacing issues, e.g. double spaces, no space after a full stop, etc.
* Trying to correct false capitalisation.
* etc.

After some final testing with the client, I have discovered that my solution is nowhere near as robust as it should be. Basically, it applies a lot of formatting where none is required, e.g. to URLs, abbreviations, etc.
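To illustrate the kind of exception handling involved, here is a minimal sketch in Python (not my actual filter, which is not shown here). It assumes URLs can be recognised by a simple `http://`, `https://` or `www.` prefix, and it only inserts a space after a full stop when a lowercase letter comes before it and a capital letter comes after it, so abbreviations such as "U.S.A" are left alone. The names `PROTECTED` and `fix_spacing` are made up for this example.

```python
import re

# Assumption: anything starting with http://, https:// or www. is a URL
# and must be passed through untouched.
PROTECTED = re.compile(r"(https?://\S+|www\.\S+)")

def fix_spacing(text):
    """Collapse repeated spaces and add a missing space after sentence-ending
    punctuation, leaving URL-like tokens exactly as they were typed."""
    # With one capturing group, re.split returns the URL matches at the odd
    # indices and the ordinary text at the even indices.
    parts = PROTECTED.split(text)
    for i in range(0, len(parts), 2):
        chunk = parts[i]
        # Collapse runs of two or more spaces into one.
        chunk = re.sub(r" {2,}", " ", chunk)
        # Insert a space only between a lowercase letter + stop and a capital,
        # so "world.This" is fixed but "U.S.A" is not touched.
        chunk = re.sub(r"(?<=[a-z])([.!?])(?=[A-Z])", r"\1 ", chunk)
        parts[i] = chunk
    return "".join(parts)

print(fix_spacing("Hello  world.This is http://example.com/Page.Html  ok."))
```

Splitting the protected tokens out first keeps the formatting rules themselves simple. It will still get some cases wrong, which is exactly the margin-of-error problem described above.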

Sadly, I knew this scenario might arise, as there are so many grammatical exceptions to consider.

My question is: does anyone know of a pre-existing solution that does what I am trying to do and is robust enough to leave only a very small margin of error? If not, I will write my own exceptions, but I am hoping someone knows of one.

I've tried Google, but most results are about formatting PHP code or output, not about formatting actual English text.

Thank you.

[edited by: Sekka at 4:36 pm (utc) on Oct. 21, 2008]

coopster

2:59 pm on Oct 22, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm aware of spell checkers, but I have never actually looked for a grammar checker. Anybody else?

eelixduppy

3:07 pm on Oct 22, 2008 (gmt 0)



You could technically map out the English grammar structure. It would take the form of a context-free grammar [en.wikipedia.org] and would be a massive project. Sentences are recursively created with specific rules; you could harness this in theory, but it's not going to be easy.
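As a rough idea of what "recursively created with specific rules" means in code, here is a toy sketch in Python. The grammar and word list below are invented for the example and cover only a handful of sentences; real English would need vastly more rules, which is why this would be a massive project.

```python
# Toy context-free grammar (an assumption for illustration, not real English).
# NP can contain a PP, and PP contains an NP, so the rules are recursive.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
}

# Hypothetical lexicon mapping word categories to the words they accept.
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat", "park"},
    "V": {"sees", "sleeps"},
    "P": {"in"},
}

def parse(symbol, words, start):
    """Return every position at which `symbol` can end if it begins at `start`."""
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            return {start + 1}
        return set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        # Match each part of the production in sequence, tracking all
        # positions reachable so far.
        for part in production:
            positions = {e for p in positions for e in parse(part, words, p)}
        ends |= positions
    return ends

def is_sentence(text):
    """True if the whole text can be derived from the start symbol S."""
    words = text.lower().split()
    return len(words) in parse("S", words, 0)

print(is_sentence("The dog sees a cat in the park"))
print(is_sentence("Dog the sees"))
```

This only answers whether a sentence fits the rules. Turning that into automatic corrections is a separate and harder step.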

Sekka

3:26 pm on Oct 22, 2008 (gmt 0)

10+ Year Member



Looks like I'm going to have to have a crack at this one myself. Should be helpful in the future if I can get it working properly though.

Thanks for the link, eelixduppy. I will definitely give it a good read.

coopster

3:46 pm on Oct 22, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I did find one open source grammar checker with what seems to be a Java API. It is hosted on SourceForge: [sourceforge.net...]