Has anyone done any part-of-speech tagging in PHP?

Scripts/snippets appreciated.

stargeek

11:00 pm on Dec 21, 2003 (gmt 0)

10+ Year Member



I want to use PHP to identify the grammatical parts of speech in a body of text, but I can't find even the most basic sample of such an application in PHP. Has anyone else seen or written one?
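To show what even a basic tagger does, here is a minimal sketch of the first pass a Brill-style tagger makes. It gives each known word its most likely tag from a lexicon and guesses tags for unknown words from surface cues such as capitalization, digits, and suffixes. The sketch is in Python, but it ports to PHP using arrays and preg_match_all. The tiny LEXICON and the suffix rules are hypothetical placeholders. A real tagger uses a large trained lexicon and then applies learned contextual rules, which this sketch omits. Tags follow the Penn Treebank set.

```python
import re

# Hypothetical toy lexicon: each word maps to its most likely Penn Treebank tag.
# A real tagger ships a lexicon with tens of thousands of entries.
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "dog": "NN", "cat": "NN", "park": "NN",
    "runs": "VBZ", "sees": "VBZ", "ran": "VBD",
    "in": "IN", "on": "IN", "quickly": "RB",
    ".": ".", ",": ",",
}

def guess_unknown(word):
    """Guess a tag for a word missing from the lexicon using crude, hand-written cues."""
    if not word[0].isalnum():
        return "SYM"     # unlisted punctuation or symbol
    if word.isdigit():
        return "CD"      # cardinal number
    if word[0].isupper():
        return "NNP"     # capitalized: assume proper noun
    if word.endswith("ly"):
        return "RB"      # adverb
    if word.endswith("ing"):
        return "VBG"     # gerund / present participle
    if word.endswith("ed"):
        return "VBD"     # past tense verb
    if word.endswith("s"):
        return "NNS"     # plural noun
    return "NN"          # default: singular noun

def tokenize(text):
    # Words (allowing one internal apostrophe) or single punctuation characters.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def tag(text):
    """Return (token, tag) pairs: lexicon lookup first, then the unknown-word guess."""
    return [(tok, LEXICON.get(tok.lower(), guess_unknown(tok))) for tok in tokenize(text)]

if __name__ == "__main__":
    print(tag("The cat sees 3 dogs."))
```

Because the lexicon is checked before the guessing rules, a capitalized common word such as "Park" still gets NN. Fixing mistakes like that is exactly what a Brill tagger's contextual rules are for.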

jatar_k

6:29 pm on Dec 22, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I haven't done that myself, but I did a few searches and there seem to be a bunch of tools around, some of them web-based. One of those may give you something to work with.

bcolflesh

6:36 pm on Dec 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



www.coli.uni-sb.de/~thorsten/tnt/

stargeek

7:15 pm on Dec 22, 2003 (gmt 0)

10+ Year Member



Most of the CLI tools I've found are for Linux. That's a problem, because I need to parse a large corpus of text on a Windows testing server and then upload the results to the production Linux box. The same constraint rules out a web-based tool: querying a remote service would simply take too long for the amount of text I'm processing.

jatar_k

7:50 pm on Dec 22, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



This page mentions some Windows tools:

www.comp.lancs.ac.uk/computing/research/ucrel/tools.html

brotherhood of LAN

9:05 pm on Dec 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've tried it. The most painful part was getting the HTML files into the right input format.
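Much of that HTML-to-input conversion can be automated. This is a minimal sketch that assumes the tagger wants plain text with one sentence per line and space-separated tokens. Check your tagger's README for its exact input requirements. The sketch uses Python's standard html.parser to strip tags, skips script and style contents, splits sentences naively after . ! or ?, and separates punctuation from words. The function name html_to_tagger_input is hypothetical. The naive sentence splitter will also break after abbreviations like "e.g.", so treat it as a starting point only.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()          # convert_charrefs=True: &amp; etc. become plain characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_tagger_input(html):
    """Return tokenized text, one sentence per line, tokens separated by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())   # collapse all whitespace runs
    sentences = re.split(r"(?<=[.!?])\s+", text)      # naive split after . ! ?
    lines = []
    for sentence in sentences:
        tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)  # split off punctuation
        if tokens:
            lines.append(" ".join(tokens))
    return "\n".join(lines)
```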

I used the Brill tagger from the command line via exec(); that was pretty much the extent of the PHP usage ;)
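In Python, the exec() approach looks roughly like this, with subprocess standing in for PHP's exec(). The sketch assumes the tagger prints one sentence per line as word/TAG tokens, which is the format the Brill tagger writes. The argument list in the comment is also an assumption, so check it against the README that ships with your build. run_tagger and parse_tagged are hypothetical helper names. Splitting on the last slash keeps words that themselves contain a slash (such as and/or) intact.

```python
import subprocess

def parse_tagged(output):
    """Turn 'word/TAG word/TAG ...' lines into a list of sentences of (word, tag) pairs."""
    sentences = []
    for line in output.splitlines():
        pairs = []
        for token in line.split():
            word, sep, tag = token.rpartition("/")   # split on the LAST slash only
            if sep:                                  # skip tokens that carry no tag
                pairs.append((word, tag))
        if pairs:
            sentences.append(pairs)
    return sentences

def run_tagger(cmd):
    """Run an external tagger (cmd is an argv list) and parse what it prints to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_tagged(result.stdout)

# Hypothetical invocation; verify the argument order against your tagger's README:
# sentences = run_tagger(["./tagger", "LEXICON", "input.txt", "BIGRAMS",
#                         "LEXICALRULEFILE", "CONTEXTUALRULEFILE"])
```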

I'd planned to then look up WordNet from the command line with the POS tags, since knowing a word's part of speech leaves much less ambiguity about its actual meaning.
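The tag-to-WordNet step can be sketched as a small mapping. WordNet only has entries for nouns, verbs, adjectives and adverbs (POS letters n, v, a, r), so every other tag is dropped, and each remaining word is paired with the POS it should be looked up under. This Python sketch uses hypothetical function names and leaves the lookup itself open: you could use the wn command-line tool or the XML version.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if WordNet has no such class."""
    if tag.startswith("NN"):
        return "n"   # NN, NNS, NNP, NNPS
    if tag.startswith("VB"):
        return "v"   # VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("JJ"):
        return "a"   # JJ, JJR, JJS
    if tag.startswith("RB"):
        return "r"   # RB, RBR, RBS
    return None      # determiners, prepositions, punctuation, ...

def wordnet_queries(tagged):
    """Keep only content words, each paired with the WordNet POS to query."""
    queries = []
    for word, tag in tagged:
        pos = penn_to_wordnet(tag)
        if pos is not None:
            queries.append((word.lower(), pos))
    return queries
```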

I got the Brill tagger here:
http://www2d.biglobe.ne.jp/~htakashi/software/brill_e.htm

I see there's also an XML version of WordNet, which could be handier since it's simpler to parse.