Keyword extraction

how do you extract regularly occurring keywords from a string?

         

acidic

8:41 pm on Aug 27, 2003 (gmt 0)



Hey all.

This is more of a general algorithm question but I thought I would post it here because I am going to be implementing the algorithm in PHP.

Does anyone know how to extract regularly occurring keywords from a string? For example, “the wild cat went to the wild west” would give “wild”.

Likewise, “the lazy crazy cow was told that it should be a crazy cow but not lazy” would give the keywords “crazy cow” and “lazy”. I will need it to work with strings of 1000 words or more. Does anyone have any ideas or know where I should look?

Thanks a lot.

jonknee

11:12 pm on Aug 27, 2003 (gmt 0)



First you have to decide what "regularly occurring" means. PHP will need an integer threshold, for example a word counts if it occurs 3 or more times.

The actual process is pretty simple.

* Feed the data into a PHP array using something like file().
* Iterate through it using foreach.
* In that loop explode() it by spaces (to get words in an array).
* Iterate through that array to look for dupes.
* Record these in an array.
* After it's all done spit out the dupes.
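
The steps above could be sketched roughly like this. It is only a minimal sketch, assuming PHP 7 or later; the function name findRepeatedWords(), the $minCount parameter, and the punctuation list are made up for illustration. It takes the lines as an array so it can be tried without a file, but the array could just as well come from file().

```php
<?php
// Hypothetical helper: returns word => count for every word that
// occurs at least $minCount times across all the given lines.
function findRepeatedWords(array $lines, int $minCount = 2): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Lowercase so "Wild" and "wild" count as the same word,
        // then split the line into words on spaces.
        $words = explode(' ', strtolower(trim($line)));
        foreach ($words as $word) {
            // Strip surrounding punctuation (an assumed, incomplete list).
            $word = trim($word, " \t\n\r.,;:!?\"'()");
            if ($word === '') {
                continue; // empty piece left by a double space
            }
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    // Keep only the words that reach the threshold.
    $repeated = array_filter($counts, function ($n) use ($minCount) {
        return $n >= $minCount;
    });
    arsort($repeated); // highest counts first
    return $repeated;
}

// With a file, the lines could come from:
// $lines = file('input.txt', FILE_IGNORE_NEW_LINES);
$lines = ['the wild cat went to the wild west'];
print_r(findRepeatedWords($lines));
```

For the sample sentence this reports both "the" and "wild" with a count of 2, so to end up with just "wild" you would probably also want to filter out common words like "the". It only counts single words, so a phrase like "crazy cow" would need extra work.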

A lot of loops, but pretty logical overall. I wrote a script to weed out duplicates to my liking using MySQL as a data source. This is the same sort of deal, except you set what count is considered a duplicate and you don't go back and delete the duplicates.