This is more of a general algorithm question but I thought I would post it here because I am going to be implementing the algorithm in PHP.
Does anyone know how to extract regularly occurring keywords from a string? For example, “the wild cat went to the wild west” would give “wild”, and
“the lazy crazy cow was told that it should be a crazy cow but not lazy” would give the keywords “crazy cow” and “lazy”. I will need it to work with strings of up to 1000 words or even more. Does anyone have any ideas or know where I should look?
Thanks a lot.
The actual process is pretty simple.
* Feed the data into a PHP array using something like file().
* Iterate through it using foreach.
* In that loop explode() it by spaces (to get words in an array).
* Iterate through that array to look for dupes.
* Record these in an array.
* After it's all done spit out the dupes.
A lot of loops, but pretty logical overall. I wrote a script to weed out duplicates to my liking using MySQL as a data source. This is the same sort of deal, except you set what count is considered a duplicate and you don't go back and delete duplicates.
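For what it's worth, the steps above could be sketched roughly like this. Treat it as a starting point only: the function name repeated_words(), the $minCount threshold and the $stopWords list are made up for illustration and are not from any library. It only handles single words (not phrases like “crazy cow”), lowercases ASCII only, and assumes PHP 7.4 or later for the arrow function. It also uses array_count_values() in place of a hand-written counting loop.

```php
<?php
// Hypothetical helper (not from the thread): return the words that occur at
// least $minCount times, ignoring a caller-supplied list of stop words.
function repeated_words(string $text, int $minCount = 2, array $stopWords = []): array
{
    // Lowercase, then split on runs of anything that is not a letter or digit.
    // This is a stricter version of explode(' ', ...) that also drops punctuation.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words, then count how often each remaining word appears.
    $counts = array_count_values(array_diff($words, $stopWords));

    // Keep only the words that reach the threshold, most frequent first.
    $repeated = array_filter($counts, fn (int $c): bool => $c >= $minCount);
    arsort($repeated);

    return $repeated;
}

// Example from the question; the stop-word list here is an assumption.
print_r(repeated_words('the wild cat went to the wild west', 2, ['the', 'to']));
// Result: ['wild' => 2]
```

Without the stop-word list, “the” would be reported as well, since it also appears twice. For whole files you could read the text with file_get_contents() (or loop over the lines from file()) and pass it in the same way.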