I asked a variation of this question in 2006, and I'm back again for another shot.
We have tens of thousands of keywords driving traffic to our site. I'd like to be able to cluster the keywords into themes.
I've tried the simple approach of stemming the keywords and then sorting the words within each phrase alphabetically to get a 'signature' for the phrase (a method similar to RKG Duck). I'm also stripping stop words to get a 'cleaner' signature.
Example of keywords driving traffic:
uncle tony's widget banquet
widget banquet by my uncle tony
Stemmed, stopped and sorted:
banquet tony uncle widget (possessive removed)
banquet tony uncle widget (by and my stopped)
This method would not add the following keyphrase to the above group:
lovely meals by dad's brother anthony
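A minimal Python sketch of the signature approach described above might look like the following. The stop-word list, the crude suffix-stripping that stands in for a real stemmer, and the function names are all assumptions made for illustration. It groups the first two phrases together and leaves the third on its own, because nothing in it knows that "dad's brother" means "uncle".

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "by", "for", "my", "of", "the", "to"}

def naive_stem(word):
    """Crude stand-in for a real stemmer: drop a possessive, then a plural 's'."""
    word = re.sub(r"'s$", "", word)
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word

def signature(phrase):
    """Lowercase, drop stop words, stem, and sort the remaining words."""
    tokens = re.findall(r"[a-z0-9']+", phrase.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return " ".join(sorted(stems))

def group_by_signature(phrases):
    """Map each signature to the list of phrases that share it."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[signature(phrase)].append(phrase)
    return dict(groups)

keywords = [
    "uncle tony's widget banquet",
    "widget banquet by my uncle tony",
    "lovely meals by dad's brother anthony",
]
groups = group_by_signature(keywords)
for sig, members in groups.items():
    print(sig, "->", members)
```

The matching here is purely lexical: two phrases land in the same group only if they reduce to exactly the same set of stems.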
Now, short of hand-coding my own thesaurus, is there any technique or tool that would help with this clustering process?
Thanks for your time.