The researchers concluded that the method has limitations for its originally intended purpose, but they held out promise for other uses.
Due to the similarities between spam and non-spam our original semantic analyzers are not an effective method to classify spam content. Since spam and non-spam documents are so similar, it is sometimes very difficult for a human to differentiate between the two. Because of these similarities, it is unlikely that any natural language analysis method will be successful in differentiating between spam and non-spam. However, using semantic analyzers to determine the usefulness of information on a webpage had much more promising results. Assuming the user is more interested in finding a quick answer to their query, a page with more textual information should have a higher rank. Our analyzers could help to determine this rank.
Stanford Semantic Analysis Paper - PDF Document [stanford.edu]
HTML Version from the Google Cache [64.233.167.104]
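To make the "more textual information should have a higher rank" idea from the quoted conclusion a bit more concrete, here is a rough sketch. It is not the paper's analyzer. It only counts visible words per page (ignoring script and style blocks) and sorts pages by that count, which is a much cruder stand-in for a semantic measure. The names VisibleTextExtractor, visible_word_count and rank_by_text, and the two sample pages, are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_word_count(html):
    """Number of whitespace-separated words in the visible text (an assumed proxy)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def rank_by_text(pages):
    """pages maps a page name to its HTML; returns names, most text first."""
    return sorted(pages, key=lambda name: visible_word_count(pages[name]), reverse=True)


# Hypothetical sample pages, for illustration only.
pages = {
    "thin": "<html><body><script>var a = 1;</script><p>Buy now</p></body></html>",
    "rich": "<html><body><p>Latent semantic analysis compares documents "
            "by the terms they share.</p></body></html>",
}
print(rank_by_text(pages))  # ['rich', 'thin']
```

A raw word count is obviously easy to game, which is presumably where something like the semantic analyzers described in the paper would have to come in. Treat this as a starting point for experiments only.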
This is the open source software at SourceForge, which uses a variant of LSA
Infomap NLP Software [infomap-nlp.sourceforge.net]
And here's the demo & search engine at the Stanford project site
Infomap Demo & Search Engine [infomap.stanford.edu]
Plus some other semantics related toys to play with there.
For the really dedicated fans of semantic analysis, there's another resource available at the Princeton University Cognitive Sciences Lab
WordNet, a Lexical Database for the English Language [cogsci.princeton.edu]
Now the task is to clarify and classify the principles for purposes of practical application.
It is a hot topic. Fair enough, it might be whispered rather than shouted, but all the best bits of SEO are.
Notice the forum name here: Toolbar & Desktop Applications
[webmasterworld.com...]
They're very relevant concepts for static search in non-hyperlinked environments, which is being actively pursued by some major players. For our practical purposes, what matters isn't so much the theoretical dissection as using whatever information is available to work out how to apply the concepts to websites. Whether or not LSI is actually in place, there are still elements that it certainly can't hurt to use.