[www10.org...]
The paper describes differing approaches to applying this theory of the web, and a different mathematical approach for each. Notice the mathematical notation? Seems pretty western, yes?
They mention "random walks" across hyperlinks in one section heading... these are probabilistic models. They do not approximate human reasoning, behavior, or surfing habits. We have this theory in the land of algorithms, and as long as it keeps working (more or less), they would like to pursue it as far as possible through refinement.
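To make the "random walk" idea concrete: the model imagines a surfer who clicks a random link on every page and now and then gets bored and jumps to a random page. Below is a minimal Python sketch that simulates that surfer on a made-up four-page web; the graph, the 15% jump chance and the step count are all assumptions for illustration. Nothing in it models what a real person is looking for; it only counts where the random clicks end up.

```python
import random
from collections import Counter

def random_surfer(links, steps=100_000, teleport=0.15, seed=0):
    """Simulate one surfer clicking random hyperlinks (toy model, assumed parameters).

    links: dict mapping each page to the list of pages it links to.
    teleport: chance per step of jumping to a random page instead of clicking;
              the surfer also jumps when a page has no outgoing links.
    Returns the fraction of steps spent on each page.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    pages = list(links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        out = links[page]
        if not out or rng.random() < teleport:
            page = rng.choice(pages)   # bored (or stuck): jump anywhere
        else:
            page = rng.choice(out)     # follow a random outgoing link
    return {p: visits[p] / steps for p in pages}

# Hypothetical four-page web: A links to B and C, B and D link to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(random_surfer(web))
```

Page D, which nobody links to, is only ever reached by a random jump, so it ends up with roughly 0.15 / 4 of the visits.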
I would expect to see something like this come out of each of these conferences: researchers will keep looking at the same shade of grey and academically thwart each other's attempts to call it "whitish grey", "darkish grey" or "medium grey". Instead of looking for new ways of computing relevance, they will stay stuck on this one... as long as all the major search engines use the same techniques.
There must be another way, Jeremy?
>>>>Seems pretty western
I wouldn't be able to read it if it wasn't :)
>>>>staying clear of fuzzy classification
Is it possible for computers to become similar to the human mind? Do we want them to? What if your computer was mentally challenged... could you take it back? What if it didn't like you? Where does emotion play a part in weighing an equation? Is there an advantage to classifying memories?
Artificial Intelligence seems like a neat field.
Please excuse my ramblings.
Neural networks...probability, fuzzy state machines. Spiders are already artificial intelligence...and a search indexing system is an entire team of AI driven machines working together to accomplish a single goal.
AI doesn't have to have anything to do with the development of search technology, though. I have a hunch it will, and given the power and scope of even PCs these days, how much longer will it be before engines have much more complex mathematical models to base their indexing on? Not long.
Given how mathematics keeps evolving (I even read somewhere that the speed of light might not be a universal constant), tell me: is hubs and authorities such a good model that all we need to do is refine it for a few years before having a "perfect" algorithm for indexing and classifying the web? Honestly, I don't believe so.
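For reference, hubs and authorities (Kleinberg's HITS) boils down to two mutually reinforcing scores: a good authority is linked to by good hubs, and a good hub links to good authorities. Here is a minimal Python sketch, assuming a small hand-made link graph and a fixed number of iterations in place of a proper convergence test; the page names are hypothetical.

```python
import math

def hits(links, iterations=50):
    """Hubs-and-authorities scores on a small link graph (toy sketch).

    links: dict page -> list of pages it links to.
    Returns (hub, authority) score dicts, each normalised to unit length.
    """
    pages = set(links) | {q for out in links.values() for q in out}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A good hub points to good authorities.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: three directory pages linking to two sites.
graph = {
    "dir1": ["siteA", "siteB"],
    "dir2": ["siteA", "siteB"],
    "dir3": ["siteA"],
    "siteA": [],
    "siteB": [],
}
hub, auth = hits(graph)
print(sorted(auth.items(), key=lambda kv: -kv[1]))
```

In this toy graph siteA beats siteB as an authority because all three directories link to it, and dir1 and dir2 outscore dir3 as hubs because they point to more good authorities.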
The paradigm will shift again. "Paradigm shift", I found out, originally came from the philosophy of science (Thomas Kuhn); with search engines being built on differing forms of math, I feel it's most appropriate. ;)
Then again, what does a young guy with a business/Spanish degree know about algorithms and such? It's not something I've studied, so if you must, consider my ramblings and then move on from them :)
<rant>
Before we start, I believe there should be two separate search engine markets. The first would target sites that sell products and goods online. That search engine should be something very similar to an interactive (localised) yellow pages, with some nice PPC model or whatever. That's not very interesting technologically, but it would work nicely for everyone. The second group of pages indexed by search engines would be those containing actual information, completely unrelated to product sales. Extracting the information from these pages, and deciding whether a particular user will find what he wants there, is what all the fuss is about. And that's where artificial intelligence comes in.
Ideally, artificial intelligence, and more specifically natural language processing, would be able to extract this information and represent it as meaningful symbols. In a similar fashion, the user's query would be parsed and understood. The set of semantic symbols for each page would then be matched against the query, and we'd have a perfect search engine, with ranking based on the detail and consistency of the information. ('Ask Jeeves' aspires to this, but doesn't really go very far: they have about 63 template questions and a database of matching answers.)
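The matching step described here is the easy part; turning text into meaningful symbols is the hard part. Assuming that step were already solved, ranking could look like the minimal Python sketch below. The symbol sets and example.org URLs are invented by hand, using the number of symbols on a page as a stand-in for "detail" is an assumption, and consistency isn't modelled at all.

```python
def rank_by_meaning(query_symbols, pages):
    """Rank pages by how many of the query's meaning symbols they contain.

    pages: dict url -> set of (hypothetical, hand-written) semantic symbols.
    Ties are broken by how many symbols a page carries overall, a crude,
    assumed proxy for the 'detail' of its information.
    """
    scored = []
    for url, symbols in pages.items():
        overlap = len(query_symbols & symbols)
        if overlap:                      # skip pages that share no meaning
            scored.append((overlap, len(symbols), url))
    scored.sort(reverse=True)            # most overlap first, then most detail
    return [url for _, _, url in scored]

# Symbols carry the sense of a word, so the animal and the car don't collide.
pages = {
    "example.org/jaguar-cat": {"jaguar:animal", "habitat:rainforest", "diet:carnivore"},
    "example.org/jaguar-car": {"jaguar:car-brand", "price", "dealer"},
    "example.org/big-cats": {"jaguar:animal", "lion:animal", "diet:carnivore", "habitat:savanna"},
}
query = {"jaguar:animal", "diet:carnivore"}
print(rank_by_meaning(query, pages))
```

The car page never matches, even though a plain keyword engine would see "jaguar" on all three pages.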
The beauty of this approach is that the information is always there, and it always means the same thing no matter how many people read it daily or link to it. It's just a matter of finding it and understanding it.
But having taken an NLP course at one of the top CS universities in Europe (York), I can tell you we're far from capable of doing this.
So how can we decide what's in a document without understanding what's actually in it? Enter the smarty-pants at Google: "Why not use people's opinion to decide what's in a page?" Google assumes that a page is about something resembling the text of the links pointing to it (and its title tag), and stores that in its database. Each link represents someone showing interest in the page, and PageRank builds on this: the more links a page gets (with links from highly ranked pages counting for more), the more people show interest in it, so the more relevant it must be... in theory ;)
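The "links as votes" part can be sketched as the textbook power-iteration form of PageRank. This is a minimal Python sketch on a made-up four-page web, not Google's production code; the damping factor of 0.85 (a commonly quoted value), the fixed iteration count and the way dead-end pages are handled are assumptions. Each page's score is split evenly among its outgoing links, so a vote from a highly ranked page is worth more, and the final scores equal the long-run share of time a random surfer spends on each page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a small link graph (toy sketch).

    links: dict page -> list of pages it links to (every target must be a key).
    damping: probability the surfer follows a link rather than jumping randomly.
    """
    n = len(links)
    rank = {p: 1.0 / n for p in links}   # start with equal scores
    for _ in range(iterations):
        # Every page first gets its share of the random jumps.
        new = {p: (1.0 - damping) / n for p in links}
        for page, out in links.items():
            if out:
                # A page's vote is split evenly among the pages it links to.
                share = damping * rank[page] / len(out)
                for target in out:
                    new[target] += share
            else:
                # Dead end: spread its score evenly over all pages.
                for target in links:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print({p: round(r, 4) for p, r in pagerank(web).items()})
```

Here C comes out on top because A, B and D all link to it, while D, which nobody links to, is left with only the random-jump share (1 - 0.85) / 4 = 0.0375.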
And there's the flaw of current search engines: they assume people know what they are doing! With about 98% of the WWW population having little or no clue whatsoever, and 1% of experts always trying to swindle the system, how can it be fully accurate when only the remaining 1% are honest, knowledgeable surfers?
The advanced algorithms discussed in the paper toolman mentioned take more care when analysing the graph structure, to avoid missing important sites and to stop experts from rigging the results. Still, the fundamental flaw remains.
That said, there aren't many alternatives. All we have at the moment is keyword density analysis and basic tag parsing. Current search engines are starting to deal with very basic document structure (Teoma, for instance, uses a link's immediate environment to determine its importance).
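Keyword density is about as simple as it sounds. Engines never published their exact formulas, so the Python sketch below just assumes the common definition: occurrences of a single keyword divided by the total number of words on the page. Splitting words with a simple regular expression is also an assumption, and multi-word phrases aren't handled.

```python
import re
from collections import Counter

def keyword_density(text, keyword):
    """Share of the words in `text` equal to `keyword` (case-insensitive, single word)."""
    words = re.findall(r"[a-z0-9']+", text.lower())   # crude, assumed tokenisation
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)

page = "Cheap widgets! Buy widgets here. Our widgets are the best widgets."
print(f"{keyword_density(page, 'widgets'):.0%}")
```

Four of the eleven words are "widgets", so this prints 36%, and stuffing in a few more copies would push it higher with no change in meaning.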
Google is accurate enough, and if you've developed the skill of searching the web, you'll have no trouble finding what you need. Beginners, however, can't do so as easily; the major problems are spam and getting sidetracked by advertising.
An engine like 'Ask Jeeves' that actually understands you, and the rest of the WWW, remains utopian.
</rant>
I've probably said enough to upset pretty much everyone... if so, terribly sorry. If there's something really wrong with this way of thinking, let me know!
Anyway, if I want my AI site to go live Saturday, I'd better stop lurking around here!
Alex