
Forum Moderators: bakedjake

Founder of Wikipedia plans search engine to rival Google

Amazon.com is linked with project ...

   
10:50 am on Dec 23, 2006 (gmt 0)

5+ Year Member



The Times, December 23, 2006

Founder of Wikipedia plans search engine to rival Google
James Doran, Tampa, Florida

-Amazon.com is linked with project
-Launch scheduled for early next year

Jimmy Wales, the founder of Wikipedia, the online encyclopaedia, is set to launch an internet search engine with amazon.com that he hopes will become a rival to Google and Yahoo!

..."Essentially, if you consider one of the basic tasks of a search engine, it is to make a decision: 'this page is good, this page sucks'," Mr Wales said. "Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.

"But we have a really great method for doing that ourselves," he added. "We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."

...Catching up with Google, Yahoo!, Microsoft's MSN or even smaller operators such as Ask.com will be a difficult challenge, Mr Wales conceded.

[business.timesonline.co.uk...]

[edited by: tedster at 12:08 pm (utc) on Dec. 23, 2006]
[edit reason] fair use of copyrighted material [/edit]

9:31 pm on Dec 23, 2006 (gmt 0)

10+ Year Member



I remember that thread well BeeDeeDubbleU.

There are problems with search, though, that won't be solved by a user trust system. A few random points:

1. Most searches are inherently ambiguous in meaning, regardless of semantic analysis; i.e., there is no right answer based on the query itself. If I type in "digital cameras" I could be looking for reviews, the cheapest store, a list of only palm-sized cameras, phone cameras, etc. Or "Nevada real estate": am I buying, selling, etc.? A user trust/recommendation system doesn't solve this--people are inherently at least somewhat vague, so the idea of getting the "perfect answer" isn't going to happen.

2. Wikipedia has succeeded as a user trust system, but it mainly covers top-line info, not the long tail. Long-tail info is much more difficult and almost infinite; there will never be enough reviews or reviewers. It's simply impossible. And I would estimate at least 75% of the queries to my sites are long-tail or more obscure/niche queries.

3. Isn't StumbleUpon a website user recommendation system? I don't use it, but I seem to recall that. I guess this is like a StumbleUpon search engine--why can't they just do that too?

I think Google will eventually be brought down a notch or two, and I like them getting more competition. Most of what they do can be done by other companies, so they have constant pressure to stay several steps ahead and not make mistakes. But there is also a diminishing rate of return. Once they make some mistakes they will lose ground.

9:36 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>say there are 10,000,000,000 web pages

You don't need to index the web to make a good search engine - probably less than 0.1% of that amount is all it would take to answer about 99% of all queries.

9:38 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member digitalghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>I like the idea of human-rated pages. But let's do some math:
* say there are 10,000,000,000 web pages
* for a page to be rated reliably, at least 5 ratings are required
* a page needs to be rated at least once a year
* one person can rate 100 pages per day, 300 days per year.

Check out the above linked video, from Luis von Ahn, and you can throw out the math. Distributed computing, via brain power, not processing cycles.
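For scale, the quoted assumptions imply a rating workforce in the millions. A quick sketch (all inputs are the assumptions quoted above, not measured data):

```python
# Back-of-envelope: how many full-time raters the quoted assumptions imply.
pages = 10_000_000_000                      # assumed size of the web
ratings_per_page = 5                        # ratings needed per page, per year
ratings_needed = pages * ratings_per_page   # 50 billion ratings per year

per_rater = 100 * 300                       # 100 pages/day, 300 days/year = 30,000/year

raters = ratings_needed / per_rater
print(f"{raters:,.0f} raters")              # → 1,666,667 raters
```

That is roughly the population of a large city working full-time on nothing but page ratings, which is the scale the "distributed brain power" idea would have to reach.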

9:53 pm on Dec 23, 2006 (gmt 0)

10+ Year Member



>>> You don't need to index the web to make a good search engine - probably less than 0.1% of that amount is all it would take to answer about 99% of all queries.

So, after long complaints about how Google gives "authority" sites the power to rank everywhere - you want to squeeze the web into 0.1% of its size?

See, people like you make ideas like that doomed - you automatically assume that only 1 page out of 10,000,000 is "worthy".

Don't get me wrong - I don't like Google much, but I would hate to see it replaced by another DMOZ.
And teaming with Amazon? Yeah, I can see the first 5 results right now...

11:46 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"that he hopes will become a rival to Google and Yahoo"

Low aspirations.

12:34 am on Dec 24, 2006 (gmt 0)

5+ Year Member



"Could be a good move - after all Google itself will rank Wiki in the top 5 for any page it puts up - regardless of content. So even Google admits the results are outstanding."

Rankings for wikipedia pages have absolutely nothing to do with content or quality but, rather, wikipedia's internal linking characteristics.

As to this proposed search engine, it's dead on arrival. Its best hope ever would be to capture about as much market share as Ask.com, and even that is completely pie-in-the-sky fantasy.

12:51 am on Dec 24, 2006 (gmt 0)

10+ Year Member



I'm skeptical because a lot of people use Google not only for the common searches (Paris Hilton, etc.) but also for the out-of-the-ordinary searches. It can take years to create a thorough, reliable index that is really that in-depth.

Another big aspect of the search war is people. Who will they have that can really create a good search engine from the ground up and go head-to-head with google's troops?

Wikipedia is good and has some amazing concepts at work, but can that really be turned into something that'll threaten Google? Google already has the mind-power and cash to fight back.

1:40 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>So, after long complaints about how Google gives "authority" sites the power to rank everywhere - you want to squeeze the web into 0.1% of it's size?

>>See, people like you make ideas like that doomed - you automatically assume that only 1 page out of 10,000,000 is "worthy".

Just for the record, 0.1% would be 1 page out of 1,000. And yeah, I believe there is enough information in 100 million pages to answer 99% of all web searches in an effective manner. In fact, that's probably still way too many pages.

Think about it... Amazon for shopping, Wikipedia for information - it's a pretty powerful combination.

There is also no reason to include all websites in this search engine. What would make anyone think their particular website is worthy? There are very few people out there with totally unique information.

1:51 am on Dec 24, 2006 (gmt 0)

5+ Year Member



"The world of search will soon be controlled by unemployed drunks in underpants."

What's wrong with working in your underpants?

3:36 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>say there are 10,000,000,000 web pages

You don't need to index the web to make a good search engine - probably less than 0.1% of that amount is all it would take to answer about 99% of all queries.

A search on "digital cameras" claims 84,000,000 results.

But try to look at more than 1,000 of those results.

So how many pages do you actually need/use in a SE?

3:53 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 5+ Year Member



1. Most searches are inherently ambiguous in meaning, regardless of semantic analysis; i.e., there is no right answer based on the query itself. If I type in "digital cameras" I could be looking for reviews, the cheapest store, a list of only palm-sized cameras, phone cameras, etc. Or "Nevada real estate": am I buying, selling, etc.?

Searches are inherently ambiguous in meaning because searchers have been trained to remove the meaning from their searches.

Walk up to somebody on the street (or in a store) and say "digital cameras". Do you think you will get a meaningful response from them? Or a blank stare?

OK, in a store, you might get pointed to the right section of the store, and otherwise ignored because it will be assumed you are a foreigner who speaks almost no English and is going to be difficult to deal with.

Yet this is how we have been trained to communicate with search engines. Unfortunately, it is now spilling over into the language.

This is not how people communicate. We don't communicate using keywords, because it is ineffective, frustrating, and devoid of nuances of meaning.

Why are we still communicating with computers this way?

Hint: it isn't because of laziness on the part of searchers. It's because they've been taught that this is the way it works. Computers don't understand sentences and paragraphs. They understand keywords. Anybody typing fully-formed sentences and paragraphs into a search box will be ridiculed as a newbie by anybody looking over their shoulder.

5:00 am on Dec 24, 2006 (gmt 0)

10+ Year Member



"Think about it... Amazon for shopping, Wikipedia for information - it's a pretty powerful combination."

That may be true but you can also buy just about anything at a WalMart. Should the rest of all the B & M's just close up shop and go home? Personally, I don't like shopping at WalMart, or Amazon for that matter.

And also slightly OT: If DMOZ has gotten so bad (and I agree it has), why does Google continue to use it for their directory and give heavy weight to the listings there? Why don't they just start their own and charge like Y!? That'd be about the easiest billion dollars anyone could ever make.

5:04 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



"We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."

I wonder if Mr. Doran is under the impression that those 10k Googlers are all parsing AdWords ads?

Gosh, how many thousand of them are actually pulling QC on the database?

Let's try some math:

If you have 1,000 people making editorial decisions at the rate of 3-4 pages a minute for 400 minutes a day, that's roughly 1,500 pages per person per day - or about 1.5 million pages per day. If you have 5,000 people doing that, you have about 7.5 million pages per day, or about 150 million pages per month.

Strangely enough, I have heard the figure 150 million pages used in reference to the bulk of the long tail in the top two search engines - meaning that the top 150 million pages on the web comprise 95-98% of the search engine listings popping up on any given day.
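The arithmetic above reproduces if you split the difference at 3.75 pages a minute and assume roughly 20 working days a month (both values are assumptions filled in here to match the quoted totals):

```python
# Sketch of the editorial-throughput math above. 3.75 pages/minute and
# 20 working days/month are assumptions chosen to match the quoted totals.
pages_per_editor_day = 3.75 * 400            # 1,500 pages per editor per day
editors = 5_000
daily = editors * pages_per_editor_day       # 7.5 million pages per day
monthly = daily * 20                         # 150 million pages per month
print(f"{daily:,.0f}/day, {monthly:,.0f}/month")
# → 7,500,000/day, 150,000,000/month
```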

That said, I would rather have machine based results. Humans are easy to manipulate (Ever hear of Dmoz? lol).

I wish them luck.

5:17 am on Dec 24, 2006 (gmt 0)

10+ Year Member



I really don't get it. If they want human input into the search results (and they should), why not just give more weight to the Google toolbar, which millions of people already have installed? Certainly those 200 PhDs they have can come up with a matrix to match user queries with browsing behavior. It would be impossible to manipulate with all the millions of searches out there, and they could stop wasting all their time trying to fight spam and doorway pages, scraper sites, MFA sites and all the rest. If they don't, then the people at Wiki should. It'd be a lot easier than what they're talking about.

5:20 am on Dec 24, 2006 (gmt 0)

10+ Year Member



First of all, I hope this is incentive for Google to rid its SERPs of Wikipedia junk. Secondly, researching many topics I know a great deal about...Wiki content tends to be subjectively biased and totally incorrect. For this reason, on topics I know little about...I view Wiki with extreme skepticism.

IMO Wikipedia is useless for the purpose for which it was created...now how in the world does Wales think he can take such a quantum leap to rival Google?

5:53 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 5+ Year Member



First of all I hope this is incentive for Google to rid its serps of Wikipedia junk

Did you read the article at all?

This is not a Wikipedia project. It has absolutely nothing to do with Wikipedia.

It is being spearheaded by Jimmy Wales, one of the founders of Wikipedia. He recently resigned as Chair of the board of the Wikimedia Foundation, although he continues to serve on the board and hold the honorary title of Chairman Emeritus.

I sense some bitterness here regarding Wikipedia. I think it angers some that so much has been accomplished by a non-commercial effort for the public good, leaving out the potential for profit.

I find it troubling that several here seem to so quickly urge retribution against a competitor, and even against those that don't even have a direct connection with them but are somehow tainted by association. Sounds like something that might happen in Sicily in the previous century.

Wikipedia certainly has its flaws, but it is a tremendous accomplishment, with a great deal of utility despite its warts. It is anything but a failure.

I swear we seem to have been invaded by a few Ferengi. Just can't countenance the thought of doing something with no profit.

6:53 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Why are we still communicating with computers this way?"

Because as far as I know it's the only way we can. For the time being we humans have to communicate with computers on the computer's terms.

9:00 am on Dec 24, 2006 (gmt 0)

10+ Year Member



jtara, actually I disagree that people have been trained to remove meaning. It's normal offline behavior for people to initially express themselves vaguely. Then they get more specific. They search the same way.

For example, I asked my brother what he wanted for his birthday and he said his big gift request was for a "digital camera".

Then I asked him what kind of digital camera and he said an SLR-type. Then I asked Canon or Nikon? Then the price range. etc.

People often think in small logical chunks that then progress into something bigger. That is why menus, categories, store aisles, magazine sections are organized in general basic chunks of meaning that then get more specific within each section.

For the same reason, people prefer to point and click on a GUI several times to achieve a result rather than type a long-winded but more specific command into a command line.

9:25 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Search is Search.....Whether you like Google, Yahoo, MSN, Ask or something else......I don't much care.....it has been solved.....they all produce relevant results today!

The question of which is better is a question of "human choice".....and there will never be a 100% answer to that!

If someone wants to add to the search engine mix....go for it, but, why bother, when there are much bigger things to do?

I see this subject much like a bunch of cavemen arguing about who has the most "round wheel". Who cares....!

In the meantime someone else is going to invent the combustion engine and make you all look like idiots!

10:36 am on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member beedeedubbleu is a WebmasterWorld Top Contributor of All Time 10+ Year Member



See, people like you make ideas like that doomed - you automatically assume that only 1 page out of 10,000,000 is "worthy".

He did not say that and anyone with half a brain can claim that there are ten squillion web pages and that it is impossible to do anything with this manually. Anyone with a full brain can see that an increasingly large percentage of these pages are worthless and add nothing to the Net as a whole.

If a system can be developed that can find the real content and show it in its results then the advantages of using such a system will quickly become obvious whether it has ten million pages or ten squillion. As someone already said, hardly anyone goes beyond the top twenty or thirty results anyway so the rest may as well not be there.

Google's Adsense/Adwords system is in itself responsible for a large percentage of the rubbish that is today's Internet. This is where they make almost all of their money so clearly anything that Google does must take this into account. It follows that this may often be at the expense of better results.

1:27 pm on Dec 24, 2006 (gmt 0)

10+ Year Member



In many ways, the old About.com structure with paid editors was a promising approach.

Combining some decent algo search with pro or semi-pro editors could make a good structure for better results.

Of course, Google could do that much faster and better with all their resources than Mr. Wales.
Even Yahoo and MSN would have a chance to improve their results with such a combination.

3:52 pm on Dec 24, 2006 (gmt 0)

10+ Year Member



Just can't countenance the thought of doing something with no profit.

Yes I did, did you? I believe the project is ultimately for profit.

Profit does not equal evil. To the contrary, it is the great equalizer. It is the incentive for excellence. Remove the profit motive and you get....hmmmm let's see...something like government bureaucracy.

My point was that human editing will not improve search. Google, Yahoo, et al. at least have an algo that is based on a consistent set of objective parameters. Granted, this is not search utopia; it has flaws here and there. However, enter the human element and there is inconsistency and subjective bias from one extreme to the other.

4:06 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Inefficient efficiency - evolution, democracy - may be a good thing. At least I'm prepared to argue the point.

"Woman versus machine" is likely a debate that is best never settled once and for all. Diversity, for all its flaws, has the benefit of inefficiency.

4:09 pm on Dec 24, 2006 (gmt 0)



In many ways the old about.com structure with payed editors had been a promising approach.

I was an About.com "guide" for 4-1/2 years, and I don't think anyone at About.com--even among the most hyperbolic marketing and PR types--regarded About.com as a substitute for search engines. For a short while, there was an effort to compete with the Yahoo directory (at least within the 500 or so topics covered by About.com at that time), but there's a big difference between a directory and spidered search.

As for the earlier example of "digital cameras," I'd point out that the problem isn't with the search engines, but with the inability or unwillingness of users to define what they're looking for. Whether a user types in a keyphrase ("digital cameras") or a plain-English statement ("I want help in picking out a digital camera"), there's no way that an automated search engine or a human-edited directory can supply a perfect answer. And if the user has the common sense to type in something reasonably precise ("Widgetco WC-1 camera review" or even "Widgetco WC-1 camera"), Google will supply remarkably good results most of the time. (I say this as someone who's done a lot of research into digital cameras with the help of Google.)

4:49 pm on Dec 24, 2006 (gmt 0)

10+ Year Member



“Google is very good at many types of search, but in many instances it produces nothing but spam and useless crap...”

Of course, no one here would have pages among such spam and useless crap results, would they?
So if Mr Wales indeed comes up with a means of producing better results, it will benefit webmasters here working on great content, as well as users.

Whatever: good to see someone with the gumption to try an alternative to Google. (The "Google is like the Borg" quote from another thread here occurs to me just now. It might be over the top, but having Google unchallenged isn't great; a novel model, rather than just playing catch-up, should be interesting.)

5:01 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



Wow, there is clearly a complete disconnect between the probable reality and the imagined reality as to the way the current search engines work.

Probable search reality: Google currently uses several THOUSAND people to editorialize on the current search index; i.e., there is a massive amount of hand-checked pages.

Webmaster/SEO belief: most listings that show in Google are 100% algo-based. SEs have done a great job at fostering this myth.

Which has led to some great myths like "over-optimization". To me, that was just another term for "it didn't pass a hand check, dude".

What Amazon is proposing? Been there - Done that - Google will continue to do it.

5:05 pm on Dec 24, 2006 (gmt 0)

10+ Year Member




So if Mr Wales indeed comes up with a means of producing better results, it will benefit webmasters here working on great content, as well as users.

Yes, but how long will your excellent, authoritative webpage languish in some editor's "inbox"?

This project reminds me of a glorified DMOZ project...if it was such a success, why is everybody flocking to Google?

6:18 pm on Dec 24, 2006 (gmt 0)



Probable search reality: Google currently uses several THOUSAND people to editorialize on the current search index; i.e., there is a massive amount of hand-checked pages.

If that were the case, so what? A few thousand people would be a drop in the bucket compared to the manpower needed for even a reasonably comprehensive "human-based" search engine/directory.

Also, as another member suggested, deciding whether a page sucks (or doesn't) isn't as important as matching it to the right search phrase--or, to look at it the other way around, finding the most relevant pages for a given search.

8:38 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope they succeed. Unfortunately, my bet is on black-market sales of inclusion within... a month.

9:58 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Just can't countenance the thought of doing something with no profit.

Yes I did, did you? I believe the project is ultimately for profit.

I wasn't referring to this search project - I was referring to Wikipedia. Somehow, this thread turned to Wikipedia-bashing, and suggestions that Google should punish Wikipedia for Mr. Wales' latest venture with which it (Wikipedia) has no involvement.

BTW, here's a note recently added to the top of the Wikia search page (i.e., the official home page for this project):

Reporters and bloggers note: Amazon has nothing to do with this project. They are a valued investor in Wikia, but people are really speculating beyond the facts. This has nothing to do with A9, Amazon, etc.

...

Update: The TechCrunch story is also wrong. This project has nothing to do with the screenshot they are running, and this search project has nothing to do with Wikipedia.

"Why are we still communicating with computers this way?"

Because as far as I know it's the only way we can. For the time being we humans have to communicate with computers on the computers terms.

There was research into natural-language processing and semantic analysis by computers when I was a computer science student. That was 30 years ago. At the time, there had been some success. I recall that one of the application areas of greatest interest was in querying text databases. I'd hope that there's been some progress made.

I have to say that the lack of practical progress in this area is as disappointing to me as the fact that we are a people who ONCE went to the moon.

I disagree that people have been trained to remove meaning. It's normal offline behavior for people to initially express themselves vaguely. Then they get more specific. They search the same way.

For example, I asked my brother what he wanted for his birthday and he said his big gift request was for a "digital camera".

Then I asked him what kind of digital camera and he said an SLR-type. Then I asked Canon or Nikon? Then the price range. etc.

Let's agree to disagree, then. I believe that users have been trained into low expectations by current search engine implementations, with queries dumbed down to the lowest common denominator. I think search engines don't give users enough credit.

But let's run with your own idea for a minute.

It would be simple for search engines to maintain a context and allow searches to be refined.

Yes, I know you can "search within results" on Google. But it's buried at the bottom of the page. It's not encouraged by the user interface. And the only way to broaden it back out is to use the browser "back" button, which is a problem once you've viewed more than a single page of results. (You have to hit back multiple times, which is confusing because the back button has two meanings: back a page of results, and then back to the previous search.)

Ideally, search-refinement should be a dialog, whether in natural language or otherwise. The search engine should suggest and encourage further refinement. Now, imagine the improvement in search quality that would be possible if the last question was "is this what you were looking for?". (Yes, I know Google has experimented with an "exit poll".)

There's little or no support for filtering unwanted results. There's no support for saving preferences. I've had to resort to using an obscure browser setting to add a -inurl: to every single search to filter out some of the big junk-content sites (nextag, epinions, etc.). This should be a profile setting. Ideally, one should be able to save settings for different search contexts.

Bottom line is that the search engines have implemented only the very bare minimum of search-refining - search within results, and even then haven't made it easy to use.

I see this subject much like a bunch of cavemen arguing about who has the most "round wheel". Who cares....!

I'm suggesting that we need to move on to inventing tires.

Today's search is ineffective. Keyword-flinging is not the future of search. Any company that has based its future on keyword-flinging and doesn't change will, IMO, be just a distant memory in 10 years.

There are a few companies working on some better ideas for search. I don't see innovation coming out of the major players, though. They seem happy with what they have. Maybe because it suits their business model better than effective search.

This 103 message thread spans 4 pages.
 
