Founder of Wikipedia plans search engine to rival Google

Amazon.com is linked with project ...

JackR

10:50 am on Dec 23, 2006 (gmt 0)

The Times, December 23, 2006
Founder of Wikipedia plans search engine to rival Google
James Doran, Tampa, Florida
-Amazon.com is linked with project
-Launch scheduled for early next year

Jimmy Wales, the founder of Wikipedia, the online encyclopaedia, is set to launch an internet search engine with amazon.com that he hopes will become a rival to Google and Yahoo!
..."Essentially, if you consider one of the basic tasks of a search engine, it is to make a decision: 'this page is good, this page sucks'," Mr Wales said. "Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.
"But we have a really great method for doing that ourselves," he added. "We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."
...Catching up with Google, Yahoo!, Microsoft's MSN or even smaller operators such as Ask.com will be a difficult challenge, Mr Wales conceded.
[business.timesonline.co.uk...]
[edited by: tedster at 12:08 pm (utc) on Dec. 23, 2006]
[edit reason] fair use of copyrighted material [/edit]

oldpro

11:05 pm on Dec 24, 2006 (gmt 0)

Somehow, this thread turned to Wikipedia-bashing,

No it has not...Wikipedia and this "wiki search project" are intertwined in that it is based on the same model...human editing.

It is about a machine-edited SE versus a human-edited SE...which is the essence of the article.

Following your logic...one could say some have turned this into a Google bashing thread.

FromRocky

12:27 am on Dec 25, 2006 (gmt 0)

Look at the name: Google. What does it mean?
The volume of data is growing exponentially. It's changing by the microsecond. You may ask how much of it is good, but we have to go through all of it before we know.

Sorry, we can't go back, period.

jtara

3:04 am on Dec 25, 2006 (gmt 0)

Here's a 5-minute change that Google could implement with no rocket-science technology, and would make a huge improvement to average search results: simply move "search within results" from the bottom of the page to directly adjacent to the search box, and put it in normal-sized type.

Why won't Google do this?

That 5-minute change would have a huge negative impact on their revenues. I believe that users often click on ads because they are frustrated with SERP relevancy, and advertisers can often provide more relevant results.

There is an inherent conflict of interest in supporting search with context-related advertising. There is a disincentive to make the search ever more effective.

The real breakthrough in search will come when somebody invents an alternative revenue model for search that does not have this inherent conflict of interest.

Once this happens, existing technology will be applied to make search more effective.

It will be interesting to see what Mr. Wales has up his sleeve in this regard. Nothing has been said so far about the revenue model.

I, for one, would be willing to pay for search, if it provided truly relevant results.

BTW, Ask.com's AskX trial is looking pretty impressive to me. In particular, they have addressed the issue of search refining. Search results come back with a sidebar that allows you to either narrow or expand the search, with a number of suggested categories for narrowing or expansion.

As an example, search for "digital camera". The sidebar then suggests that you can narrow to "digital camera reviews", "best digital camera", "digital camera ratings", "digital camera comparison", or "more". "more" pops up a list including specific brands, "history of digital cameras", "how does a digital camera work", "best prices digital camera", etc.

Still the revenue model is going to continue to be a problem. In the absence of an alternative, we are going to be limited by a very slow one-upmanship between a limited field of competitors. Given Google's market share, they may not even feel they have to respond to ask.com's innovation. If they implemented ask.com's user interface, their revenues would likely drop like a rock.

fearlessrick

3:26 am on Dec 25, 2006 (gmt 0)

The real breakthrough in search will come when somebody invents an alternative revenue model for search that does not have this inherent conflict of interest.

I'll go along with jtara on the above point, but I would not pay for search results (isn't that just another revenue model anyway?)

Let's take it a step further. How about something beyond search. Some form of search has been a de facto standard since the days of Mosaic. It's been refined, expanded, exploited, monetized, criticized, spliced, diced and debated. But we all still search... and some of us complain. (BTW: I've also heard good things about Ask recently)

Is there a better way? Probably, but nobody's thought of it yet. Search isn't fully evolved either, but that's no reason to stop thinking about other ways to analyze, sort, organize, relate and deliver data.

I've always (well, for the last 10 years or so) thought that search, be it a human index or computer algo, was somewhat sloppy. I think about it maybe twice a week but haven't come up with a workable replacement. I have no doubt that somebody will, some day.

Maybe this Jimmy Wales is on to something, maybe not, but I do think we should applaud his initiative and see what he and his team come up with. Another search engine would be, well, another search engine. Here's hoping they think outside the box.

Also, webwork's call to arms on page 1 of this thread is spot on. The people who create the pages should benefit, not some mega-corporation. But that's another story.

Merry Christmas.

jomaxx

3:33 am on Dec 25, 2006 (gmt 0)

Not everybody likes search-within-results types of interfaces, because there's a kind of fuzzy logic to searching that isn't well served by drilling down and drilling down and drilling down. Not everyone likes using natural language queries, because it's asinine.

Lots of search engines have played with providing lots of different ways of interfacing with millions of search results, and so far Google's has proven to be the most popular by far. That speaks for itself.

oldpro

4:25 am on Dec 25, 2006 (gmt 0)

oh...it's us that pay to play that are causing all the problems. now i get the point.

jtara

5:06 am on Dec 25, 2006 (gmt 0)

oh...it's us that pay to play that are causing all the problems. now i get the point.

Exactly.

It's in the interest of both the search engines and their paying customers (advertisers) to return poor search results.

As for Google's dominance, I think it has more to do with clever branding and being in the right place at the right time than innovation and results.

The public is convinced that Google has cornered the market on high-powered eggheads and that the dismal results they provide are the best we can expect.

The truth is, they have the high-powered eggheads mopping the floors. (An obvious exaggeration... but as an Adwords customer who's had some contact with their employees, I am appalled at how they hamstring the incredibly talented people they have hired.)

oldpro

11:17 am on Dec 25, 2006 (gmt 0)

jtara,

with all due respect...reductio ad absurdum. google that.

you are totally wrong. getting this thread back to the subject of this thread, this new project is for profit...it will serve up some kind of advertising for revenue...it will be totally human edited. therefore, its serps will be manipulated to an even greater degree to benefit the profit motive than machine generated serps.

the problem is not those of us that fork over big bucks to make even bigger bucks. the problem is the lack of sophistication of the general searching public. for instance, by defining exactly what i am searching for...i get exactly what i need without any ads at all (if i am looking for information). unfortunately, most searchers type in "red widgets" when they are actually looking for "manufacturing re-calls of red widgets".

the ideal you are hoping for will not be solved with a new SE, but with a yet to be invented computer with a USB connection attached to a port surgically implanted in peoples brains. then search engines will be able to read minds and serve up utopian serps and we will all live happily ever after.

petra

4:58 pm on Dec 25, 2006 (gmt 0)

I agree with an earlier post: if anyone could pull this off, Larry and Sergey can, and if it's not doable then they've already figured this out and that's why they've not done it!

Come to think of it, I think they already employ human rating (the smiley/sad face on your Google toolbar...)

Web2.0 ideas are great but not enough for an efficient search engine!

jtara

6:24 pm on Dec 25, 2006 (gmt 0)

To take this in a bit of a different direction - Wales does mention in his blog the open-source software that the site will be built on - Apache Nutch and Lucene.

Lucene is a text search-engine library. Nutch uses Lucene to build a web crawler and search engine.

If nothing else, this project should ensure good support and further development on these open-source projects.

Webmasters have few good options for search on their own sites. This should help ensure that the Nutch/Lucene option is a viable one.

It should also foster the development of more competition in search. If Wales proves that Nutch/Lucene can be used for a full-scale web search engine, it should encourage others to give it a go with their own unique twist on search.
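For anyone wondering what a text search library does at its core, here is a toy sketch of an inverted index in Python. It is only an illustration of the general idea: it is not Lucene's or Nutch's API (both are Java projects), and the function names and sample pages are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical pages, keyed by a made-up id.
docs = {
    "page1": "digital camera reviews",
    "page2": "history of digital cameras",
    "page3": "best digital camera prices",
}
index = build_index(docs)
matches = search(index, "digital camera")
```

Note that "page2" does not match because the toy index does no stemming ("cameras" is a different term from "camera"); handling that sort of thing is part of what a real library adds.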

obono

7:10 pm on Dec 25, 2006 (gmt 0)

> Probable search reality: Google currently uses several THOUSAND people to editorialize on the current search index, e.g. there are massive amounts of hand-checked pages.

<speculation> And every time a page of your site hits a new #1 placement for a competitive term (2 or 3 words) you get that email or call with an inquiry related to *that* page -as from a fake user-. If you answer what is expected you keep the #1 spot, otherwise #2. All within hours.</speculation>

Must be the reason for so many white_hat sites being thrown out and having a hard time making it back to the index. Guidelines aside, those sites may have failed to impress the QR.

mattg3

1:15 am on Dec 26, 2006 (gmt 0)

There is already something like this idea on wikiclone. Nutch and so on. It uses Wikilinks from a dump to build the index and lets you review these pages in a wiki, to then be discussed for inclusion or exclusion.

Timaay

6:26 am on Dec 26, 2006 (gmt 0)

is it just me... or is Mr. Wales completely off-base with what he thinks is 'relevant'?

He says that Google returns 'spam' and 'junk' for the search term, "<s nop>"?!

Try it... looks like relevant results to me?!

If this is what he's basing his new business venture on I'd hate to be one of his financial backers...

Google rules.

[edited by: Brett_Tabke at 2:21 pm (utc) on Dec. 26, 2006]
[edit reason] no keywords please [/edit]

jomaxx

6:42 am on Dec 26, 2006 (gmt 0)

Not to say this necessarily happened, but if someone started talking up that specific search as an example of why a new competitor in the SE space is needed, you couldn't blame Google if they did a little hand editing to make sure their results were spam-free.

Timaay

7:00 am on Dec 26, 2006 (gmt 0)

i see what you're saying... but then if they could react that quickly with 'human edited' search results based on a news story then;

that would also defeat the point of Wales' 'new approach'

Adam_Lasnik

7:57 am on Dec 26, 2006 (gmt 0)

On the first day of Christmas, jtara said to thee:
You've hobbled search results advertisedly!
[...]

I've said it in other threads, but apparently it bears repeating. Our department (Search Quality) is judged upon how much we reduce spam in the index AND (as part of that) how much we improve user happiness (Are users finding what they want and need? Is the experience pleasant?)

By your logic, jtara, Google should instead dramatically increase the number of ads above the fold (instant revenue increase!) when -- in contrast -- at least one of our execs has gone on record stating that we hope to show *fewer* (but better targeted) ads over time. It's an issue of trust and user experience; quality over the long haul.

Ads and search people are in separate buildings, use separate algorithms, report to separate managers (all the way up to the top), and so on. And given that there's little switching cost in search (unlike, say, operating systems), we have to win users' approval every single day. Users notice speed, they notice quality. And, IMHO, to suggest otherwise (users are brandingly lemming'ed sheep) is sadly far from both reality and the holiday spirit ;-)

* * *

Oh, and this wikiasari thing? Sounds fascinating. Jimbo did some great stuff with Wikipedia, and I look forward to seeing what comes of his latest project. As always, there's room on the big ol' interweb for lots of projects to succeed. Personally I think there's way too much zero-sum FUD out there. Balderdash to that!

[Edited for typos and inadvertent redundancy. Bad eggnog, bad eggnog!]

[edited by: Adam_Lasnik at 8:14 am (utc) on Dec. 26, 2006]

docbird

8:00 am on Dec 26, 2006 (gmt 0)

how long will your excellent, authoritative webpage languish in some editor's "inbox"?

- ah, the delights of dmoz.
Though I've had pages languish in G's sandbox before now. [yes, yes, I know, there isn't actually a box filled with sand into which those google boffins place websites and pages. But it sure quacked like a sandbox.]

If there's some wikipedia style to this new search engine, you could do something re getting your own pages in the serps (somewhere).
Of course, some scallywag might then delete them - unless there are good checks n balances.

Might wind up somewhat like my impression of wikipedia.
Good for pages I check on well accepted aspects of sciences such as widgetology.
Not always so for more contentious issues, such as widget flu.

vivekh

10:03 am on Dec 26, 2006 (gmt 0)

wow finally!

mattg3

2:47 pm on Dec 26, 2006 (gmt 0)

Wikispecies, Wikiclone, Wikibooks and so on. Nearly no one wants to edit on that large a scale, besides Wikipedia it seems. Wikispecies is nearly completely useless, Wikibooks is incomplete and so on. And even if something like the wikiclone SE were pushed by JW, it would end up in the edit wars of the century. Additionally Wikipedia is already a form of commented directory anyway with the weblinks. I'd rather have Google tbh. Glad that Jimmy Cricket's mortgage is paid but for the rest of us, maybe not all can be free .. But in a way it's Google's own fault, that they default to Wikipedia on so many searches.

jtara

3:48 pm on Dec 26, 2006 (gmt 0)

Ads and search people are in separate buildings, use separate algorithms, report to separate managers (all the way up to the top), and so on.

Ah, yes, the Chinese Wall.

Google the term, then tell me it has worked well for consumers of the financial industry.

Oh, abuses have been reduced to the level of dull roar, after decades of fine-tuning and ever-increasing government regulation.

For those who don't know what I'm talking about, large financial institutions have faced similar issues of inherent conflict of interest. By their very nature, they engage in financial activities that go against the interest of their customers.

The solution has been a complex set of regulations intending to place a "Chinese Wall" between investment banking and trading operations. There continues to be a problem, as the perfect Chinese Wall still hasn't been invented.

I'm certainly not suggesting that Google is evil. Just that by their very nature search and advertising are at odds, with the search user being the loser. Market forces - not Google or any other search company - work against search quality as long as the revenue model is contextual advertising.

Do I think that Sergey and Larry sit there discussing how to hobble search even better this week so that more people will click on ads? Probably not. Do I think that financial results by necessity drive a public company, and that the inevitable trickles from the fingers of the accountants to search results? Absolutely.

Yes, quality will be driven by consumer demand. But in a field that for whatever reason has limited competitors at this point, it is a slow process. Progress is limited to what is necessary to please consumers, which, apparently does not require much.

We could draw an analogy to TV. What would TV be like if it were in the interest of advertisers to have cr*ppy TV shows? Oh.... never mind! ;)

antonaf

5:50 pm on Dec 26, 2006 (gmt 0)

Isn't this father vs. son?
Or
Daniel-san vs. Mr. Miyagi?

Wikipedia is popular because of Google. It of course had an online buzz before showing up in Google results, but its strong popularity and overwhelming membership didn't come until Google began ranking Wikipedia pages for every keyword known to man.

I think it is a bit of taboo, you have to remember where you came from and pay homage, don't burn your bridges. They should concentrate on working with Google and making Wikipedia more controlled and reliable. Not trying to compete with Google. He should use Google Adsense and monetize from that, instead of trying to take the stage.

jomaxx

8:12 pm on Dec 26, 2006 (gmt 0)

Irrelevant. If Wales can use this model to create a better search engine, or at least a plausible alternative to existing search engines, he should do so. We'd all win if he were to succeed. It's not obvious to me this is the best way to go, but we're in serious need of some diversity in the search space right now.

Even Yahoo and MSN have been converging on Google-like algorithms and are showing remarkably Google-like results now (IMO). In fact it's as if they are gauging the quality of their SERPs by how closely they resemble Google's. As a result, I see enormous opportunity for new search engines with fresh ideas to come in and add value.

unreviewed

10:03 pm on Dec 26, 2006 (gmt 0)

Compared to other search offerings, Google may "deserve" 70% market share. But you and I do not deserve to be collateral damage to a company with that kind of power. Currently, if your "clean" website is accidentally penalized by Google's war on spam, you are just plain out of luck. With all due respect for Google, competitive balance is needed.

I love Google, I use Google, but they don't like my site.

Actually the ad guys at Google love me, they think my site is relevant for its keywords. But the search team at Google must fight spam, I realize that and I think they are doing a good job, however, I personally know (as they must) that some sites get hit that should not.

Google can still be an amazing wild success with 40% share. The net needs more search options, and that requires competent people willing to step up.

Happy New Year! Mr. Jimmy Wales.

Pirates

5:18 am on Dec 27, 2006 (gmt 0)

First thing these guys should do is eliminate googlebot from crawling wikipedia.
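For reference, blocking one crawler is a two-line robots.txt rule, and compliance is voluntary on the crawler's side. The sketch below uses Python's standard `urllib.robotparser` to show how such a rule would be interpreted. The robots.txt text and the "SomeOtherBot" name are hypothetical, written for this example; this is not Wikipedia's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, allow everyone else.
robots_txt = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Illustrative URL only.
url = "https://en.wikipedia.org/wiki/Search_engine"
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("Googlebot allowed:", googlebot_allowed)
print("Other crawler allowed:", other_allowed)
```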

sonny

5:28 am on Dec 27, 2006 (gmt 0)

Most important to me, I've snagged a nice domain typo; assuming it's gonna be called Wikisari.

jtara

6:23 am on Dec 27, 2006 (gmt 0)

First thing these guys should do is eliminate googlebot from crawling wikipedia.

Wow, I am just amazed at the amount of sentiment expressed here for "getting even".

Again, this has NOTHING to do with Wikipedia.

Shoot first, ask questions later. Find somebody to blame and make sure you stick it to them. Let no good deed go unpunished. Help stamp-out nonprofit ventures!

Frankly, at this point, Wikipedia doesn't need Google. Like most Wikipedia users, I just go there directly. I do at least as many searches on Wikipedia as I do on search engines.

IMO, Google features Wikipedia so prominently because doing so enhances Google's credibility considerably. Perhaps by proactive design, perhaps by algorithm, but them's the facts.

I've snagged a nice domain typo; assuming it's gonna be called Wikisari.

WikiAsari. But that's just the name of the project. Wales has already said that that's not going to be the name of the search site. He's been mum on that.

BeeDeeDubbleU

10:33 am on Dec 27, 2006 (gmt 0)

Ads and search people are in separate buildings, use separate algorithms, report to separate managers (all the way up to the top), and so on. And given that there's little switching cost in search (unlike, say, operating systems), we have to win users' approval every single day.

Adam, you also have to win the stock market's approval every single day.

With respect (Chinese wall or not) you cannot really separate the two forever and when a conflict of interest arises we all know what happens, eh? ;)

Tigrou

2:56 pm on Dec 27, 2006 (gmt 0)

Why trusted community? Outsourcing means you could get Brett's 150 million pages reviewed for around $2.5 million. So you need to train some staff, create a hierarchy, etc. So triple that number, it's still just 7 digits. You need to maintain it for 2 years? Update it? Deal with project creep, etc? Let's multiply the initial number by 10. That's still only $25 million. Some VCs won't even look at projects that low...

So really it's not a matter of resources, it's the system to handle those results. And does that stand up? Which is what you're all discussing, I guess...
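The back-of-envelope numbers above can be checked in a few lines of Python. The page count and the $2.5 million base figure come straight from the post; the variable names are just for illustration, and the per-page cost is simply what those two figures imply.

```python
pages = 150_000_000      # pages to review, from the post
base_cost = 2_500_000    # outsourced review estimate in dollars, from the post

# Implied cost of reviewing a single page, in cents.
cents_per_page = base_cost / pages * 100

# Tripled for training, hierarchy and so on.
with_overhead = base_cost * 3

# Initial number times 10 for two years of maintenance, updates and project creep.
long_run = base_cost * 10

print(f"Per page: about {cents_per_page:.2f} cents")
print(f"With overhead: ${with_overhead:,}")
print(f"Long run: ${long_run:,}")
```

So the whole estimate rests on a review costing under two cents per page, which is the assumption to question if the total looks too low.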

jleane

3:31 pm on Dec 27, 2006 (gmt 0)

Wikipedia is an extremely useful resource. Of course it's subject to bias - but so is just about everything else you'll see online or otherwise.

As for the big G -

Yes they had good timing and great marketing, but the bottom line is that Google got to be #1 because they were able to provide the most relevant results. By listing Wikipedia high in the serps they are simply doing what a search engine is supposed to do - provide links to useful content for its users. The fact that this helped Wikipedia is immaterial. Google didn't make Wikipedia popular, Wikipedia made Wikipedia popular.

Anyway, I agree with jtara - there is a fundamental conflict of interest within the current model.

Simply put, as the quality of organic results goes up, income from PPC will go down. Obviously there are exceptions to the rule but generally speaking the current model doesn't encourage better results.

Logically, the current model encourages finding a point somewhere between the average user being so appalled by the organic results that they would consider switching to a competitor (best PPC but worst user retention) and being so impressed that they would never need to click on a sponsored link to find what they were looking for (worst PPC, best user retention).

As a result I believe the quality of natural SE results will progress much more slowly than it possibly could, especially in areas with valuable keywords.

jtara

4:26 pm on Dec 27, 2006 (gmt 0)

assuming it's gonna be called Wikisari.

It isn't. That's been clarified. That isn't even the project name.

As promised, the project home page, at search.wikia.com has been updated this morning.

Wikiasari?
Wikiasari is not and will not be the name for the free search engine we're developing. It was the name of a former project (see history).

Looks like the press erroneously put out the name of an old project that has nothing to do with this, other than that it was a previous search project that Wales was involved in.

"Wikia" is Wales' new company, and wikia.com is home to a collection of Wiki communities, mostly third-party volunteer efforts. I gather that search.wikia.com is the home of the "Jimbo's new search project" community.

Wales also clarifies elsewhere on the new site that the screen shot in the TechCrunch article comes from WikiaSearch, a separate product to be launched January, 2007. WikiaSearch will search among external links from Wikipedia. The bulk of the profits from WikiaSearch will be donated to Wikipedia.

This 103-message thread spans 4 pages.