Founder of Wikipedia plans search engine to rival Google - Alternative Search Engines forum at WebmasterWorld - WebmasterWorld

Forum Moderators: bakedjake

Founder of Wikipedia plans search engine to rival Google

Amazon.com is linked with project ...

JackR

10:50 am on Dec 23, 2006 (gmt 0)

10+ Year Member

The Times, December 23, 2006
Founder of Wikipedia plans search engine to rival Google
James Doran, Tampa, Florida
-Amazon.com is linked with project
-Launch scheduled for early next year

Jimmy Wales, the founder of Wikipedia, the online encyclopaedia, is set to launch an internet search engine with amazon.com that he hopes will become a rival to Google and Yahoo!
..."Essentially, if you consider one of the basic tasks of a search engine, it is to make a decision: 'this page is good, this page sucks'," Mr Wales said. "Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.
"But we have a really great method for doing that ourselves," he added. "We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."
...Catching up with Google, Yahoo!, Microsoft's MSN or even smaller operators such as Ask.com will be a difficult challenge, Mr Wales conceded.
[business.timesonline.co.uk...]
[edited by: tedster at 12:08 pm (utc) on Dec. 23, 2006]
[edit reason] fair use of copyrighted material [/edit]

blaze

12:37 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

The problem isn't whether a page sucks or not. I suspect that solving that problem, while obviously extremely hard, isn't going to differentiate you. That being said, Google would be wise to develop a trusted community in that respect.

It would be very Web 2.0 of them. They could even pay; they have the cash.

Regardless, Jimmy Wales needs to worry about the real problem, and that is: is the web page relevant to your search query, and does the page load extremely fast?

You need to build super computers next to hydro electric dams with software written by 100s of phds to compete in that area, I'm afraid. Web 2.0 communities aren't going to do the trick.

BillyS

12:50 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Could be a good move - after all Google itself will rank Wiki in the top 5 for any page it puts up - regardless of content. So even Google admits the results are outstanding.

Excellent move if you ask me.

maximillianos

1:09 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

What if they harnessed the community of users as the network of computers used for the search power? Kind of like those research projects you sign up for to donate your idle CPU time.

That would give them potentially infinite resources... and be the ultimate users' search engine... =)

Essex_boy

1:22 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

It's about time someone stood up and tried to knock Google's block off.

It doesn't surprise me that MSN have not been able to do it.

RonPK

2:18 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I like the idea of human-rated pages. But let's do some math:
* say there are 10,000,000,000 web pages
* for a page to be rated reliably, at least 5 ratings are required
* a page needs to be rated at least once a year
* one person can rate 100 pages per day, 300 days per year.

That way, Mr. Wales would need a community of 1,666,667 volunteers working for his for-profit search engine. That doesn't sound very realistic to me.
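
For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. The function and parameter names are invented for illustration; the four inputs are simply the assumptions from the list.

```python
def volunteers_needed(pages, ratings_per_page, pages_per_day, days_per_year):
    """Number of raters needed to re-rate every page once a year."""
    ratings_per_year = pages * ratings_per_page           # total ratings required
    ratings_per_volunteer = pages_per_day * days_per_year  # one person's yearly output
    # Integer ceiling division: a fraction of a volunteer still means one more person.
    return -(-ratings_per_year // ratings_per_volunteer)


print(volunteers_needed(10_000_000_000, 5, 100, 300))  # 1666667
```

Changing any single input moves the result proportionally, so halving the page count or the ratings per page halves the volunteers required.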

loner

2:32 pm on Dec 23, 2006 (gmt 0)

10+ Year Member

Cool. The world of search will soon be controlled by unemployed drunks in underpants.

trinorthlighting

2:37 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Human algos eh? They act like the spammers will never find ways in their index.

Let's see, put a page full of content, submit it, get approved and in the index, then change the content on the page. That's what the spammers will do.

What, are they going to check every page every day? Yeah, OK, I have some land in Florida to sell you and a cheap bridge in New York...

Any takers?

pageoneresults

2:39 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

After all Google itself will rank Wiki in the top 5 for any page it puts up - regardless of content.

Let's see how much longer that lasts after this announcement from the Wiki Founder. Or, after the launch of their search engine. ;)

ODP, there's still a chance... ;)

cornwall

2:53 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Mr Wales said. "Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.
"But we have a really great method for doing that ourselves," he added. "We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."

One wonders where he is going to get this community that he can trust.

ODP never managed it, Wikipedia has not managed it.

The need for volunteers will always create communities who are either self interested in promoting their own sites at the expense of competitors or power hungry in advancing themselves in the community.

trinorthlighting

2:58 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Bring humans in the loop, then there is a big chance of corruption.

At least a computer algo scoring does no favoring.

pageoneresults

3:06 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

At least a computer algo scoring does no favoring.

Hmmm, I might have to disagree with that.

Bring humans in the loop, then there is a big chance of corruption.

Let's try to put the past behind us. ;)

longen

3:25 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

The problem with good pages in the Index being changed to Spam would be easy to solve. Each time an existing page is amended it goes back in the queue for review. If 5 or more volunteers flag it as spam, having previously been a good page, it gets deleted from the Index for a year.
Persistent spam would automatically get the entire site deleted.
The algo, not the volunteers, would decide the level of penalty.

In addition sites could be reviewed as a pyramid - once a spam page is found on a given level indexing stops for a period - until problems are fixed.
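
A rough sketch of the page-level part of that scheme, to make the moving parts concrete. Everything here is hypothetical: the `Page` and `ReviewIndex` names, the fields, and the choice to count each volunteer only once are illustration, not a description of any real system. The site-wide penalty and the pyramid review are left out.

```python
from collections import deque
from dataclasses import dataclass, field

SPAM_FLAGS_TO_DELIST = 5  # "5 or more volunteers" from the post
DELIST_DAYS = 365         # "deleted from the Index for a year"


@dataclass
class Page:
    url: str
    listed: bool = True
    delist_days_left: int = 0
    spam_flags: set = field(default_factory=set)  # ids of volunteers who flagged it


class ReviewIndex:
    def __init__(self):
        self.queue = deque()  # pages waiting for human review

    def page_amended(self, page):
        """An amended page starts over: old flags are dropped and it is re-queued."""
        page.spam_flags.clear()
        if page not in self.queue:
            self.queue.append(page)

    def flag_as_spam(self, page, volunteer_id):
        """Record one flag; the rule, not the volunteer, applies the penalty."""
        page.spam_flags.add(volunteer_id)  # a set, so repeat flags count once
        if len(page.spam_flags) >= SPAM_FLAGS_TO_DELIST:
            page.listed = False
            page.delist_days_left = DELIST_DAYS
            if page in self.queue:
                self.queue.remove(page)
        return page.listed
```

Volunteers only ever call `flag_as_spam`; the threshold and the length of the penalty live in the two constants, which matches the idea that the algo decides the level of penalty.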

SEOPTI

3:33 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I see a BAN coming for wikipedia domains at Google.

longen

3:39 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Wikipedia is the new DMOZ - and this time editors can log in.

Webwork

4:38 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

Bite the hand that feeds you? Ya, that works.

Let's not forget that it's commercial search that pays the bills, that is, it's income and profits - from advertisers - ultimately it's commercial or business search - that makes any 'search platform' viable.

What happens come the day that the CGM 'producers' - the free labor market - awaken to the idea that their noble efforts have actually been rendered in service of someone else's profits and elaborate lifestyle? Once nobility as motive is eradicated due to avarice of those seeking to exploit someone else's good nature and good will what's left? Ego gratification? Once it is realized that ego doesn't pay the bills for long then what?

Next up: Communal Media, where the producers of value=content own the company=content, inasmuch as 'the company' and 'the value of the company' IS the content.

Writer's commune? Publisher's commune? Communal media will be the outcome, where collectivization of people with common publishing interests and community ownership of the media make perfect sense, particularly in the age of disintermediation. (In this regard, the move of Adsense towards supporting multiple accounts within a community platform is a step in that direction. A few more tweaks in the model and the "fuel" for the movement will be in the pipeline.)

Publishing platforms are now a dime a dozen so who needs a venture capitalist to suck the profits from one's creativity?

Give it time. Efforts to extract all the golden eggs from the goose invariably kill the goose. CGM is the golden goose. The goose will evolve to survive, and instead of the goose dying it will be the attempts to suck value from the goose that will begin to die.

Geese of the world, hear me! Unite and let the profit suckers suck eggs from their own hind quarters!

[edited by: Webwork at 5:01 pm (utc) on Dec. 23, 2006]

rogerd

4:42 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

To handle the massive number of pages, and allow for the fact that many pages change frequently, any comprehensive search engine has to have a powerful automated indexing algorithm. Having said that, overlaying human quality indicators to improve the SERPs sounds like a good idea. I think Google is more likely to pull this off than Wales.

The hard part is dealing with deliberate manipulation. Wikipedia has seen its share of that, but most of the conflict has been in a small percentage of the topics. When you are talking about a general search engine, there are millions of search terms that have some value, and many millions of pages that might target those terms. The size and dedication of the community needed to do that would be hard to reach.

dauction

5:01 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

a community of trust

BS Capital BS

We have seen how a community of trust works with DMOZ and Wiki...

NO THANKS!

Webwork

5:06 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

Someone please correct my misunderstanding.

Isn't it true that anyone is free to employ or deploy all the content of Wikipedia so long as there is attribution according to 'the rules'?

So, what's to stop anyone else from doing the same thing as the founder plans to do? If that's the case then isn't there a bit of a 'success due to competitive advantage' problem?

You mean the competitive advantage is that all those editors will just tag along once their efforts are commercialized and everything will just run as smoothly as ever? Seems like a sweeping assumption. What if a whole host of editors - seeing the commercial handwriting on the wall - veer off to form their own collective and mutual benefit society?

What am I missing?

[edited by: Webwork at 5:08 pm (utc) on Dec. 23, 2006]

davewray

5:27 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Sure, corruption could enter into the equation of Wales's new plans for a user-edited search engine. However, there are ways around this. 10 billion pages? I highly doubt there are that many legitimate, indexable pages out there. What one could do is once a human reviews the page and gives it a "yes", or "no", that page could be saved so that the "search engine" knows exactly what the user approved. Every couple of weeks a bot could be sent out to scan all pages approved and check for any significant modifications. If, let's say, over 30% of the page's content has been changed, a human would have the opportunity to re-check that page to make sure it hasn't been changed for the worse. That's a rudimentary way to combat it...far from perfect I know, but still a way around the corruption.
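
The 30% check could be sketched roughly like this. It assumes the saved copy and the fresh copy are both available as plain text, and it reads "30% changed" as one minus a word-level similarity ratio from Python's standard `difflib`, which is only one of several reasonable ways to measure change. The function names are made up for the example.

```python
import difflib

CHANGE_THRESHOLD = 0.30  # "over 30% of the page's content" from the post


def changed_fraction(approved_text, current_text):
    """Rough share of the content that differs, compared word by word."""
    matcher = difflib.SequenceMatcher(
        None, approved_text.split(), current_text.split(), autojunk=False
    )
    return 1.0 - matcher.ratio()  # ratio() is 1.0 for identical sequences


def needs_rereview(approved_text, current_text):
    """True when the page has drifted far enough to go back to a human."""
    return changed_fraction(approved_text, current_text) > CHANGE_THRESHOLD


approved = "one two three four five six seven eight nine ten"
print(needs_rereview(approved, approved))                                 # False
print(needs_rereview(approved, "cheap pills casino poker loans win now"))  # True
```

A single changed word in the ten-word example gives a changed fraction of about 0.1, so small edits stay under the threshold and only wholesale replacement triggers a re-check.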

I think we can all agree that SEs will always be worked over by certain individuals to their advantage. The key for SEs is to limit that corruption of listings. Google can't do it 100% right, so why should we expect this new "Amazon" engine to do it? Any and all competition to Google would be good. Good competition is good for users and advertisers alike (possibly not good for the SEs though).

Cheers,
Dave.

ecommerceprofit

5:35 pm on Dec 23, 2006 (gmt 0)

10+ Year Member

Top Contributors Of The Month

interesting - I think it could work with Amazon's help

farmboy

5:49 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

If, let's say, over 30% of the page's content has been changed, a human would have the opportunity to re-check that page to make sure it hasn't been changed for the worse.

So newspapers, magazines, online newsletters, blogs, jobs sites, classified ads sites, personals sites, etc. would all need ongoing reviews.

FarmBoy

vik_c

6:09 pm on Dec 23, 2006 (gmt 0)

10+ Year Member

He can just slap some AdSense code on every Wikipedia page and retire.

BeeDeeDubbleU

6:19 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

We proposed something like this back in January. I reckon that's where they got the idea. ;)

[webmasterworld.com...]

davewray

6:23 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Farmboy....I didn't mean site-wide, I meant on a page-by-page basis. Trusted websites could be excluded from this algo rule. Many news stories stay on the same page, or archived page. But like I said, a CNN or any other news agency website could be exempt and just indexed without review.

jomaxx

6:30 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

More power to them if it works. And not every page would necessarily need to be ranked; a domain could build up "trust" the same way it appears to in Google.

But ranking a page's quality is only one part of a search engine's job, and not the most important part. I'll take relevance to my specific search phrase over quality any day.

[edited by: jomaxx at 6:32 pm (utc) on Dec. 23, 2006]

heisje

6:30 pm on Dec 23, 2006 (gmt 0)

10+ Year Member

Top Contributors Of The Month

.

I was with DMOZ when it still had a mere 200 editors (and at the time still known as newhoo.com) - I have seen its erratic evolution throughout the years - and can assure you that anything based on such a premise is doomed to fail, by definition - because the immense size of the web is beyond any kind of human handling.

In view of past experience (remember also Infoseek zealots?), it is beyond belief that anybody in his right senses may visualize human intervention in search on a great scale.

heisje

.

jtara

6:52 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

You need to build super computers next to hydro electric dams with software written by 100s of phds to compete in that area

Or so Google would like us to think. IMO, Google hasn't used those PhDs very effectively. Nor have they used much of their talent pool very effectively - most people at Google are over-qualified for the work they are doing.

They need to have them working on semantic analysis, which is the only way to move search forward from here. Instead, they are trying to wring every last drop out of the tired concept of link analysis, and have the masses so cowed into thinking that keywords are the best we are ever going to have that it's starting to affect language, as we start to lose conjunctions - and, unfortunately, meaning.

Let's see, put a page full of content, submit it, get approved and in the index, then change the content on the page.

You simply re-visit, and queue for re-review if there have been significant changes in the page. Do this enough times, with a clear pattern of abuse, and the site gets banned.

But I'm wondering why one would do this in the first place. If one HAD good content in the first place, why on earth would you replace it with bad content?

The key for SEs is to limit that corruption of listings. Google can't do it 100% right,

Or doesn't want to.

-----
There are three huge problems with search as it exists today:

(1) Relevancy of results is really very poor.

(2) There is little or no understanding by the search engine of the semantics of either the search or of web pages. It's amazing how well search does work, considering the search engine doesn't know either what the user is searching for or what the web pages are about. Keywords are a nice parlor trick, but it's time to move on to the real deal.

(3) Search engines need to evaluate trust, legitimacy, viewpoint, motive, etc. etc. and match those with the needs of the searcher. We've progressed very little along this line, with the sole move forward being link analysis.

The first two of these I think can be eventually handled completely by computer, most likely with human "training" involved. The third almost certainly requires much more human involvement and probably the involvement of the public at large. Fan or not, you have to admit that Wikipedia is the biggest and most successful project to date along these lines.

I wish somebody would seriously take on the first two challenges, but I applaud Mr. Wales for taking on the third. It's certainly the one he's most qualified to tackle, and I wish him the success he has enjoyed with Wikipedia.

digitalghost

9:06 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

According to Luis Von Ahn, all they need to do is Use Humans Cleverly [video.google.com]. VIDEO FILE. Fascinating stuff.

the_nerd

9:30 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I like the idea of human-rated pages. But let's do some math:
* say there are 10,000,000,000 web pages
* for a page to be rated reliably, at least 5 ratings are required
* a page needs to be rated at least once a year
* one person can rate 100 pages per day, 300 days per year.

He doesn't have to rate pages - he can rate sites. And then maybe 2000 people could do the trick.

But: there's money in the game, so there will be many wolves out there pretending to be goats - and making sure their sites don't "suck".

This 103 message thread spans 4 pages: 103
