Home / Forums Index / Search Engines / Alternative Search Engines
Forum Library, Charter, Moderators: bakedjake

Alternative Search Engines Forum

This 103 message thread spans 4 pages; this is page 2.
Founder of Wikipedia plans search engine to rival Google
Amazon.com is linked with project ...
JackR




msg:3198043
 10:50 am on Dec 23, 2006 (gmt 0)

The Times, December 23, 2006

Founder of Wikipedia plans search engine to rival Google
James Doran, Tampa, Florida

-Amazon.com is linked with project
-Launch scheduled for early next year

Jimmy Wales, the founder of Wikipedia, the online encyclopaedia, is set to launch an internet search engine with amazon.com that he hopes will become a rival to Google and Yahoo!

..."Essentially, if you consider one of the basic tasks of a search engine, it is to make a decision: 'this page is good, this page sucks'," Mr Wales said. "Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.

"But we have a really great method for doing that ourselves," he added. "We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."

...Catching up with Google, Yahoo!, Microsoft's MSN or even smaller operators such as Ask.com will be a difficult challenge, Mr Wales conceded.

[business.timesonline.co.uk...]

[edited by: tedster at 12:08 pm (utc) on Dec. 23, 2006]
[edit reason] fair use of copyrighted material [/edit]

 

superpower




msg:3198461
 9:31 pm on Dec 23, 2006 (gmt 0)

I remember that thread well, BeeDeeDubbleU.

There are problems with search though that won't be solved by a user trust system. A few random points:

1. Most searches are inherently ambiguous in meaning, regardless of semantic analysis, i.e. there is no right answer based on the query itself. If I type in "digital cameras" I could be looking for reviews, the cheapest store, a list of only palm-sized cameras, phone cameras, etc. Or "Nevada real estate": am I buying, selling, etc.? A user trust/recommendation system doesn't solve this--people are inherently at least somewhat vague, so the idea of getting the "perfect answer" isn't going to happen.

2. Wikipedia has succeeded as a user trust system, but it mainly covers top-line info, not the long tail. Long-tail info is much more difficult and almost infinite; there will never be enough reviews or reviewers, it's simply impossible. And I would estimate at least 75% of the queries on my sites are for long-tail or more obscure/niche topics.

3. Isn't StumbleUpon a website user-recommendation system? I don't use it, but I seem to recall that. I guess this is like a StumbleUpon search engine--why can't they just do that too?

I think Google will eventually be brought down a notch or two, and I like them getting more competition. Most of what they do can be done by other companies, so they are under constant pressure to stay several steps ahead and not make mistakes. But there is also a diminishing rate of return. Once they make some mistakes they will lose ground.

BillyS




msg:3198464
 9:36 pm on Dec 23, 2006 (gmt 0)

>>say there are 10,000,000,000 web pages

You don't need to index the web to make a good search engine - probably less than 0.1% of that amount is all it would take to answer about 99% of all queries.

digitalghost




msg:3198466
 9:38 pm on Dec 23, 2006 (gmt 0)

>>I like the idea of human-rated pages. But let's do some math:
* say there are 10,000,000,000 web pages
* for a page to be rated reliably, at least 5 ratings are required
* a page needs to be rated at least once a year
* one person can rate 100 pages per day, 300 days per year.
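Taking the quoted assumptions at face value, the staffing they imply can be worked out in a few lines (a quick Python sketch; every figure below is the thread's hypothetical estimate, not real data):

```python
# Back-of-the-envelope staffing estimate for human page rating.
# All figures are the thread's assumptions, not measured values.
pages = 10_000_000_000        # assumed number of web pages
ratings_per_page = 5          # ratings needed for a reliable score
pages_per_rater_day = 100     # pages one person can rate per day
work_days_per_year = 300      # working days per rater per year

ratings_needed = pages * ratings_per_page                     # per year
ratings_per_rater = pages_per_rater_day * work_days_per_year  # per year
raters_needed = ratings_needed / ratings_per_rater

print(f"{raters_needed:,.0f} full-time raters needed")        # about 1.7 million
```

Even before accounting for spam and page churn, the headcount lands in the millions, which is what the replies push back on from different directions.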

Check out the above linked video, from Luis von Ahn, and you can throw out the math. Distributed computing, via brain power, not processing cycles.

atlrus




msg:3198472
 9:53 pm on Dec 23, 2006 (gmt 0)

>>> You don't need to index the web to make a good search engine - probably less than 0.1% of that amount is all it would take to answer about 99% of all queries.

So, after long complaints about how Google gives "authority" sites the power to rank everywhere - you want to squeeze the web into 0.1% of its size?

See, people like you make ideas like that doomed - you automatically assume that only 1 page out of 10,000,000 is "worthy".

Don't get me wrong - I don't like Google much, but I would hate to see it replaced by another DMOZ.
And teaming with amazon? Yeah, I can see the first 5 results right now...

steveb




msg:3198543
 11:46 pm on Dec 23, 2006 (gmt 0)

"that he hopes will become a rival to Google and Yahoo"

Low aspirations.

lfgoal




msg:3198570
 12:34 am on Dec 24, 2006 (gmt 0)

"Could be a good move - after all Google itself will rank Wiki in the top 5 for any page it puts up - regardless of content. So even Google admits the results are outstanding."

Rankings for wikipedia pages have absolutely nothing to do with content or quality but, rather, wikipedia's internal linking characteristics.

As to this proposed search engine, it's dead on arrival. Its best hope ever would be to capture about as much market share as Ask.com, and even that is completely pie-in-the-sky fantasy.

rohitj




msg:3198575
 12:51 am on Dec 24, 2006 (gmt 0)

I'm skeptical because a lot of people use Google not only for the common searches (Paris Hilton, etc.), but also for out-of-the-ordinary searches. It can take years to create a thorough, reliable index that is really that in-depth.

Another big aspect of the search war is people. Who will they have that can really create a good search engine from the ground up and go head-to-head with google's troops?

Wikipedia is good and has some amazing concepts at work, but can that really be turned into something that'll threaten Google? Google already has the mind-power and cash to fight back.

BillyS




msg:3198606
 1:40 am on Dec 24, 2006 (gmt 0)

>>So, after long complaints about how Google gives "authority" sites the power to rank everywhere - you want to squeeze the web into 0.1% of it's size?

>>See, people like you make ideas like that doomed - you automatically assume that only 1 page out of 10,000,000 is "worthy".

Just for the record, 0.1% would be 1 page out of 1,000. And yeah, I believe there is enough information in 100 million pages of information to answer 99% of all web searches in an effective manner. In fact, that's probably still way too many pages.
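As a quick sanity check on the percentages being thrown around (a trivial sketch; the 10-billion-page total is the thread's assumption):

```python
# What 0.1% of an assumed 10-billion-page web actually is.
total_pages = 10_000_000_000
fraction = 0.001                       # 0.1%

subset = int(total_pages * fraction)
one_in = int(round(1 / fraction))

print(f"0.1% of {total_pages:,} pages is {subset:,} pages")  # 10,000,000
print(f"that is 1 page in every {one_in:,}")                 # 1 in 1,000
```

So 0.1% of 10 billion is 10 million pages (1 in 1,000); the 100-million figure mentioned here corresponds to 1% of the assumed web.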

Think about it... Amazon for shopping, Wikipedia for information - it's a pretty powerful combination.

There is also no reason to include all websites in this search engine. What would make anyone think their particular website is worthy? There are very few people out there with totally unique information.

Altstatten




msg:3198607
 1:51 am on Dec 24, 2006 (gmt 0)

"The world of search will soon be controlled by unemployed drunks in underpants."

What's wrong with working in your underpants?

old_expat




msg:3198647
 3:36 am on Dec 24, 2006 (gmt 0)

>>say there are 10,000,000,000 web pages

You don't need to index the web to make a good search engine - probably less than 0.1% of that amount is all it would take to answer about 99% of all queries.

A search on "digital cameras" claims 84,000,000 results.

But try to look at more than 1,000 of those results.

So how many pages do you actually need/use in a SE?

jtara




msg:3198653
 3:53 am on Dec 24, 2006 (gmt 0)

1. Most searches are inherently ambiguous in meaning, regardless of semantic analysis, i.e. there is no right answer based on the query itself. If I type in "digital cameras" I could be looking for reviews, the cheapest store, a list of only palm-sized cameras, phone cameras, etc. Or "Nevada real estate": am I buying, selling, etc.?

Searches are inherently ambiguous in meaning because searchers have been trained to remove the meaning from their searches.

Walk up to somebody on the street (or in a store) and say "digital cameras". Do you think you will get a meaningful response from them? Or a blank stare?

OK, in a store, you might get pointed to the right section of the store, and otherwise ignored because it will be assumed you are a foreigner who speaks almost no English and is going to be difficult to deal with.

Yet, this is how we have been trained to communicate with search engines. Unfortunately, it is now having spillover into the language.

This is not how people communicate. We don't communicate using keywords, because it is ineffective, frustrating, and devoid of nuances of meaning.

Why are we still communicating with computers this way?

Hint: it isn't because of laziness on the part of searchers. It's because they've been taught that this is the way it works. Computers don't understand sentences and paragraphs. They understand keywords. Anybody typing fully-formed sentences and paragraphs into a search box will be ridiculed as a newbie by anybody looking over their shoulder.

fjpapaleo




msg:3198667
 5:00 am on Dec 24, 2006 (gmt 0)

"Think about it... Amazon for shopping, Wikipedia for information - it's a pretty powerful combination."

That may be true but you can also buy just about anything at a WalMart. Should the rest of all the B & M's just close up shop and go home? Personally, I don't like shopping at WalMart, or Amazon for that matter.

And also slightly OT: if DMOZ has gotten so bad (and I agree it has), why does Google continue to use it for their directory and give heavy weight to the listings there? Why don't they just start their own and charge like Yahoo? That'd be about the easiest billion dollars anyone could ever make.

Brett_Tabke




msg:3198669
 5:04 am on Dec 24, 2006 (gmt 0)

"We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that."

I wonder if Mr. Doran is under the impression that those 10k Googlers are all parsing AdWords ads?

Gosh, how many thousand of them are actually pulling QC on the database?

Let's try some math:

If you have 1,000 people making editorial decisions at the rate of 3-4 pages a minute for 400 minutes a day, that's up to 1,600 pages per person per day - or about 1.5 million pages per day. If you have 5,000 people doing that, you have about 7.5 million pages per day, or about 150 million pages per month.

Strangely enough, I have heard the figure 150 million pages used in reference to the bulk of the long tail in the top two search engines. Meaning that the top 150 million pages on the web comprise 95-98% of the search engine listings popping up in search engines on any given day.
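Those throughput figures are easy to verify (a quick sketch; the review rate, headcount, and working time are the poster's estimates, not measured numbers):

```python
# Editorial-throughput estimate using the poster's figures (all assumptions).
editors = 5_000               # assumed number of human reviewers
minutes_per_day = 400         # working minutes per editor per day
days_per_month = 20           # assumed working days per month

for pages_per_minute in (3, 4):               # "3-4 pages a minute"
    daily = editors * pages_per_minute * minutes_per_day
    monthly = daily * days_per_month
    print(f"{pages_per_minute}/min: {daily:,}/day, {monthly:,}/month")
```

The stated "about 7.5 million pages per day, or about 150 million per month" falls inside the 6-8 million/day (120-160 million/month) range this produces.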

That said, I would rather have machine based results. Humans are easy to manipulate (Ever hear of Dmoz? lol).

I wish them luck.

fjpapaleo




msg:3198676
 5:17 am on Dec 24, 2006 (gmt 0)

I really don't get it. If they want human input into the search results (and they should), why not just give more weight to the Google toolbar, which millions of people already have installed? Certainly those 200 PhDs they have can come up with a matrix to match user queries with browsing behavior. It would be impossible to manipulate with all the millions of searches out there, and they could stop wasting all their time trying to fight spam and doorway pages, scraper sites, MFA sites and all the rest. If they don't, then the people at Wiki should. It'd be a lot easier than what they're talking about.

oldpro




msg:3198678
 5:20 am on Dec 24, 2006 (gmt 0)

First of all, I hope this is incentive for Google to rid its SERPs of Wikipedia junk. Secondly, researching many topics about which I know a great deal, Wiki content tends to be subjectively biased and totally incorrect. For this reason, on topics about which I know little, I view Wiki with extreme skepticism.

IMO Wikipedia is useless for the purpose for which it was created...now how in the world does Wales think he can take such a quantum leap to rival Google?

jtara




msg:3198693
 5:53 am on Dec 24, 2006 (gmt 0)

First of all I hope this is incentive for Google to rid its serps of Wikipedia junk

Did you read the article at all?

This is not a Wikipedia project. It has absolutely nothing to do with Wikipedia.

It is being spearheaded by Jimmy Wales, one of the founders of Wikipedia. He recently resigned as Chair of the board of the Wikimedia Foundation, although he continues to serve on the board and hold the honorary title of Chairman Emeritus.

I sense some bitterness here regarding Wikipedia. I think it angers some that so much has been accomplished by a non-commercial effort for the public good, leaving out the potential for profit.

I find it troubling that several here seem to so quickly urge retribution against a competitor, and even against those that don't even have a direct connection with them but are somehow tainted by association. Sounds like something that might happen in Sicily in the previous century.

Wikipedia certainly has its flaws, but it is a tremendous accomplishment, with a great deal of utility despite its warts. It is anything but a failure.

I swear we seem to have been invaded by a few Ferengi. Just can't countenance the thought of doing something with no profit.

jchampliaud




msg:3198708
 6:53 am on Dec 24, 2006 (gmt 0)

"Why are we still communicating with computers this way?"

Because as far as I know it's the only way we can. For the time being we humans have to communicate with computers on the computer's terms.

superpower




msg:3198729
 9:00 am on Dec 24, 2006 (gmt 0)

jtara, actually I disagree that people have been trained to remove meaning. It's normal offline behavior for people to initially express themselves vaguely. Then they get more specific. They search the same way.

For example, I asked my brother what he wanted for his birthday and he said his big gift request was for a "digital camera".

Then I asked him what kind of digital camera and he said an SLR-type. Then I asked Canon or Nikon? Then the price range. etc.

People often think in small logical chunks that then progress into something bigger. That is why menus, categories, store aisles, magazine sections are organized in general basic chunks of meaning that then get more specific within each section.

For the same reason, people prefer to point and click on a GUI several times to achieve a result rather than type a long-winded but more specific command into a command line.

percentages




msg:3198734
 9:25 am on Dec 24, 2006 (gmt 0)

Search is Search.....Whether you like Google, Yahoo, MSN, Ask or something else......I don't much care.....it has been solved.....they all produce relevant results today!

The question of which is better is a question of "human choice".....and there will never be a 100% answer to that!

If someone wants to add to the search engine mix....go for it, but, why bother, when there are much bigger things to do?

I see this subject much like a bunch of cavemen arguing about who has the most "round wheel". Who cares....!

In the meantime someone else is going to invent the combustion engine and make you all look like idiots!

BeeDeeDubbleU




msg:3198753
 10:36 am on Dec 24, 2006 (gmt 0)

See, people like you make ideas like that doomed - you automatically assume that only 1 page out of 10,000,000 is "worthy".

He did not say that and anyone with half a brain can claim that there are ten squillion web pages and that it is impossible to do anything with this manually. Anyone with a full brain can see that an increasingly large percentage of these pages are worthless and add nothing to the Net as a whole.

If a system can be developed that can find the real content and show it in its results then the advantages of using such a system will quickly become obvious whether it has ten million pages or ten squillion. As someone already said, hardly anyone goes beyond the top twenty or thirty results anyway so the rest may as well not be there.

Google's Adsense/Adwords system is in itself responsible for a large percentage of the rubbish that is today's Internet. This is where they make almost all of their money so clearly anything that Google does must take this into account. It follows that this may often be at the expense of better results.

night707




msg:3198819
 1:27 pm on Dec 24, 2006 (gmt 0)

In many ways, the old About.com structure with paid editors was a promising approach.

Combining some decent algo search with pro or semi-pro editors could make a good structure for better results.

Of course, Google could do that much faster and better with all their resources than Mr. Wales.
Even Yahoo and MSN would have a chance to improve their results with such a combination.

oldpro




msg:3198898
 3:52 pm on Dec 24, 2006 (gmt 0)

Just can't countenance the thought of doing something with no profit.

Yes I did, did you? I believe the project is ultimately for profit.

Profit does not equal evil. To the contrary, it is the great equalizer. It is the incentive for excellence. Remove the profit motive and you get....hmmmm let's see...something like government bureaucracy.

My point was that human editing will not improve search. Google, Yahoo, et al. at least have an algo that is based on a consistent set of objective parameters. It is a given that this is not search utopia and has flaws here and there. However, enter the human element and there is inconsistency and subjective bias from one extreme to the other.

Webwork




msg:3198911
 4:06 pm on Dec 24, 2006 (gmt 0)

Inefficient efficiency - evolution, democracy - may be a good thing. At least I'm prepared to argue the point.

"Woman versus machine" is likely a debate that is best never settled once and for all. Diversity, for all its flaws, has the benefit of inefficiency.

europeforvisitors




msg:3198913
 4:09 pm on Dec 24, 2006 (gmt 0)

In many ways, the old About.com structure with paid editors was a promising approach.

I was an About.com "guide" for 4-1/2 years, and I don't think anyone at About.com--even among the most hyperbolic marketing and PR types--regarded About.com as a substitute for search engines. For a short while, there was an effort to compete with the Yahoo directory (at least within the 500 or so topics covered by About.com at that time), but there's a big difference between a directory and spidered search.

As for the earlier example of "digital cameras," I'd point out that the problem isn't with the search engines, but with the inability or unwillingness of users to define what they're looking for. Whether a user types in a keyphrase ("digital cameras") or a plain-English statement ("I want help in picking out a digital camera"), there's no way that an automated search engine or a human-edited directory can supply a perfect answer. And if the user has the common sense to type in something reasonably precise ("Widgetco WC-1 camera review" or even "Widgetco WC-1 camera"), Google will supply remarkably good results most of the time. (I say this as someone who's done a lot of research into digital cameras with the help of Google.)

docbird




msg:3198928
 4:49 pm on Dec 24, 2006 (gmt 0)

“Google is very good at many types of search, but in many instances it produces nothing but spam and useless crap...”

of course, no one here would have pages among such spam and useless crap results, would they?
So if Mr Wales indeed comes up with a means of producing better results, it will benefit webmasters here working on great content, as well as users.

Whatever: good to see someone with gumption to try an alternative to Google. ("Google is like the borg" quote from another thread here occurring to me just now. Might be over the top, but having google unchallenged isn't great; a novel model rather than just playing catch up should be interesting.)

Brett_Tabke




msg:3198933
 5:01 pm on Dec 24, 2006 (gmt 0)

Wow, there is clearly a complete disconnect between the probable reality and the imagined reality as to the way the current search engines work.

Probable search reality: Google currently uses several THOUSAND people to editorialize on the current search index, e.g. there is a massive amount of hand-checked pages.

Webmaster/SEO Belief: most listings that show in Google are 100% algo based. SE's have done a great job at fostering this myth.

Which has led to some great myths like "over-optimization". To me, that was just another term for "it didn't pass a hand check, dude".

What Amazon is proposing? Been there - Done that - Google will continue to do it.

oldpro




msg:3198934
 5:05 pm on Dec 24, 2006 (gmt 0)


So if Mr Wales indeed comes up with a means of producing better results, it will benefit webmasters here working on great content, as well as users.

Yes, but how long will your excellent, authoritative webpage languish in some editor's "inbox"?

This project reminds me of a glorified DMOZ project...if it was such a success, why is everybody flocking to Google?

europeforvisitors




msg:3198959
 6:18 pm on Dec 24, 2006 (gmt 0)

Probable search reality: Google currently uses several THOUSAND people to editorialize on the current search index, e.g. there is a massive amount of hand-checked pages.

If that were the case, so what? A few thousand people would be a drop in the bucket compared to the manpower needed for even a reasonably comprehensive "human-based" search engine/directory.

Also, as another member suggested, deciding whether a page sucks (or doesn't) isn't as important as matching it to the right search phrase--or, to look at it the other way around, finding the most relevant pages for a given search.

Tapolyai




msg:3199042
 8:38 pm on Dec 24, 2006 (gmt 0)

I hope they succeed. Unfortunately, my bet is on black-market sales of inclusion within... a month.

jtara




msg:3199102
 9:58 pm on Dec 24, 2006 (gmt 0)

Just can't countenance the thought of doing something with no profit.

Yes I did, did you? I believe the project is ultimately for profit.

I wasn't referring to this search project - I was referring to Wikipedia. Somehow, this thread turned to Wikipedia-bashing, and suggestions that Google should punish Wikipedia for Mr. Wales' latest venture with which it (Wikipedia) has no involvement.

BTW, here's a note recently added to the top of the Wikia search page. (i.e. the official home page for this project:)

Reporters and bloggers note: Amazon has nothing to do with this project. They are a valued investor in Wikia, but people are really speculating beyond the facts. This has nothing to do with A9, Amazon, etc.

...

Update: The TechCrunch story is also wrong. This project has nothing to do with the screenshot they are running, and this search project has nothing to do with Wikipedia.

"Why are we still communicating with computers this way?"

Because as far as I know it's the only way we can. For the time being we humans have to communicate with computers on the computer's terms.

There was research into natural-language processing and semantic analysis by computers when I was a computer science student. That was 30 years ago. At the time, there had been some success. I recall that one of the application areas of greatest interest was in querying text databases. I'd hope that there's been some progress made.

I have to say that the lack of practical progress in this area is as disappointing to me as the fact that we are a people who ONCE went to the moon.

I disagree that people have been trained to remove meaning. It's normal offline behavior for people to initially express themselves vaguely. Then they get more specific. They search the same way.

For example, I asked my brother what he wanted for his birthday and he said his big gift request was for a "digital camera".

Then I asked him what kind of digital camera and he said an SLR-type. Then I asked Canon or Nikon? Then the price range. etc.

Let's agree to disagree, then. I believe that users have been trained into low expectations by current search engine implementations, with queries dumbed-down to the least common denominator. I think search engines don't give users enough credit.

But let's run with your own idea for a minute.

It would be simple for search engines to maintain a context and allow searches to be refined.

Yes, I know you can "search within results" on Google. But it's buried at the bottom of the page. It's not encouraged by the user interface. And the only way to broaden it back out is to use the browser "back" button, which is a problem once you've viewed more than a single page of results. (You have to hit back multiple times, which is confusing because the back button has two meanings - back a page of results, and then back to the previous search.)

Ideally, search-refinement should be a dialog, whether in natural language or otherwise. The search engine should suggest and encourage further refinement. Now, imagine the improvement in search quality that would be possible if the last question was "is this what you were looking for?". (Yes, I know Google has experimented with an "exit poll".)

There's little or no support for filtering unwanted results. There's no support for saving preferences. I've had to resort to using an obscure browser setting to add a -inurl: to every single search to filter out some of the big junk-content sites (nextag, epinions, etc.). This should be a profile setting. Ideally, one should be able to save settings for different search contexts.
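The workaround described here amounts to rewriting every query before it is submitted. A minimal sketch of the idea (the excluded site names come from the post; the function name and URL handling are illustrative, not any browser's actual mechanism):

```python
# Sketch: appending -inurl: exclusions to every query, the way a
# browser keyword-search bookmark might. Site list is from the post;
# everything else is illustrative.
from urllib.parse import quote_plus

EXCLUDED_SITES = ["nextag", "epinions"]   # hypothetical junk-site list

def build_search_url(terms: str) -> str:
    exclusions = " ".join(f"-inurl:{site}" for site in EXCLUDED_SITES)
    return "https://www.google.com/search?q=" + quote_plus(f"{terms} {exclusions}")

print(build_search_url("digital camera review"))
```

The profile setting the post asks for would simply persist a per-user EXCLUDED_SITES list instead of hiding it in an obscure browser option.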

Bottom line is that the search engines have implemented only the very bare minimum of search-refining - search within results, and even then haven't made it easy to use.

I see this subject much like a bunch of cavemen arguing about who has the most "round wheel". Who cares....!

I'm suggesting that we need to move on to inventing tires.

Today's search is ineffective. Keyword-flinging is not the future of search. Any company that has based its future on keyword-flinging and doesn't change will, IMO, in 10 years be just a distant memory.

There are a few companies working on some better ideas for search. I don't see innovation coming out of the major players, though. They seem happy with what they have. Maybe because it suits their business model better than effective search.

oldpro




msg:3199123
 11:05 pm on Dec 24, 2006 (gmt 0)

Somehow, this thread turned to Wikipedia-bashing,

No it has not... Wikipedia and this "wiki search project" are intertwined in that they are based on the same model... human editing.

It is about a machine-edited SE versus a human-edited SE... which is the essence of the article.

Following your logic...one could say some have turned this into a Google bashing thread.
