
SEM Research Topics Forum

    
Google: Semantic Web Must Overcome Incompetence
engine




msg:3014475
 2:12 pm on Jul 19, 2006 (gmt 0)

Sir Tim Berners-Lee has a vision of a Web where machines as well as people can read content, but Google sees plenty of hurdles.

A Google executive challenged Internet pioneer Tim Berners-Lee on his ideas for a Semantic Web during a conference in Boston on artificial intelligence.
...

At the end of the keynote, however, things took a different turn. Google Director of Search and AAAI Fellow Peter Norvig was the first to the microphone during the Q&A session, and he took the opportunity to raise a few points.

Google: Semantic Web Must Overcome Incompetence [news.zdnet.co.uk]

 

arnarn




msg:3014611
 3:32 pm on Jul 19, 2006 (gmt 0)

politeness trumps arrogance

jtara




msg:3014628
 3:46 pm on Jul 19, 2006 (gmt 0)

Norvig had to backpedal and clarify that he wasn't referring to Berners-Lee. It's webmasters, he explained, that are incompetent.

Meanwhile, Berners-Lee handled the debate with flawless grace.

I think the notion that a major web initiative could succeed when it is brought about by masses of distributed webmasters working toward a goal, rather than imposed from on high by a monopoly, scares Google.

Why, those incompetent webmasters will never succeed! Why... they're... IDIOTS! ;)

See also:

[webmasterworld.com...]

wildbest




msg:3014653
 4:08 pm on Jul 19, 2006 (gmt 0)

Why, those incompetent webmasters will never succeed! Why... they're... IDIOTS! ;)

LOL

What about those idiots that succeeded due to Google's smart algos?

encyclo




msg:3014659
 4:15 pm on Jul 19, 2006 (gmt 0)

Webmaster incompetence (in a technical sense as mentioned by Mr. Norvig) is only one aspect of the problem facing a semantic web. A bigger problem is webmaster deception: any metadata contained within a document cannot be relied upon as descriptive of the document's contents, because the publisher of that document may be exaggerating, falsifying or manipulating that metadata.

A perfect example is the very first baby step of semantic metadata in documents: the meta keywords tag. Google (or any other search engine or classification mechanism) simply cannot rely on this metadata as being useful or descriptive, as it is abused far more than it is used correctly.
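To make the meta keywords problem concrete, here is a minimal sketch, not anything a search engine is known to run. It assumes a naive check: a declared keyword is "unsupported" if it never appears as a word in the page body. The class name `KeywordAudit` and the function `unsupported_keywords` are made up for illustration.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collects declared meta keywords and the words found inside <body>."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.body_words = set()
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Keywords are conventionally comma-separated.
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            # Crude tokenizer: split on whitespace, trim common punctuation.
            self.body_words.update(w.strip(".,!?").lower() for w in data.split())


def unsupported_keywords(html):
    """Return declared keywords that never occur as a word in the body text."""
    audit = KeywordAudit()
    audit.feed(html)
    return [k for k in audit.keywords if k not in audit.body_words]


page = (
    '<html><head><meta name="keywords" content="widgets, cheap flights, casino">'
    "</head><body><p>We sell blue widgets.</p></body></html>"
)
print(unsupported_keywords(page))
```

A check this simple is trivially gamed (just stuff the body too), which is rather the point: the publisher controls both sides of the comparison.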

Google's ranking mechanisms were the first which were the antithesis of the semantic web ideal - discounting heavily the document metadata and even document contents and assigning relevance in relation to third-party data such as inbound links (this is simplifying Google's algo to the extreme, but is basically true).

As it is, it is more often the search engine which provides the semantics via its algo rather than the utopian RDF/metadata approach. This isn't Google being arrogant, Mr. Norvig is simply stating the current state of affairs as seen on the web today.

Useful reading: Metacrap: Putting the torch to seven straw-men of the meta-utopia [well.com] (an old classic from 2001)

LifeinAsia




msg:3014689
 4:41 pm on Jul 19, 2006 (gmt 0)

What about those idiots that succeeded due to Google's smart algos?

How about those idiots who succeed IN SPITE OF Google's algos? :)

Metaphorically




msg:3014692
 4:43 pm on Jul 19, 2006 (gmt 0)

Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.

There's some great business acumen: call your users incompetent.

Semantic Web does rely on the data provider (webmaster) holding the goal of accurate dissemination of information above all else. Unfortunately the plain fact is that most webmasters are out there to get ahead, just like anyone else. I'm sure Google's very interested in getting things like RDF and semantic web technologies into more popular use, but there has to be a way of getting a disinterested third party to vouch for the data provider in order for it to work.

Calling people incompetent (or just drawing attention to the fact even if it is true) isn't a great way to get this stuff moving.

encyclo




msg:3014718
 5:03 pm on Jul 19, 2006 (gmt 0)

call your users incompetent

Webmasters are not Google's users; searchers (the general public) are. He's not calling Google's users incompetent, he's saying that there is a serious lack of knowledge within the sys-admin and webmaster communities with regard to server setup issues and HTML etc. in published pages.

It's important not to make assumptions based merely on a point of view of mistrust or bad feelings toward Google as an entity. To address Mr. Norvig's comments, in your opinion is there a lot of "technical incompetence" out there? Aren't many indexing problems (such as canonical issues, to take just one example) related to misconfigured servers? Are badly-indexed pages a result of poor markup? How many sites are correctly and accurately using metadata?

Webwork




msg:3014789
 5:49 pm on Jul 19, 2006 (gmt 0)

Information economy meets the semantic web. Google's performance mirrors the world.

Information economy: She who controls the information controls the economy - sort of.

There's a reason, when given the chance at PubCon Boston, I asked Mr. Gladwell "How will we tell (know) the truth in the future?"

It's a topic worthy of a book written by the best of minds, as the dirty business of information may turn out a generation, not of information consumers, but of information cynics. (I think we're pretty well along.) The drama of the Google / Semantic Web / Business of Information mashup is the tip of an iceberg. What's at stake is not just the efficiency of a search engine under siege by the "information economy" but also a world under siege by the same (warring) information economy. (What news channel do you watch and therefore what flavor - I mean mark-up - of news/truth do you ingest?)

What's the truth about global warming and what's anyone to do about it? I don't want anyone's answer to the question. What I want is a more vigorous dialogue about how any one person - and a whole world - might overcome its incompetence with information, and better learn to discern that which we might like to call "the truth", and in the absence of any specific right or wrong answer in the search for truth or information, what might be a wise approach for humankind to adopt as prophylactic behavior pending revelation of any particular truth - such as why there is evidence of climate change or "why his website ranks higher than mine".

What I see as an emerging plight of humankind at large is either analogous or parallel to the plight that Google faces: "What is the (best) answer to this inquiry?" (Well, maybe that plus a bit of "give me a good question and I'll give you a better answer".)

Clearly none of this "is new". Information war is likely a constant, only it now appears more pervasive, persistent, psychologically perfected, more rapid and more likely to cause harm due to the scale of things in 2006 and beyond.

Sorry for the leap but I think it's pertinent to step away from planet earth to view this issue in context. :)

Mr. Gladwell: If you're out there I again invite you to take on the ever present issue of how, as the world is now configured - in the so-called "information age" - humankind might stand a chance of standing for anything, as if we stand on the firm ground of some knowable and known thing "like truth".

Thanks. I promise I'll buy the book this time. :)

[edited by: Webwork at 6:18 pm (utc) on July 19, 2006]

Hanu




msg:3014848
 6:32 pm on Jul 19, 2006 (gmt 0)

Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.

"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step. ...

Please! I suggest that Mr. Norvig wade through a vanilla Apache installation's httpd.conf in order to remove all the stupid and dangerous default settings that are in there.

Besides, what kind of HTML is Mr. Norvig referring to? Is it the one that Berners-Lee has been dreaming of since the beginning of the nineties or the murky tag soup that we have to deal with nowadays? I guess Berners-Lee left the sinking boat just in time to chase another dream: the semantic web.

The idiocy is a collective one, I'm afraid. We're all to blame: standardization committee members, browser/server implementors and webmasters. We made the web the chaotic place it is today. As such it simply mirrors the world around it. It's time that Norvig faced reality again. As far as Berners-Lee is concerned I have given up all hope. Semantic Web ... yeah, right!

wmuser




msg:3015024
 9:16 pm on Jul 19, 2006 (gmt 0)

Haven't found anything "search revolutionary" in the article

henry0




msg:3015137
 10:43 pm on Jul 19, 2006 (gmt 0)

A good read at W3.org [w3.org]

Oliver Henniges




msg:3015638
 11:07 am on Jul 20, 2006 (gmt 0)

Epimenides, the webmaster, once said: All webmasters are liars.

If I understood Mr. Norvig correctly, all he said was: How can you talk about questions of truth and consistency in semantic analysis, webmasters not even can manage present to parsable syntax if.

He wasn't THAT wrong, he was?

jtara




msg:3015924
 3:25 pm on Jul 20, 2006 (gmt 0)

Mr Norvig's point is moot for a large number of Semantic Web applications. Google is blinded by their focus on search. The Semantic Web isn't primarily about search.

A good example from the blog article cited above: repurposing your bank-statement data - e.g. plug it into a calendar. That would be possible, if there were standards for tagging bank statements.

Your bank has no reason to lie when they tag your bank statement. They have every reason to make their bank statement convenient for you, allowing you to plug it into a calendar, or other software.

MOST data that is behind a log-in/password would fit the same profile. Why lie about data that can't be searched anyway, and/or is personal/specific to a particular user?

Will the Semantic Web be useful for finding the "best" widgets, or comparing prices? Probably not. But don't throw the baby out with the bathwater. There are plenty of other things it can be useful for, and for these applications Norvig's argument is a red herring.

gibbergibber




msg:3016533
 10:15 pm on Jul 20, 2006 (gmt 0)

"Will the Semantic Web be useful for finding the "best" widgets, or comparing prices? Probably not. But don't throw the baby out with the bathwater. There are plenty of other things it can be useful for, and for these applications Norvig's argument is a red herring. "

That's true, but then that's not a "semantic web", that's a "semantic database".

I got the impression that Berners-Lee wanted most of the internet to become machine-readable, but that isn't possible if you just want to rely on trusted sources who have no reason to lie to you.

I think in this instance Google is right and Berners-Lee is wrong: the only way to tell if something is reliable is to see how other reliable sources react to it. The reason for this is simple: contents will never be machine-readable, and spammers will always find a way of disguising their websites, so only a human can actually tell what is spam just by looking at the contents.

Berners-Lee's defence, that a semantic web could tell who had written information, doesn't really make sense because he doesn't explain how you verify that authors are who they say they are, it's just as vulnerable to deception as anything else.

It's not that Berners Lee is promoting bad ideas, it's just that they're totally impractical for an open web. On an intranet, yes, but not on the internet as a whole.

thgyspsy




msg:3017423
 3:59 pm on Jul 21, 2006 (gmt 0)

I have to put my vote with Google as well (did I just say that? OMG!). I think, at this point, Berners-Lee is looking through rose-tinted glasses. It is a wonderful utopian dream, I can just see ‘the dark side’ making a mess of it.

“they're totally impractical for an open web. On an intranet, yes, but not on the internet as a whole.” says it quite well. There really isn’t enough consideration being given to those who would circumvent it for their own purposes. Hard to believe, I know, but true (lol)

jtara




msg:3017458
 4:17 pm on Jul 21, 2006 (gmt 0)

Berners-Lee's defence, that a semantic web could tell who had written information, doesn't really make sense because he doesn't explain how you verify that authors are who they say they are, it's just as vulnerable to deception as anything else.

Yes, he does. Digital signatures and trust engines.

jtara




msg:3017572
 5:27 pm on Jul 21, 2006 (gmt 0)

That's true, but then that's not a "semantic web", that's a "semantic database".

No, it's still a semantic web, even if not EVERYTHING on the web is semantically-tagged, and even if EVERYTHING on the web is not trustable.

Your bank account could link to your merchant accounts. Your merchant accounts could link to product and customer-support data. When does the warranty run out on the widget you bought with your credit card on July 10 from Big Box Store? Ah, it's in warranty, I can file a claim by clicking here...

It's not just a single database, but linked databases from diverse sources. It's certainly a "web" even if it encompasses less than everything on "the" web.
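A rough sketch of the warranty example above, to show that it needs nothing more exotic than following identifiers from one publisher's data to the next. Everything here is hypothetical: the three dictionaries stand in for separately published data sets, and the identifiers (`bank:txn/...`, `bigbox:order/...`, `maker:widget/...`), field names and the 365-day warranty are invented for illustration.

```python
from datetime import date, timedelta

# Three independent, trusted publishers. Each record points at a record
# held by someone else, using a shared identifier as the link.
BANK = {
    "bank:txn/2006-07-10/42": {"amount": 59.99, "merchant_order": "bigbox:order/981"},
}
MERCHANT = {
    "bigbox:order/981": {"product": "maker:widget/W-100", "purchased": date(2006, 7, 10)},
}
MANUFACTURER = {
    "maker:widget/W-100": {"warranty_days": 365},
}


def warranty_expiry(txn_id):
    """Follow bank -> merchant -> manufacturer links to find the warranty end date."""
    txn = BANK[txn_id]
    order = MERCHANT[txn["merchant_order"]]
    product = MANUFACTURER[order["product"]]
    return order["purchased"] + timedelta(days=product["warranty_days"])


def in_warranty(txn_id, today):
    """True if the item bought in this transaction is still covered on `today`."""
    return today <= warranty_expiry(txn_id)


print(warranty_expiry("bank:txn/2006-07-10/42"))
print(in_warranty("bank:txn/2006-07-10/42", date(2006, 7, 21)))
```

No search and no ranking are involved; the consumer's software only reads from sources the consumer already deals with.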

It's just silly to say that this is of no use, just because there are people who will lie in their mark-up to game search. 90% of Semantic Web applications don't involve search, IMO. Most of it involves following links between data amongst trusted sources.

Plenty of trusted sources that could add great utility to the web by marking-up their data semantically:

- U.S. Weather Service
- Trademark and Patent Office
- Material Safety Data Sheets (individual manufacturers)
- Drug safety information (pharmaceutical companies)
- Material properties (manufacturers, trusted scientific publishers)
- Installation instructions, product data sheets, etc.
- Real Estate listings (MLS, established, known realtors)
- neighborhood crime statistics (local governments)
- airline, train, bus, etc. schedules (local governments)

I could go on and on. Remove price comparison and product search, and you still have a HUGE amount of utility that isn't there today. Just following stuff around from one trusted source to another.

Think of how you use the web today. Only automate much of the "pulling together" of diverse sources. How much time do you really spend doing searches? Most of the time, you are following links around, and/or typing in URLs (you see a product at a storefront, you want to get more details from the manufacturer's website), amongst sources that you already trust.

Google's got a hammer called search. To Google, everything is a nail. The Semantic Web isn't a nail, and Google's got nothing to fit it. Therefore, it is impractical and worthless.

MrStitch




msg:3017629
 6:08 pm on Jul 21, 2006 (gmt 0)

Wow... this is like, uh... crazy funny.

This whole thread should be deleted 'cause nobody here caught the underlying tones that REALLY matter to the whole situation.

uhg... i must be having a bad day, cause it seems like everyone on the internet has blinders on.

Come on sheep! This way ------>

hahahah

jtara




msg:3017831
 9:07 pm on Jul 21, 2006 (gmt 0)

nobody here caught the underlying tones that REALLY matter to the whole situation

So, illuminate us.

Oliver Henniges




msg:3018337
 11:00 am on Jul 22, 2006 (gmt 0)

> It is a wonderful utopian dream

Yes, exactly. The origins of this dream date back to Marvin Minsky's prophecies on AI, if I remember correctly. And in contrast to Kennedy's dream (also dreamt in the early sixties) of sending a man to the moon, the rocket-science of consistently understanding meaning has not even left the ground, because it is tied by a Moebius-ribbon. Even forty years later I am still waiting for a machine to drive my #*$!in' car, so that I can take my hands off the wheel and continue to hack my keyboard.

Sorry, but I am a bit conservative and thus insist on quite strict definitions of "semantics." Perhaps I might accept visions of a "semiotic" web...

> linked databases from diverse sources.

There are a number of laws over here which really do forbid that. I have no idea how long these laws will persist, but they will definitely slow progress tremendously.

Any digital signature or other proof of trust will always be a challenge to the hacker community in the first place.

The most fascinating aspect of the web is its "openness."

A funny coincidence that I just finished a little book with essays from Popper. Not "the open society and its enemies", though that would fit quite well. I guess he'd say: there is no absolute truth nor trust nor meaning. All we have is our imperfect theories, hypotheses, databases and search-algos. Let's continue to improve these.

jtara




msg:3018560
 4:46 pm on Jul 22, 2006 (gmt 0)

The origins of this dream date back to Marvin Minsky's prophecies on AI

How is that relevant? The concept doesn't require AI in order to work. Indeed, I've pointed-out elsewhere the failure of AI. So, Berners-Lee envisions AI-driven agents putting-together data from the Semantic Web.

Others of a more practical bent envision Excel putting-together data from the Semantic Web.

the rocket-science of consistently understanding meaning has not even left the ground, because it is tied by a Moebius-ribbon.

What is rocket-science about "this is the name of a manufacturer", "this is a manufacturer's part number", "this is a price"?
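For what it's worth, that kind of tagging can be sketched in a few lines. This is only an illustration under assumptions: statements are modelled as plain (subject, property, value) tuples in the spirit of RDF, and the item identifiers, property names and values below are invented, not taken from any real vocabulary.

```python
# Each statement says one simple thing about one item.
STATEMENTS = [
    ("item:1", "manufacturer", "Acme"),
    ("item:1", "partNumber", "W-100"),
    ("item:1", "price", "19.95"),
    ("item:2", "manufacturer", "Acme"),
    ("item:2", "partNumber", "W-200"),
    ("item:2", "price", "24.50"),
]


def describe(subject, statements):
    """Gather everything stated about one subject into a dict."""
    return {prop: value for subj, prop, value in statements if subj == subject}


def find(statements, prop, value):
    """List the subjects for which `prop` has the given `value`."""
    return [subj for subj, p, v in statements if p == prop and v == value]


print(describe("item:1", STATEMENTS))
print(find(STATEMENTS, "manufacturer", "Acme"))
```

A spreadsheet or a small script can consume data like this directly, with no AI involved.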

So, Berners-Lee may have a dream that someday the President can ask an agent "how can I improve relations in the Middle East" and get a useful answer. That that day may be far away, or may never come, doesn't detract from the immediately practical things that can be done without having to invoke rocket science.

Sorry, but I am a bit conservative and thus insist on quite strict definitions of "semantics."

So, pick another name for the current efforts in this direction, and let's move on.

> linked databases from diverse sources.

There are a number of laws over here which really do forbid that. I have no idea how long these laws will persist, but they will definitely slow progress tremendously.

I don't know of a law that would prevent a consumer from linking data from multiple databases. And I don't know of a law that would prevent publishers from linking publicly-available data. There are laws about sharing of data of certain types - such as medical records, credit history, etc. - by those who collect such data. I don't see that as being one bit of a problem. In those cases, the publisher will provide the data only to the consumer herself and to those it is authorized to share it with. The consumer's electronic agent would then be free to link the data as the consumer wishes.

FWIW, these laws are already widely abused on the Internet. Anybody with a credit card can purchase a wide array of personal data. Oh, so you have to lie and check a little box saying that you obtained the consumer's permission...

It's now Standard Operating Procedure amongst small businesses and some big ones to use these services to check up on prospective employees - with or WITHOUT their permission. For example, I've been told by the manager of a McDonald's that he checks up on prospective employees this way. No permission is obtained, it's done on a personal credit card and turned in as an expense. Most likely not McDonald's corporate policy, but a decision of a local franchise owner.

Given the widespread abuses by companies openly selling this information to anyone willing to pay for it, I doubt that laws about linking databases of private information are likely to give the Semantic Web much pause.

Any digital signature or other proof of trust will always be a challenge to the hacker community in the first place.

As is encryption. As is the transmission of credit-card data over the Internet. Yet, we still have e-commerce. Imagine how foolish you would feel today had you said a few years ago that selling things on the Internet and taking credit cards online was impractical, because of hackers.
