
Forum Moderators: phranque


Google: Semantic Web Must Overcome Incompetence

     
2:12 pm on Jul 19, 2006 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:23279
votes: 360


Sir Tim Berners-Lee has a vision of a Web where machines as well as people can read content, but Google sees plenty of hurdles.

A Google executive challenged Internet pioneer Tim Berners-Lee on his ideas for a Semantic Web during a conference in Boston on artificial intelligence.
...

At the end of the keynote, however, things took a different turn. Google Director of Search and AAAI Fellow Peter Norvig was the first to the microphone during the Q&A session, and he took the opportunity to raise a few points.

Google: Semantic Web Must Overcome Incompetence [news.zdnet.co.uk]

3:32 pm on July 19, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:July 19, 2004
posts:142
votes: 0


politeness trumps arrogance

3:46 pm on July 19, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


Norvig had to back-pedal and clarify that he wasn't referring to Berners-Lee. It's webmasters, he explained, that are incompetent.

Meanwhile, Berners-Lee handled the debate with flawless grace.

I think the notion that a major web initiative brought about by masses of distributed webmasters working toward a goal - rather than imposed from on high by a monopoly - could succeed scares Google.

Why, those incompetent webmasters will never succeed! Why... they're... IDIOTS! ;)

See also:

[webmasterworld.com...]

4:08 pm on July 19, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2006
posts:508
votes: 0


Why, those incompetent webmasters will never succeed! Why... they're... IDIOTS! ;)

LOL

What about those idiots that succeeded due to Google's smart algos?

4:15 pm on July 19, 2006 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9068
votes: 4


Webmaster incompetence (in a technical sense as mentioned by Mr. Norvig) is only one aspect of the problem facing a semantic web. A bigger problem is webmaster deception: that is, any metadata contained within a document cannot be relied upon as being descriptive of the document's contents, as the publisher of that document may be exaggerating, falsifying or manipulating that metadata.

A perfect example is the very first baby-steps of semantic metadata in documents: the meta keywords tag. Google (or any other search engine or classification mechanism) simply cannot rely on this metadata as being useful or descriptive, as it is abused far more than it is used correctly.
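
To make the point concrete, here is a minimal sketch (not any search engine's actual method) of one way to notice when declared keywords are not backed up by the page itself. The names `KeywordAudit` and `unsupported_keywords` and the sample page are assumptions made up for illustration; only Python's standard `html.parser` is used.

```python
import re
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collects declared meta keywords and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.keywords = []    # entries from <meta name="keywords">
        self.text_parts = []  # text outside <script>/<style>
        self._skip = 0        # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def unsupported_keywords(html):
    """Return declared keywords whose words never appear in the page text."""
    parser = KeywordAudit()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower()))
    return [k for k in parser.keywords
            if not set(re.findall(r"[a-z0-9]+", k)) <= words]


# Hypothetical page: only "widgets" is actually discussed in the body.
page = (
    '<html><head><meta name="keywords" content="widgets, cheap flights, casino">'
    "</head><body><p>We sell blue widgets.</p></body></html>"
)
print(unsupported_keywords(page))
```

A check like this only catches the crudest stuffing; a publisher who wants to deceive can simply repeat the keywords in the body, which is exactly why the declared metadata ends up being discounted.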

Google's ranking mechanisms were the first which were the antithesis of the semantic web ideal - discounting heavily the document metadata and even document contents and assigning relevance in relation to third-party data such as inbound links (this is simplifying Google's algo to the extreme, but is basically true).

As it is, it is more often the search engine which provides the semantics via its algo rather than the utopian RDF/metadata approach. This isn't Google being arrogant; Mr. Norvig is simply stating the current state of affairs as seen on the web today.

Useful reading: Metacrap: Putting the torch to seven straw-men of the meta-utopia [well.com] (an old classic from 2001)

4:41 pm on July 19, 2006 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5628
votes: 48


What about those idiots that succeeded due to Google smart algos?

How about those idiots who succeed IN SPITE OF Google's algos? :)

4:43 pm on July 19, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Dec 22, 2004
posts:211
votes: 0


Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.

There's some great business acumen: call your users incompetent.

Semantic Web does rely on the data provider (webmaster) holding the goal of accurate dissemination of information above all else. Unfortunately the plain fact is that most webmasters are out there to get ahead, just like anyone else. I'm sure Google's very interested in getting things like RDF and semantic web technologies in more popular use, but there has to be a way of getting a disinterested third party to vouch for the data provider in order for it to work.

Calling people incompetent (or just drawing attention to the fact even if it is true) isn't a great way to get this stuff moving.

5:03 pm on July 19, 2006 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9068
votes: 4


call your users incompetent

Webmasters are not Google's users; searchers (the general public) are. He's not calling Google's users incompetent; he's saying that there is a serious lack of knowledge within the sys-admin and webmaster communities with regard to server setup issues, HTML, etc. in published pages.

It's important not to make assumptions based merely on a point of view of mistrust or bad feelings toward Google as an entity. To address Mr. Norvig's comments, in your opinion is there a lot of "technical incompetence" out there? Aren't many indexing problems (such as canonical issues, to take just one example) related to misconfigured servers? Are badly-indexed pages a result of poor markup? How many sites are correctly and accurately using metadata?

5:49 pm on July 19, 2006 (gmt 0)

Moderator

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 2, 2003
posts:7877
votes: 27


Information economy meets the semantic web. Google's performance mirrors the world.

Information economy: She who controls the information controls the economy - sort of.

There's a reason, when given the chance at PubCon Boston, I asked Mr. Gladwell "How will we tell (know) the truth in the future?"

It's a topic worthy of a book written by the best of minds, as the dirty business of information may turn out a generation, not of information consumers, but of information cynics. (I think we're pretty well along.) The drama of the Google / Semantic Web / Business of Information mashup is the tip of an iceberg. What's at stake is not just the efficiency of a search engine under siege of the "information economy" but also a world under siege of the same (warring) information economy. (What news channel do you watch and therefore what flavor - I mean mark-up - of news/truth do you ingest?)

What's the truth about global warming and what's anyone to do about it? I don't want anyone's answer to the question. What I want is a more vigorous dialogue about how any one person - and a whole world - might overcome its incompetence with information, and better learn to discern that which we might like to call "the truth", and in the absence of any specific right or wrong answer in the search for truth or information, what might be a wise approach for humankind to adopt as prophylactic behavior pending revelation of any particular truth - such as why there is evidence of climate change or "why his website ranks higher than mine".

What I see as an emerging plight of humankind at large is either analogous or parallel to the plight that Google faces: "What is the (best) answer to this inquiry?" (Well, maybe that plus a bit of "give me a good question and I'll give you a better answer".)

Clearly none of this "is new". Information war is likely a constant, only it now appears more pervasive, persistent, psychologically perfected, more rapid and more likely to cause harm due to the scale of things in 2006 and beyond.

Sorry for the leap but I think it's pertinent to step away from planet earth to view this issue in context. :)

Mr. Gladwell: If you're out there I again invite you to take on the ever present issue of how, as the world is now configured - in the so-called "information age" - humankind might stand a chance of standing for anything, as if we stand on the firm ground of some knowable and known thing "like truth".

Thanks. I promise I'll buy the book this time. :)

[edited by: Webwork at 6:18 pm (utc) on July 19, 2006]

6:32 pm on July 19, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 8, 2003
posts:548
votes: 0


Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.

"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step. ...

Please! I suggest that Mr. Norvig wade through a vanilla Apache installation's httpd.conf in order to remove all the stupid and dangerous default settings that are in there.

Besides, what kind of HTML is Mr. Norvig referring to? Is it the one that Berners-Lee has been dreaming of since the beginning of the nineties or the murky tag soup that we have to deal with nowadays? I guess Berners-Lee left the sinking boat just in time to chase another dream: the semantic web.

The idiocy is a collective one, I'm afraid. We're all to blame: standardization committee members, browser/server implementors and webmasters. We made the web the chaotic place it is today. As such it simply mirrors the world around it. It's time that Norvig faces reality again. As far as Berners-Lee is concerned, I have given up all hope. Semantic Web ... yeah, right!

9:16 pm on July 19, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 25, 2005
posts:677
votes: 0


Haven't found anything "search revolutionary" in the article

10:43 pm on July 19, 2006 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member henry0 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2003
posts:4397
votes: 2


A good read at W3.org [w3.org]

11:07 am on July 20, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2002
posts:872
votes: 0


Epimenides, the webmaster, once said: All webmasters are liars.

If I understood Mr. Norvig correctly, all he said was: How can you talk about questions of truth and consistency in semantic analysis, webmasters not even can manage present to parsable syntax if.

He wasn't THAT wrong, was he?

3:25 pm on July 20, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


Mr Norvig's point is moot for a large number of Semantic Web applications. Google is blinded by their focus on search. The Semantic Web isn't primarily about search.

A good example from the blog article cited above: repurposing your bank-statement data - e.g. plug it into a calendar. That would be possible, if there were standards for tagging bank statements.

Your bank has no reason to lie when they tag your bank statement. They have every reason to make their bank statement convenient for you, allowing you to plug it into a calendar, or other software.
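
As a rough sketch of what that repurposing might look like: no tagging standard for bank statements is named in this thread, so the field names below (`type`, `payee`, `amount`, `date`) and the sample entries are purely hypothetical. The point is only that once each value carries an explicit meaning, pulling out the calendar-worthy items is trivial.

```python
from datetime import date

# Hypothetical tagged statement: every value is labeled with what it means.
statement = [
    {"type": "payment_due", "payee": "Example Electric", "amount": "84.20",
     "date": date(2006, 8, 1)},
    {"type": "posted", "payee": "Example Grocer", "amount": "31.07",
     "date": date(2006, 7, 18)},
    {"type": "payment_due", "payee": "Example Card", "amount": "120.00",
     "date": date(2006, 7, 28)},
]


def calendar_entries(entries):
    """Return (date, summary) pairs for upcoming payments, earliest first."""
    due = [e for e in entries if e["type"] == "payment_due"]
    return [(e["date"], f"Pay {e['payee']} ({e['amount']})")
            for e in sorted(due, key=lambda e: e["date"])]


for when, summary in calendar_entries(statement):
    print(when.isoformat(), summary)
```

Without the tags, the same job means scraping a PDF or an HTML table and guessing which column is the due date.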

MOST data that is behind a log-in/password would fit the same profile. Why lie about data that can't be searched anyway, and/or is personal/specific to a particular user?

Will the Semantic Web be useful for finding the "best" widgets, or comparing prices? Probably not. But don't throw the baby out with the bathwater. There are plenty of other things it can be useful for, and for these applications Norvig's argument is a red herring.

10:15 pm on July 20, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 24, 2002
posts:373
votes: 0


"Will the Semantic Web be useful for finding the "best" widgets, or comparing prices? Probably not. But don't throw the baby out with the bathwater. There are plenty of other things it can be useful for, and for these applications Norvig's argument is a red herring. "

That's true, but then that's not a "semantic web", that's a "semantic database".

I got the impression that Berners-Lee wanted most of the internet to become machine-readable, but that isn't possible if you just want to rely on trusted sources who have no reason to lie to you.

I think in this instance Google is right and Berners-Lee is wrong: the only way to tell if something is reliable is to see how other reliable sources react to it. The reason for this is simple: contents will never be machine-readable, and spammers will always find a way of disguising their websites, so only a human can actually tell what is spam just by looking at the contents.

Berners-Lee's defence, that a semantic web could tell who had written information, doesn't really make sense because he doesn't explain how you verify that authors are who they say they are; it's just as vulnerable to deception as anything else.

It's not that Berners-Lee is promoting bad ideas, it's just that they're totally impractical for an open web. On an intranet, yes, but not on the internet as a whole.

thgyspsy

3:59 pm on July 21, 2006 (gmt 0)

Inactive Member
Account Expired

 
 


I have to put my vote with Google as well (did I just say that? OMG!). I think, at this point, Berners-Lee is looking through rose-tinted glasses. It is a wonderful utopian dream; I can just see ‘the dark side’ making a mess of it.

“they're totally impractical for an open web. On an intranet, yes, but not on the internet as a whole.” says it quite well. There really isn’t enough consideration being given to those who would circumvent it for their own purposes. Hard to believe, I know, but true (lol)

4:17 pm on July 21, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


Berners-Lee's defence, that a semantic web could tell who had written information, doesn't really make sense because he doesn't explain how you verify that authors are who they say they are; it's just as vulnerable to deception as anything else.

Yes, he does. Digital signatures and trust engines.

5:27 pm on July 21, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


That's true, but then that's not a "semantic web", that's a "semantic database".

No, it's still a semantic web, even if not EVERYTHING on the web is semantically-tagged, and even if EVERYTHING on the web is not trustable.

Your bank account could link to your merchant accounts. Your merchant accounts could link to product and customer-support data. When does the warranty run out on the widget you bought with your credit card on July 10 from Big Box Store? Ah, it's in warranty, I can file a claim by clicking here...

It's not just a single database, but linked databases from diverse sources. It's certainly a "web" even if it encompasses less than everything on "the" web.
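
A minimal sketch of the warranty example, assuming three independent sources that happen to share identifiers. Everything here is invented for illustration: the source layouts, the order and product IDs, the 365-day warranty and the `example.com` claim URL. In practice each source would be fetched from its own publisher rather than held in one script.

```python
from datetime import date, timedelta

# Source 1 (hypothetical): the bank's view of a card transaction.
bank = [{"date": date(2006, 7, 10), "merchant": "bigbox", "order": "A-1001"}]

# Source 2 (hypothetical): each merchant's order records.
merchants = {
    "bigbox": {"A-1001": {"product": "widget-9", "purchased": date(2006, 7, 10)}},
}

# Source 3 (hypothetical): manufacturer product data.
manufacturers = {
    "widget-9": {"warranty_days": 365, "claim_url": "https://example.com/claims"},
}


def warranty_status(txn, today):
    """Follow the links bank -> merchant -> manufacturer for one transaction."""
    order = merchants[txn["merchant"]][txn["order"]]
    product = manufacturers[order["product"]]
    expires = order["purchased"] + timedelta(days=product["warranty_days"])
    return {
        "product": order["product"],
        "expires": expires,
        "in_warranty": today <= expires,
        "claim_url": product["claim_url"],
    }


print(warranty_status(bank[0], date(2006, 7, 21)))
```

No search and no AI is involved: each hop is just a lookup by an identifier that the previous source published.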

It's just silly to say that this is of no use, just because there are people who will lie in their mark-up to game search. 90% of Semantic Web applications don't involve search, IMO. Most of it involves following links between data amongst trusted sources.

Plenty of trusted sources could add great utility to the web by marking up their data semantically:

- U.S. Weather Service
- Trademark and Patent Office
- Material Safety Data Sheets (individual manufacturers)
- Drug safety information (pharmaceutical companies)
- Material properties (manufacturers, trusted scientific publishers)
- Installation instructions, product data sheets, etc.
- Real Estate listings (MLS, established, known realtors)
- Neighborhood crime statistics (local governments)
- Airline, train, bus, etc. schedules (local governments)

I could go on and on. Remove price comparison and product search, and you still have a HUGE amount of utility that isn't there today. Just following stuff around from one trusted source to another.

Think of how you use the web today. Only automate much of the "pulling together" of diverse sources. How much time do you really spend doing searches? Most of the time, you are following links around, and/or typing in URLs (you see a product at a storefront, you want to get more details from the manufacturer's website), amongst sources that you already trust.

Google's got a hammer called search. To Google, everything is a nail. The Semantic Web isn't a nail, and Google's got nothing to fit it. Therefore, it is impractical and worthless.

6:08 pm on July 21, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 16, 2006
posts:188
votes: 0


Wow... this is like, uh... crazy funny.

This whole thread should be deleted 'cause nobody here caught the underlying tones that REALLY matter to the whole situation.

uhg... i must be having a bad day, cause it seems like everyone on the internet has blinders on.

Come on sheep! This way ------>

hahahah

9:07 pm on July 21, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


nobody here caught the underlying tones that REALLY matter to the whole situation

So, illuminate us.

11:00 am on July 22, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2002
posts:872
votes: 0


> It is a wonderful utopian dream

Yes, exactly. The origins of this dream date back to Marvin Minsky's prophecies on AI, if I remember correctly. And in contrast to Kennedy's dream (also dreamt in the early sixties) of sending a man to the moon, the rocket-science of consistently understanding meaning has not even left the ground, because it is tied by a Moebius-ribbon. Even forty years later I am still waiting for a machine to drive my #*$!in' car, so that I can take my hands off the wheel and continue to hack my keyboard.

Sorry, but I am a bit conservative and thus insist on quite strict definitions of "semantics." Perhaps I might accept visions of a "semiotic" web...

> linked databases from diverse sources.

There are a number of laws over here which really do forbid that. I have no idea how long these laws will persist, but they will definitely slow progress tremendously.

Any digital signature or other proof of trust will always be a challenge to the hacker community in the first place.

The most fascinating aspect of the web is its "openness."

A funny coincidence that I just finished a little book with essays from Popper. Not "the open society and its enemies", though that would fit quite well. I guess he'd say: there is no absolute truth nor trust nor meaning. All we have is our imperfect theories, hypotheses, databases and search-algos. Let's continue to improve these.

4:46 pm on July 22, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


The origins of this dream date back to Marvin Minsky's prophecies on AI

How is that relevant? The concept doesn't require AI in order to work. Indeed, I've pointed out elsewhere the failure of AI. So, Berners-Lee envisions AI-driven agents putting together data from the Semantic Web.

Others of a more practical bent envision Excel putting together data from the Semantic Web.

the rocket-science of consistently understanding meaning has not even left the ground, because it is tied by a Moebius-ribbon.

What is rocket-science about "this is the name of a manufacturer", "this is a manufacturer's part number", "this is a price"?
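
For what it's worth, those three statements fit in a few lines. This is a bare sketch using plain (subject, property, value) tuples; the property names `manufacturerName`, `partNumber` and `price` and the sample item are made up here and do not come from any published vocabulary.

```python
# Each fact is a (subject, property, value) statement about one listed item.
facts = [
    ("item-1", "manufacturerName", "Acme"),
    ("item-1", "partNumber", "W-42"),
    ("item-1", "price", "19.99"),
    ("item-2", "manufacturerName", "Acme"),
    ("item-2", "partNumber", "W-43"),
]


def describe(subject, statements):
    """Gather everything stated about one subject into a dictionary."""
    return {prop: value for s, prop, value in statements if s == subject}


def subjects_with(prop, value, statements):
    """Find every subject for which the given property has the given value."""
    return sorted({s for s, p, v in statements if p == prop and v == value})


print(describe("item-1", facts))
print(subjects_with("manufacturerName", "Acme", facts))
```

A spreadsheet could consume data in this shape as easily as an agent could.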

So, Berners-Lee may have a dream that someday the President can ask an agent "how can I improve relations in the Middle East" and get a useful answer. That that day may be far away or never doesn't detract from immediately practical things that can be done without having to invoke rocket science.

Sorry, but I am a bit conservative and thus insist on quite strict definitions of "semantics."

So, pick another name for the current efforts in this direction, and let's move on.

> linked databases from diverse sources.

There are a number of laws over here which really do forbid that. I have no idea how long these laws will persist, but they will definitely slow progress tremendously.

I don't know of a law that would prevent a consumer from linking data from multiple databases. And I don't know of a law that would prevent publishers from linking publicly available data. There are laws about sharing of data of certain types - such as medical records, credit history, etc. by those who collect such data. I don't see that as being one bit of a problem. In those cases, the publisher will provide the data only to the consumer herself and to those it is authorized to share it with. The consumer's electronic agent would then be free to link the data as the consumer wishes.

FWIW, these laws are already widely abused on the Internet. Anybody with a credit card can purchase a wide array of personal data. Oh, so you have to lie and check a little box saying that you obtained the consumer's permission...

It's now Standard Operating Procedure amongst small businesses and some big ones to use these services to check up on prospective employees - with or WITHOUT their permission. For example, I've been told by the manager of a McDonald's that he checks up on prospective employees this way. No permission is obtained, it's done on a personal credit card and turned in as an expense. Most likely not McDonald's corporate policy, but a decision of a local franchise owner.

Given the widespread abuses by companies openly selling this information to anyone willing to pay for it, I doubt that laws about linking databases of private information are likely to give the Semantic Web much pause.

Any digital signature or other proof of trust will always be a challenge to the hacker community in the first place.

As is encryption. As is the transmission of credit-card data over the Internet. Yet, we still have e-commerce. Imagine how foolish you would feel today had you said a few years ago that selling things on the Internet and taking credit cards online was impractical, because of hackers.