This could be very important.
Of particular interest are the WebPage and WebPageElement types, allowing you to specifically mark up breadcrumbs, nav, header, footer, sidebar, main content etc. Seems like they would have to take this data with a grain of salt to prevent abuse, for example marking an obscure paragraph in tiny text at the bottom of the page as the "main content" of the page. We probably want to be very careful to mark up elements accurately, as discrepancies could be seen as a signal of spam.
I plan to make it standard procedure to use this markup schema in all of my templates.
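For anyone who hasn't dug into the docs yet, here's a rough sketch of what that kind of template markup might look like, wrapped in a small Python script (standard library only) that lists the item types it finds. The type names (WebPage, WPHeader, SiteNavigationElement, WPSideBar, WPFooter, WebPageElement) and the breadcrumb / mainContentOfPage properties are from the schema.org vocabulary as I read it. The page content and the helper names (PAGE, ItemTypeCollector, collect_item_types) are made up for illustration, and this is not how any search engine actually parses the markup.

```python
from html.parser import HTMLParser

# Hypothetical page template. The itemtype URLs are schema.org types;
# the text content is invented for illustration only.
PAGE = """
<body itemscope itemtype="http://schema.org/WebPage">
  <div itemscope itemtype="http://schema.org/WPHeader">Site header</div>
  <div itemscope itemtype="http://schema.org/SiteNavigationElement">Nav links</div>
  <div itemprop="breadcrumb">Home &gt; Widgets</div>
  <div itemprop="mainContentOfPage" itemscope
       itemtype="http://schema.org/WebPageElement">Main article text</div>
  <div itemscope itemtype="http://schema.org/WPSideBar">Sidebar</div>
  <div itemscope itemtype="http://schema.org/WPFooter">Footer</div>
</body>
"""


class ItemTypeCollector(HTMLParser):
    """Collects the short type name of every element carrying itemscope."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes such as itemscope map to None
        if "itemscope" in attrs and "itemtype" in attrs:
            # Keep only the last path segment, e.g. "WPHeader".
            self.types.append(attrs["itemtype"].rsplit("/", 1)[-1])


def collect_item_types(html):
    """Return the item types found in the markup, in document order."""
    parser = ItemTypeCollector()
    parser.feed(html)
    return parser.types


if __name__ == "__main__":
    print(collect_item_types(PAGE))
```

A check like this could be run against templates to confirm that each region is labelled as intended, which matters if mislabelled elements end up being treated as a spam signal.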
Yeah, I got an email from LinkedIn this morning about it.
I'm trying to decipher it now.
Looks good though.
I wonder if it will be something akin to sitemaps.org?
The two sites, side by side, do look awfully similar.
Oh, I just tried to gen a 404 error on the site to see what the 404 error page looked like and it's a cool Google 404 page.
This will prove very important. These will replace the current microformats we all know and *cough* love. Actually, it's not mandatory that you replace them, but if you want to save time coding and be consistent with Google & Bing (why is Yahoo even mentioned?) then you will use schema instead. Honestly, it looks like it will be easier to implement than microformats. I'm glad to see this.
System: The following message was spliced on to this thread from: http://www.webmasterworld.com/google/4321505.htm [webmasterworld.com] by tedster - 1:24 pm on Jun 3, 2011 (EDT -4)
|NEW YORK: Internet search provider Google has joined hands with rivals Microsoft and Yahoo for a project -- schema.org -- that will improve web search results. |
More at: [economictimes.indiatimes.com]
I'd say they rushed things a bit. The W3 Microdata Specification is still a Working Draft as of May 25, 2011...
Now, let's talk about that [Schema.org...] website. I find it difficult to take anything seriously from an org that releases an HTML5 specification under an XHTML DOCTYPE. The site fails validation with a list of simple amateur errors. I don't understand why that site isn't using an HTML5 DOCTYPE.
The Microdata syntax will not validate when using anything other than an HTML5 DOCTYPE. So, for those of you who maintain valid XHTML code, this will present challenges. You may be forced to switch to an HTML5 DOCTYPE so that you can take full advantage of all things HTML5 and maintain valid documents.
|Microdata is an extension to HTML5 that provides another way to embed Microformats and Poshformats vocabularies. |
|For common semantics on the web (e.g. people+organizations, events, reviews, syndicated content), microformats are still simpler and easier than microdata, and are already well implemented across numerous tools and services. |
Microformats - Microdata
When it becomes the norm it won't stand out anymore. Early adopters will benefit most but blindness is inevitable.
|You may be forced to switch to an HTML5 DOCTYPE |
Is there a major drawback to this? Seems like something that might be a good idea at some point anyway.
|Early adopters will benefit most but blindness is inevitable. |
Well, then the time to adopt it is now, eh?
Oh, the granularity we can get down to w/ schema.org implementation...
but it also makes me wonder what control we're giving up by agreeing to their terms and conditions [schema.org].
If we use the URL for a value and they change the explicative nature of what said URL means, then we could be jammed up, changing code, etc.
|If using the href="http://schema.org/InStock" were all of a sudden to change to something else then what happens to the data value? |
Don't get me wrong, I love the formatting, but wow; I wonder how much link juice schema.org will get?
But I digress; we will almost definitely be an early adopter of this. ;)
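To make the URL-as-value concern a bit more concrete, here's a minimal sketch in Python (standard library only) of how a consumer might read an availability value like the InStock URL mentioned above. The Offer type and the availability property are from the schema.org vocabulary as I understand it. The snippet, the label table, and the names (SNIPPET, AVAILABILITY_LABELS, read_availability) are my own assumptions for illustration, not anything the search engines have published.

```python
from html.parser import HTMLParser

# Hypothetical product markup using a schema.org URL as the property value.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Offer">
  <span itemprop="price">$19.99</span>
  <link itemprop="availability" href="http://schema.org/InStock">In stock
</div>
"""

# Assumed mapping from vocabulary URLs to meanings. The URL is only an
# identifier; its meaning lives in whatever the consumer maps it to.
AVAILABILITY_LABELS = {
    "http://schema.org/InStock": "in stock",
    "http://schema.org/OutOfStock": "out of stock",
}


class AvailabilityParser(HTMLParser):
    """Collects the href of every element marked itemprop="availability"."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("itemprop") == "availability" and "href" in attrs:
            self.urls.append(attrs["href"])


def read_availability(html):
    """Return a label for each availability URL; unmapped URLs become 'unknown'."""
    parser = AvailabilityParser()
    parser.feed(html)
    return [AVAILABILITY_LABELS.get(url, "unknown") for url in parser.urls]


if __name__ == "__main__":
    print(read_availability(SNIPPET))
```

In this sketch, a value the consumer no longer recognises simply falls through to "unknown", which is roughly the situation a publisher would be in if the vocabulary changed underneath them.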
Does this seem similar in scope to FBML?
Content scraping by Google v2.0; however, I think it's the way forward for the next generation of search engines. (Un)fortunately Google won't be one of them, because Google users are not used to "rich searching". It has been 10 years already without being able to sort or filter results (not even in a basic way). You can't teach an old dog new tricks.
|because Google users are not used to "rich searching". |
I don't know that that is necessarily the case.
And........Welcome to WebmasterWorld. ;)
|Hmm reading through the material and it seems when I try to navigate the site it crashes IE browser so makes me wonder if they followed their own instructions to test it. |
They've got some spurious hash tags in some links to home that crash in IE8 but not in FF.
Eg, from this page... [schema.org...] ...click on the nav bar link to Home, which is coded...
This will open home in Firefox (with a url including the hash tag, as coded), but causes IE8 to give an Operation Aborted error message. Not sure what's going on.
Here's a very informed criticism from ManuSporny, current chair of the working group at the W3C that created RDFa.
|The False Choice |
The schema.org site makes it appear as if you must pick sides and use Microdata if you want preferential treatment. This is a false choice! They even state that you cannot use RDFa and Microdata and Microformats on the same page as it will confuse their parsers – forcing Web designers to exclusively use Microdata or be lost in the morass of search listings. The Web community should decide which features should be supported – not Microsoft or Google or Yahoo.
...Microformats were created in an open and community-driven way. RDFa was created in an open and community-driven way. Schema.org was not, and if it catches on, expect to see it not scale over the long term and an increase in vocabulary lock-in to the major search companies. Which are you going to choose? Facebook's Like button markup, or Google/Microsoft/Yahoo's Microdata markup – you are being put into the position of choosing one of those exclusively.
[edited by: Robert_Charlton at 4:28 am (utc) on Jun 4, 2011]
[edit reason] fixed display of filtered name and link [/edit]
|So, for those of you who maintain valid XHTML code, this will present challenges. You may be forced to switch to an HTML5 DOCTYPE so that you can take full advantage of all things HTML5 and maintain valid documents. |
And the Facebook OpenGraph tags don't validate in HTML5.
Nice to see these forward thinking orgs care so much about little things like standards.
|Which are you going to choose? Facebook’s Like button markup, or Google/Microsoft/Yahoo’s Microdata markup – you are being put into the position of choosing one of those exclusively. |
Sorry, I don't understand that. Why not both?
|brotherhood of LAN|
|There is markup for: |
Proper nouns. You can see why this would make life easier for a search engine. "The Who" at the beginning of a sentence, for example, can be quite ambiguous.
Having a S.E. that can algorithmically understand concepts in a Wordnet "word sense" structure would offer amazing potential, perhaps a move towards the "theming" that has been often spoken about since the days of Teoma?
I'm going to ask that everyone read this...
HTML Microdata - W3C Working Draft 25 May 2011
The Big 3 (2) were part of the specification development. This is not something that "they" came up with. It has been a Working Draft for quite some time, going back at least to Jan 22, 2008.
I read "The False Choice of Schema.org" and there wasn't one mention of the W3 Microdata Specification. The guy who wrote that piece is a Chair on the W3 RDFa Specification. I'm a little confused as to why he's fighting the organization he works with.
Here's the thing. It's a done deal at this time. There was plenty of time over the past few years to get involved while the Working Draft was being assembled. Where were these folks then?
|The Web community should decide which features should be supported – not Microsoft or Google or Yahoo. |
I don't agree with that at all and it sounds as if the tail is wagging the dog.
Who are we to advise them which formatting types to use? As long as it's a W3C working format, even if it is in draft, at least they're trying to prod us forward.
In thinking about it over the past few days, I don't feel the big 3(2) are colluding to do this but are rather giving us flexible options that will further broaden and enrich our "vocabulary".
That's just my Schilling's worth. ;)
There is a reason that the search engines get to choose the micro-data format. They are the only ones offering incentives for using micro-data.
If you run a website as a business, it doesn't make good business sense to make your site machine readable. You would just be making it easy for your competitors to steal your data and get stats about your company. The only reason to implement micro-data that I've seen is what you can get from the search engines for doing so.
|Microformats were created in an open and community-driven way. RDFa was created in an open and community-driven way. Schema.org was not |
Note that Microdata was also developed in an open and community-driven way. Schema.org is just a schema (shared vocabulary) for Microdata.
Manu's main objection appears to be that schema.org uses Microdata instead of RDFa; which is understandable since he's invested a lot of effort into RDFa. But it's not a convincing argument for why schema.org should use RDFa.
The W3C RDFa and W3C Microdata specs are competing technologies. It's a highly politicised issue for the W3C because TimBL invented RDF and then the W3C spent years building a temple around it in the XML years.
RDFa is more powerful but has a history of being misunderstood and mis-coded by authors. Microdata is a bit simpler, and this appears to be the reason it has been chosen over RDFa for schema.org.
onebuyone / Propools - I don't know that "rich searching" has much bearing on the use of structured data. Whether or not Google users avail themselves of the sort of filtering and "rich searching" now available with Recipe View, the far more important use that Google (and the other search engines) have made of structured data is in the generation of rich snippets and search verticals.
tedster - Thanks for quoting ManuS#*$!y's "false choice" post: I think everyone concerned with the nitty gritty of this issue should read it carefully, as well as a contrary view [mkbergman.com] from semantic web researcher Mike Bergman.
As I've pointed out in my response to ManuS#*$!y [manu.s] and my own piece [seoskeptic.com], I don't think the success or failure of schema.org will revolve around whether or not the schema.org microformat vocabulary is the best way to deploy attribute-based structured data (it isn't) but whether or not it will be adopted on a wide scale anytime soon. I agree with the schema.org documentation that microdata is easier to deploy than RDFa, so I think the chances of success are fairly good.
(For those wondering, ManuS#*$!y doesn't result from tedster and me skipping over the space key, but from his last name triggering WMW's spam detection. :)
schema.org sounds like the days when IE tried to bully the web into adopting their standards instead of W3C standards. Now it's a posse of search engines, so they can better scrape your content and keep people from leaving the search screen... from arriving at your website.
|The Web community should decide which features should be supported – not Microsoft or Google or Yahoo. |
Wasn't xhtml dropped in favour of html5 because the majority of web developers didn't adopt it? In which case, can we expect the majority to understand/implement schema.org? And if they don't, will it be dropped a few years from now?
Disclosure - I fall into the "intermediary" category that I mention here.
Why didn't they just call it "Stealing your data Microformat"?
The creation of this format, as clearly stated on schema.org, is all about making it easier for search engines to extract data - and we all know the ways in which a certain large search engine has been stepping on toes when they get a hold of enough data to compete in any given vertical.
As far as I can see, many of the data items should be part of an on-site search (of an intermediary) rather than being presented to the search engines for them to scrape/combine/use. It's really expensive to collect those types of details for millions of businesses (or other types of record) and the search engines want companies to give them away so that they can be used (eventually) against them!
Google, in particular, should realise that some intermediaries are a useful resource for searchers - as it stands Google is hell-bent on gathering as much information as possible (initially by offering benefits to webmasters, only to shaft them later - see reviews as an example).
You may argue that the change makes it easier for small businesses to be seen - and they will not mind Google re-using the data they present - but the search engines know the percentage of small businesses that will implement the format is small; so they are putting a carrot out to companies with lots of data (and a large percentage of those will have just been hit by the Panda/Farmer update).
The sad truth is that enough large data holders will use the format to give the search engines enough information so it then forces others (reliant on search) to adopt the format.
We should be asking the big search engines why they fiercely protect their own IP yet expect businesses to hand over data (seen as IP for many businesses, although legally different) so freely and without a clear contract as to how it will be used.
I think it's time for search engines to be forced into licensing data from websites so we all know how the data will be used.
I have not fully read the schema.org website yet but I have a suggestion:
There should be an ability for companies to put "AvailableOnSite" as the data for an attribute if we want to flag it's there for users but do not want to share it with search engines.
We would then see if search engines were caring about search quality as they would be able to say - hey, here's the information you can get by going to this site. I bet they won't do it.
|Why didn't they just call it "Stealing your data Microformat"? |
inbound, I fail to see the logic of this assertion. Either you're offering up structured data for consumption by the search engines or you're not. Preventing them from indexing (or "stealing") your data is as simple as not making it available, either by excluding a resource from indexing, or by simply not publishing the structured data in your code.
Furthermore, as microdata - like RDFa and microformats - uses markup attributes, it's not meant to offer up any new data that a website isn't already displaying, but to structure the data that's already there. Structured markup referencing a shared vocabulary makes it easier than parsing unstructured code, but it doesn't mean that unstructured code isn't capable of being parsed. In both cases the data is available from the presentation layer.
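As a rough illustration of that point, here's a small Python sketch (standard library only) that pulls the same rating data out of an unstructured snippet and a microdata-annotated one. The AggregateRating type and the ratingValue / reviewCount properties come from the schema.org vocabulary as I understand it. The sample markup and the function names (scrape_unstructured, parse_structured) are hypothetical, and real crawlers are certainly more sophisticated than this.

```python
import re
from html.parser import HTMLParser

# The same visible content, without and with microdata attributes.
UNSTRUCTURED = "<p>Rated <b>4.5</b> out of 5 by 27 customers</p>"
STRUCTURED = (
    '<p itemscope itemtype="http://schema.org/AggregateRating">Rated '
    '<b itemprop="ratingValue">4.5</b> out of 5 by '
    '<span itemprop="reviewCount">27</span> customers</p>'
)


def scrape_unstructured(html):
    """Site-specific pattern matching: works, but breaks if the wording changes."""
    match = re.search(r"Rated\s*<b>([\d.]+)</b>.*?by\s+(\d+)\s+customers", html)
    if not match:
        return {}
    return {"ratingValue": match.group(1), "reviewCount": match.group(2)}


class ItemPropParser(HTMLParser):
    """Records the text that immediately follows each itemprop start tag."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        self._current = dict(attrs).get("itemprop")

    def handle_data(self, data):
        if self._current:
            self.props[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def parse_structured(html):
    """Generic extraction: relies only on the shared vocabulary, not the wording."""
    parser = ItemPropParser()
    parser.feed(html)
    return parser.props


if __name__ == "__main__":
    print(scrape_unstructured(UNSTRUCTURED))
    print(parse_structured(STRUCTURED))
```

Both routes recover the same values here; the difference is that the regex has to be written per site, while the attribute-based parser would work on any page using the same property names.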
Yes the data is already on the page, but marking it up makes it that much easier for search engines to extract that data and display it on their page, surrounded by their advertising instead of the advertising for the company who paid for that data to be gathered.
I can definitely see the dilemma from webmasters with proprietary data. That is their bread and butter, and if Google crawls it and starts displaying it on the search results page instead of making users visit your site and view your advertising, buy your products, or however you monetize the data, there goes the entire business model. My opinion is that if Google did that, the company would shut down resulting in everyone's loss: Google, the company, and the user, because now nobody is collecting this data. So I think Google will keep displaying structured data, but in exchange will have to provide enough traffic to keep the data coming (profitable). I think they will track if their displaying of rich snippets causes a marked decrease in click-thrus, and if so they will have to stop displaying that kind of data because that is bad for their own business. They need the content and data producers to stay around.
One last thing I wanted to add is that our website has reviews, and even though we did not add any microdata labeling them as reviews, Google knew what they were and in some cases displayed them in Google Places. So NOT adding the tags does not mean Google won't identify your structured elements and display them regardless. I think Google assumes if you are asking to be crawled and displayed in their search results that you don't care how or where they choose to do it.
Aaranged; Yes, I can see your point but my fears are based on companies feeling they will get an advantage by unveiling more data than they currently do - and doing that in a way which is simple for search engines to understand and re-use (also making it much easier for scrapers to get to your data). Also, there is a huge difference between a user seeing the data presented on a few pages of your site versus a search engine collecting the data from millions of records and then using it in any way other than to match your website to a search.
I'm not suggesting that information can't be extracted, of course it can be; I am concerned that we will see Google bypass more and more intermediaries by using the data to offer Google Services that compete with those giving away the data (let's ignore the matter of updating that data once it's been used in a competing service - that's a very long discussion).
I know my posts might seem anti-Google, that's because I am worried (as are many others) about the path they are taking. Remember that Google relies on automation and does not employ that many people for each $ they earn ($1.2M revenue per employee) - this means that jobs are being lost every time Google encroaches on a vertical, you might call this efficiency but those on the wrong side of that deal see their prospects diminished significantly (they may lose their job AND find the industry they know all about has lost carrying capacity).
I'm not the one I'm concerned about; what will happen to my employees or even to other decent, innocent people who are swept up in the upheaval that Google/Others is/are creating? If you live in an English speaking country (as the official language) then you're in a society that has a higher level of Information based jobs than average and has lost a great deal of "older" jobs that have moved overseas (manufacturing as an example). What happens to Information based jobs when a giant such as Google goes after that sector (think of the Analytics business)?
It's fairly obvious that "Western" economies are struggling, the last thing they need is for a few huge companies (who often do not pay dividends) to destroy the job carrying capacity of verticals.
The way economies work is being changed by large companies; let's say that a company manages to automate the role of an accountant and offers the service for free (in exchange for a lot of information about companies). We will see a significant percentage of (probably smaller) companies stop paying accountants. The company offering the free service will have to pay some people to deal with the service (writing the code, using the data, marketing it, customer service) but that number will be vastly smaller than the number of accountants (and their junior staff members) losing their jobs. This improves the profitability of the small businesses that use the free service but it reduces the number of people employed in the country. You could argue that the big company offering the free service will make money some other way and this will offset the difference - that's not the case with cash hoarding mega-corporations, they keep their cash for big purchases (which financially benefit a small number of people and often lead to jobs being lost due to efficiencies).
So, by using free products or happily giving data to Google that could be used in future competitive services, we are feeding a system which ultimately reduces jobs and increases the income-disparity of nations (that's a bad thing to most people).
It's maybe unfair to point the finger at Google - as many companies' activities end up doing this - it's just the scale of the change that Google can make to an industry that scares me.
So, who's setting up Microdata as we discuss this topic?
Confession: I am.