|Yahoo open search to support microformats|
Indexing 'the semantic web'
The Yahoo search blog [ysearchblog.com] says that their new 'open' search will support microformats.
If this is enough to spur any reasonable volume of publishers to adopt microformats such as hCard and hReview, I could see these catching on in the default web search, although perhaps not in Google [webmasterworld.com].
Is it time for microformat SEO?
Unfortunately, this is adding to the Tower of Babble that is the state of HTML standards compliance today. And just when we were starting to put a stake through the heart of XHTML...
The eager webmasters who want to use the "latest" are slowly coming around to the fact that XHTML hasn't been widely adopted by browsers, and that it just isn't the direction things are going in (with HTML 5.0 - if that ever happens).
So, Yahoo has to support both RDFa and eRDF, which embed RDF (and, by extension, microformats) in entirely different ways.
Thus ensuring that everybody will get confused, and Tag Soup will continue to reign. Just a new kind of Tag Soup.
I'm sure we will see XHTML pages with eRDF and HTML pages with RDFa - if only because of third-party tools and APIs that are going to pick one or the other.
On the non-technical side, I wonder about the trade-off of making your data so highly accessible by machine. Is it a good thing or a bad thing?
Say you have some kind of directory site. On one hand, using microformats makes it easier for search engines to find your data in the context of some sort of meaning. Which is a good thing, as search engines have steadfastly avoided dealing with meaning, preferring to deal only with keywords. Of course, microformats convey only very limited meaning ("this is a <kind of thing>"), but at least it's a start.
On the other hand, they also make it much easier for somebody else to rip off the essence of your site. No need to program a screen-scraper to recognize how you've organized things on your pages - you've already done the work for them!
(I'm referring to screen-scrapers that don't aim to simply copy your pages exactly, but to "harvest" your raw data.)
Since I have no idea what microformats are I will safely assume that pretty much no one else does either, which means I won't worry about it.
Microformats are standards for formatting common chunks of information - for example, contact information (hCard), or calendar events (hCalendar).
From Wikipedia, here's an example of an hCard:
<div class="vcard">
  <div class="fn">Joe Doe</div>
  <div class="org">The Example Company</div>
  <a class="url" href="http://example.com/">http://example.com/</a>
</div>
It's nothing more than adding a class to each address saying "this is an hCard" (class="vcard"), and adding classes for each part of the card (fn = "formatted name", tel = "telephone number"), etc.
And doing so in a standard way.
So, this makes it easy to scan a page using software and pick out all the addresses. Or calendar events, etc.
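To make "scan a page using software" concrete, here is a minimal sketch in Python using only the standard library's html.parser. It assumes reasonably well-formed markup and collects just four hCard properties (fn, org, tel, url); the names HCardParser and parse_hcards are hypothetical, and real hCard parsers handle many more properties and edge cases than this does.

```python
from html.parser import HTMLParser

# Elements without closing tags; never pushed onto the open-element stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "wbr"}

# The few hCard properties this sketch collects ("fn" = formatted name).
HCARD_PROPS = ("fn", "org", "tel", "url")


class HCardParser(HTMLParser):
    """Collects simple hCard properties from elements nested in class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard root element found
        self._stack = []  # open elements as (tag, starts_card, property or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        starts_card = "vcard" in classes
        if starts_card:
            self.cards.append({})
        in_card = starts_card or any(s for _, s, _ in self._stack)
        prop = None
        if in_card:
            prop = next((p for p in HCARD_PROPS if p in classes), None)
        if prop == "url" and tag == "a" and attrs.get("href"):
            # For links, the value is the href, not the visible link text.
            self.cards[-1]["url"] = attrs["href"]
            prop = None
        if tag not in VOID_ELEMENTS:
            self._stack.append((tag, starts_card, prop))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Pop back to the matching open tag, tolerating some unclosed elements.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Credit text to the innermost enclosing property element, if any.
        for _, _, prop in reversed(self._stack):
            if prop:
                card = self.cards[-1]
                card[prop] = card.get(prop, "") + data
                break


def parse_hcards(html):
    """Return one dict per hCard on the page, with whitespace collapsed."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in card.items()}
            for card in parser.cards]


SAMPLE = """
<p>Call the front desk.</p>
<div class="vcard">
  <div class="fn">Joe Doe</div>
  <div class="org">The Example Company</div>
  <a class="url" href="http://example.com/">http://example.com/</a>
</div>
"""

if __name__ == "__main__":
    for card in parse_hcards(SAMPLE):
        print(card)
```

Run against the Wikipedia-style card, this yields a single dict with fn, org and url, while the paragraph outside the vcard is ignored. That is also exactly why harvesting becomes easy: the class names do the screen-scraper's job for it.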
BTW, why is the class name vcard, and not hcard? hcard stands for "HTML vcard"...
Now, let's say you have a site that lists some kind of events: say the site is for a cooking school, and the events are upcoming classes.
If you use the hCalendar microformat, that would enable Yahoo to easily parse and display a few of your upcoming events in your listing when your site is shown. This may well be a huge advantage.
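For the cooking-school case, the publishing side might look something like the sketch below. It assumes Python and hCalendar's vevent class with the summary, dtstart and location properties, using the abbr pattern (machine-readable ISO 8601 date in the title attribute, human-readable date as the text). The function names and the class schedule are hypothetical, and nothing in the thread says which hCalendar properties Yahoo actually reads.

```python
from datetime import datetime
from html import escape


def hcalendar_event(summary, start, location):
    """Render one event as an hCalendar vevent (summary, dtstart, location)."""
    return (
        '<div class="vevent">\n'
        f'  <span class="summary">{escape(summary)}</span>\n'
        # abbr pattern: ISO 8601 value for machines in title, readable text for people.
        f'  <abbr class="dtstart" title="{start.isoformat()}">'
        f'{start.strftime("%B %d, %Y, %H:%M")}</abbr>\n'
        f'  at <span class="location">{escape(location)}</span>\n'
        '</div>'
    )


def upcoming_listing(events, now, limit=3):
    """Return hCalendar markup for the next `limit` events starting at or after `now`."""
    upcoming = sorted((e for e in events if e[1] >= now), key=lambda e: e[1])
    return "\n".join(hcalendar_event(*e) for e in upcoming[:limit])


# Hypothetical class schedule: (summary, start, location).
CLASSES = [
    ("Bread Baking", datetime(2007, 11, 2, 18, 0), "Kitchen B"),
    ("Knife Skills & Sauces", datetime(2007, 10, 5, 18, 0), "Kitchen A"),
    ("Holiday Desserts", datetime(2007, 12, 7, 18, 0), "Kitchen A"),
    ("Summer Grilling", datetime(2007, 7, 6, 18, 0), "Patio"),
]

if __name__ == "__main__":
    print(upcoming_listing(CLASSES, now=datetime(2007, 10, 1)))
```

With now set to October 1, 2007, the listing contains the next three classes in date order and drops the past one. The title attribute carries the unambiguous date, so a crawler doesn't have to guess how to read "October 05, 2007, 18:00".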
This sounds like something incorporated into Leopard, where if you encounter a physical address in, say, an email, you can click on it and get options such as importing it into Address Book; likewise, if you encounter a date, you can import it into iCal.
|On the other hand they also make it much easier for somebody else to rip-off the essence of your site. No need to program a screen-scraper to recognize how you've organized things on your pages - you've already done the work for them! |
That would be my concern. Or, more to the point: I'm not building a database so people can simply find everything they seek on G or Y, without having to visit my sites. Many might be doing that, but we are not.
Very useful analysis, jTara. Thanks.
|I'm not building a database so people can simply find everything they seek on G or Y, without having to visit my sites |
This may be a good thing for your site, but jtara's example was a good one. If you were advertising an event, people would still have to click through to your site if they wanted to attend. Or a business might want its contact details to be easy to find. I think there are quite a few cases where it could be useful.
|Tag Soup will continue to reign. Just a new kind of Tag Soup. |
But that's inevitable, surely? So, we just need the right flavour of soup. I don't know if microformats are it - I like the idea but I'm unsure about the execution. Although even Google checks for a Creative Commons License [google.com].
I'm worried that applying such microformats will also lead to more mailings and phone calls from people wanting to sell you something. It does make it easier to harvest specific databases on the web.
I don't understand what would prevent me from labeling myself as Bill Gates at Microsoft. (I know many here think that is who I am, but it's not, really.)
|I don't understand what would prevent me from labeling myself as Bill Gates at Microsoft |
I'm not really sure of the relevance. You could make a whole site today pretending to be Bill Gates - you might even get some referrals from search engines for it. But I don't think you're going to actually convince anyone that's who you are ;)
|It does make it easier to harvest specific databases on the web |
I thought that was the whole point? If you have suitable data that you would prefer not to be syndicated or widely available, then microformats may not be for you. But then, being listed in search engines might not be for you either!
|But then, being listed in search engines might not be for you either! |
In fact, my impression is that search engines have little difficulty indexing my webpages.
I'm not concerned with search engines here, but with marketing people bothering me with their unsolicited calls. Like many others, I'm already careful about not putting e-mail addresses in machine-readable form on my webpages, and I think everyone here is aware why.
Contact information on websites is generally not difficult to find for human visitors, nor for search engines, so I don't see why search engines need help making it even more easily available.
As it seems that search engines, or at least the major ones, have little difficulty indexing information on websites, I believe that only systems less sophisticated than these have anything to gain from making webpages more semantic.
I'm not so sure I want that.