Semantic Data Extractor

I've been doing quite a bit of research lately into this whole semantic web thing and I've found myself entrenched in the new WCAG 2.0 documents.

The Ultimate SEO Guide for 2009
[webmasterworld.com...]

Once you become hooked on the above, you then get hooked by all the other references, which seem infinite at times. Actually, many of them loop back to themselves. ;)

There is a plethora of tools buried at various authoritative resources, the W3C being one of them. One tool there that doesn't get much play is the...

Semantic Data Extractor
[w3.org...]

This tool, geared by an XSLT stylesheet, tries to extract some information from a HTML semantic rich document. It only uses information available through a good usage of the semantics defined in HTML.

In the past month, I've probably run a few hundred pages through that tool. I want to see how many of the semantic elements I can target on one page. I've extracted all the data that it looks for, and this is the list you end up with. I am now using this list as a general guideline for page development. Depending on the content of the page, I want to make sure that I've covered my bases in these areas...

Extracted Data
Generic Metadata
Title
Author
Description
Contact Information
Language Code
Explicit language annotations within the document
HTML Profile

Related Resources
Translations
Alternate Formats
Starting Page
Next Page
Previous Page
Table of Contents
Index
Glossary
Copyright
Chapters
Sections
Subsections
Appendix
Help
Bookmarkable Points

Defined Terms
The following terms are defined in the given HTML page:

Abbreviations and Acronyms
The following abbreviations and/or acronyms are used in the given HTML page:
standing for ""

Citations and Quotes
There are some quotes and citations in this page:
* [source]
References were found to the following sources:
*

Document Outline
*
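
For anyone who wants a quick local approximation of a few of the checks in the list above, here is a minimal sketch in Python using only the standard library. To be clear, this is not the W3C tool (which is driven by an XSLT stylesheet) and it does not reproduce its rules. The names SemanticSketch and extract are made up for illustration, and the mapping from markup to list items (meta name values, link rel values, dfn, abbr/acronym, cite, h1 to h6) is my own assumption about what such a check would look at.

```python
from html.parser import HTMLParser


class SemanticSketch(HTMLParser):
    """Hypothetical, partial approximation of a few semantic checks.

    Not the W3C Semantic Data Extractor; it only illustrates the idea.
    """

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    TEXT_TAGS = HEADINGS | {"title", "dfn", "abbr", "acronym", "cite"}

    def __init__(self):
        super().__init__()
        self.found = {
            "title": None, "lang": None,
            "meta": {},           # e.g. author, description
            "links": {},          # rel value -> list of hrefs (next, prev, ...)
            "terms": [],          # text of <dfn> elements
            "abbreviations": [],  # (short form, expansion from title attribute)
            "citations": [],      # text of <cite> elements
            "outline": [],        # (heading level, heading text)
        }
        self._open = []  # currently open text-collecting tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.found["lang"] = attrs.get("lang")
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.found["meta"][attrs["name"].lower()] = attrs["content"]
        elif tag == "link" and attrs.get("rel") and attrs.get("href"):
            for rel in attrs["rel"].lower().split():
                self.found["links"].setdefault(rel, []).append(attrs["href"])
        elif tag in self.TEXT_TAGS:
            self._open.append([tag, attrs, []])

    def handle_data(self, data):
        # Text belongs to every open collecting tag (e.g. <abbr> inside <h2>).
        for entry in self._open:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1][0] != tag:
            return
        _, attrs, parts = self._open.pop()
        text = " ".join("".join(parts).split())  # collapse whitespace
        if tag == "title":
            self.found["title"] = text
        elif tag == "dfn":
            self.found["terms"].append(text)
        elif tag in ("abbr", "acronym"):
            self.found["abbreviations"].append((text, attrs.get("title") or ""))
        elif tag == "cite":
            self.found["citations"].append(text)
        else:  # one of h1..h6
            self.found["outline"].append((int(tag[1]), text))


def extract(html):
    """Return a dict of the semantic data this sketch could find."""
    parser = SemanticSketch()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    sample = (
        '<html lang="en"><head><title>Widgets</title>'
        '<meta name="author" content="Jane Doe">'
        '<link rel="next" href="page2.html"></head>'
        '<body><h1>Widgets</h1><p>A <dfn>widget</dfn> is styled with '
        '<abbr title="Cascading Style Sheets">CSS</abbr>.</p></body></html>'
    )
    for key, value in extract(sample).items():
        print(key, "=>", value)
```

Anything that comes back empty or None is an area the page is probably not covering, at least as far as this rough check can tell. Run the real extractor for the authoritative answer.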

How do your pages display semantically? What do you see when you turn styles off? Or images? Or both? If you run your pages through the Semantic Data Extractor, how many of the above areas are being extracted from your documents?

^ Based on my document testing so far, a large percentage of websites fail miserably when it comes to extracting semantics from their pages. That can't be a good sign, can it?

pageoneresults