I'm interested in seeing what other tags would be considered good structure, including what search engines would consider good structure too!
<?xml ...
<!DOCTYPE ...
<html>
<head>
<title></title>
</head>
<body>
<h1></h1>
<p></p>
<h2></h2>
<p></p>
</body>
</html>
What web page resembles...
Some of mine do, at least within the content div. The nav div is more like
<div>
<ol>
<li><a>text</a></li>
<li><a>text</a></li>
<li><a>text</a></li>
</ol>
</div>
And the header div may just be a set of images. I seldom need XML, so that's one place where my documents differ from JAB's outline above: HTML 4.01 Strict is all that most of my clients need.
...content is being re-written to optimize for search engines
That's even true for search WITHIN a company, not just for the Yahoos and Googles. With so much information all around us, knowledge management has become extremely important -- and that includes being able to search for AND FIND the particulars that are important to you in the moment. There's nothing like relatively standardized structures for making your company's information (or your personal information) easy to dig up.
One of my clients has some important literature that is 100 to 200 years old. I can barely read those originals - it seems to me that even the flow of ideas was more chaotic back in those days.
[edited by: tedster at 12:32 am (utc) on Aug. 21, 2005]
The whole panoply of available elements in HTML, in particular those retained in the strict DTDs, should be used where necessary.
PS: I know about IE and the XML declaration.
I made a similar transition a while back and I find that the sites I've designed since then are much more effective. And to a large degree, that's because my emphasis has shifted from design to content. The "M" in HTML is the big deal - we START with a document -- that is, with content -- and then we "mark it up" for the web (and that means for any number of potential user agents).
I used to design print ads, so I was very focused on grabbing the eyeballs. That seemed especially important when you're buying 1/4 of a page that is otherwise filled with distractions. The thing I wasn't getting about the web is that once someone has your site on their monitor, then at least for the moment, you have no one competing for those eyeballs. So giving your visitors the content they came for, rather than some eye candy, makes a big difference in business success.
I eventually evolved a saying for myself: "slick ain't sticky". I even wrote a post about it in New To Web Development [webmasterworld.com], because I feel it is such a valuable paradigm shift. I really do wish I had started out thinking this way.
...including what search engines would consider good structure too!
One thing search engines thrive on is mark-up that is "well formed". Some people feel that this means mark-up that validates according to the W3C. Well, being valid is one sure way to guarantee that your mark-up is well-formed, but being well-formed can be a bit less rigorous than being valid (especially in HTML 4.01).
Search engines really don't care if your documents use some deprecated attribute, or even something that's proprietary, as long as the code itself is well formed and they can parse it without having to go into an error recovery routine that may or may not succeed. When error recovery fails, a chunk of the content may well be skipped (been there, done that!).
So closing tags is extremely important - and closing them in properly nested order (the last tag opened is the first one closed) is very helpful too. Spelling errors in tags are an awful mistake: <spam> or </spam> is an error I've made more than once, and it orphans the partner tag! It's good practice, even in HTML, to use closing tags where they are optional (li, p, td and so on).
Copy/paste errors that accidentally take out an angle bracket are much too easy to create, and this is exactly the kind of error that has caused search engines to miss a chunk of one of my pages, because the code wasn't well formed.
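As a rough illustration of the closing-tag problems described above, here is a minimal Python sketch using the standard library's html.parser. The names check_tags and TagBalanceChecker, and the short void-element list, are my own assumptions; this is not how any search engine actually parses pages, and it deliberately flags optional closing tags (such as a missing </p>), in keeping with the advice to close them anyway.

```python
from html.parser import HTMLParser

# Elements with no closing tag in HTML 4.01 (a partial list, assumed for this sketch).
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Reports tags never closed, closed out of order, or closed without being opened."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags as (name, line)
        self.problems = []  # human-readable descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # XHTML-style <br /> closes itself, so nothing is pushed

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if any(name == tag for name, _ in self.stack):
            # Pop back to the matching open tag; anything skipped was left unclosed.
            while self.stack[-1][0] != tag:
                name, line = self.stack.pop()
                self.problems.append(f"<{name}> opened at line {line} not closed before </{tag}>")
            self.stack.pop()
        else:
            # A closing tag with no partner, e.g. a misspelled </spam>.
            self.problems.append(f"stray </{tag}> at line {self.getpos()[0]}")

    def close(self):
        super().close()
        for name, line in self.stack:
            self.problems.append(f"<{name}> opened at line {line} never closed")
        self.stack.clear()


def check_tags(html):
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(check_tags("<p>Some <span>text</spam> here</p>"))
# ['stray </spam> at line 1', '<span> opened at line 1 not closed before </p>']
```

Run against the <spam> typo, it reports both the stray </spam> and the <span> it left orphaned.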
As encyclo said, any valid tag that accurately conveys the semantic values of the document is a good practice. This means that marking up menu links with <li> is a great idea. Divs, even nested divs, convey the manner in which parts of a document relate to each other semantically -- whereas table cells often have a way of splitting related content into relatively dissociated parts of the html.
Lots more to say on this - but I hope I've given some sense of what I'm talking about.
Have you found any effective ways to communicate this fact to the client?
Yep. Take them to Google, click on the 'cached' link, then click on the "Click here for the cached text only" link at the top of the page.
Then explain that, in basic terms, that's what Google sees - no Flash, no JavaScript, no pretty pictures.
If they are still happy with their site after that - ask them how much budget they have for PPC......
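For a quick local approximation of that text-only view, the Python sketch below (standard library only) keeps text content and drops scripts, styles and images. text_only is a hypothetical helper of my own; it is not a reproduction of Google's cache or of how Google indexes a page.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks entirely."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        # Images contribute no text here (their alt text is ignored).

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_only(html):
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<html><head><title>Widgets</title><script>var x = 1;</script></head>'
        '<body><img src="logo.gif"><h1>Blue Widgets</h1>'
        '<p>We sell <b>blue</b> widgets.</p></body></html>')
print(text_only(page))  # Widgets Blue Widgets We sell blue widgets.
```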
To get back to JAB's original question, I'd like to talk about information structuring and the H1 tag. Although there's no absolute or technical prohibition on using it more than once, semantically I think that it's an extremely peculiar idea -- you're saying "this one document is about two different topics".
When I feel the need for two H1 tags, that's often a sign to me that maybe this content should be more than one page. I don't think search engines in general deal well with pages that are both long AND mixed topic. I also don't think users deal with it very well either.
It might also mean I'm not seeing what the REAL h1 tag for the page should be -- and by wordsmithing my way to that I can make the whole page perform better.
But using the H1 more than once looks like the road I may well take with my latest project: the home page has 3 columns, each with an <H1> and then content below.
This doesn't mean I have too much content or an overcrowded page; I simply want a heading on each of the 3 columns, e.g. Menu, Welcome, Latest news. What would you suggest is the best approach?
Each one is the main title of its column; using H2 for column 2 wouldn't make sense.
<meta name="description" content="put description here">
WW uses a doctype:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
G has a content-type:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
The "M" in HTML is the big deal - we START with a document -- that is, with content -- and then we "mark it up" for the web (and that means for any number of potential user agents).
I used to design print ads, so I was very focused on grabbing the eyeballs. That seemed especially important when you're buying 1/4 of a page that is otherwise filled with distractions. The thing I wasn't getting about the web is that once someone has your site on their monitor, then at least for the moment, you have no one competing for those eyeballs. So giving your visitors the content they came for, rather than some eye candy, makes a big difference in business success.
Just thought that one should be repeated. Class post :) :) :)
If your webpage were a book, the <title> would be outside on the cover and the <h1> would be the same thing repeated inside the book (perhaps including a subtitle). Then, for each major section of the book you would have an <h2>, and for each chapter an <h3>, and so on...
At the lowest level, each page would have a page number. That would be your <div> tags. But even within the page you could have different kinds of content. That's what the <p>, <span>, <table>, <img> tags and so on are for.
--
It is often very helpful to think of each individual page, and the site as a whole, as being a book - for very large sites, think libraries instead (imho, fwiw, etc.)
[edited by: claus at 1:58 pm (utc) on Aug. 21, 2005]
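The book analogy translates into a simple check: one <h1>, and no heading that jumps more than one level below the heading before it. Here is a minimal Python sketch under those assumptions; the one-<h1> rule reflects the opinion in this thread rather than a requirement of the HTML specification, and audit_headings is a hypothetical name.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of every <h1>..<h6> start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    problems = []
    # Like a book: one title inside the cover...
    if levels.count(1) != 1:
        problems.append(f"expected one <h1>, found {levels.count(1)}")
    if levels and levels[0] != 1:
        problems.append(f"first heading is <h{levels[0]}>, not <h1>")
    # ...then sections and chapters nested one level at a time.
    for prev, level in zip(levels, levels[1:]):
        if level > prev + 1:
            problems.append(f"<h{level}> follows <h{prev}>, skipping a level")
    return problems


print(audit_headings("<h1>Menu</h1><h1>Welcome</h1><h1>Latest news</h1>"))
# ['expected one <h1>, found 3']
```

Going back up a level (an <h3> followed by an <h2>) is allowed, just as a new part of a book follows the last chapter of the previous one.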
Oh yes, I'm cutting off a client today, by the way, Tedster, since you mentioned it. Lack of appreciation for the work and not paying me even half of what was due is a big no-no. I must admit their work was a complete turnaround from what I had set up about half a year ago on my own site. That is what's great about this place: in half a year your whole perspective can change (for the better, of course). It's just a pain in the ass to rework everything around it, but an improvement is always worth it, I think.
I am now using meta tags, and have been on one part of my site for a while now; I've noticed Google has taken advantage of them.
jbinbpt may be right about using H2 tags for multiple column headings, since headings such as a menu title wouldn't really represent the main idea of a page.
I've gone over various tags and have found out the difference between ol and ul, as well as what the heck dl tags are. There are some other tags I had noticed even before I needed to really sit down and look over HTML in depth. All great advice by everyone...
It looks like HTML was created for academic, military type documents.
What web page resembles an h1, p p p p h2, p p p p list h3 p p p p p blockquote p p p p h3.?
Well, most of mine do. ;-)
Yes of course, HTML was invented as a way of marking up standard academic publications so they could be displayed on the Internet. What Berners-Lee did was specify a group of structural elements that you commonly find in academic papers, and he then created a stripped-down version of SGML (invented years before to handle the computerization of huge government documents, like airplane maintenance manuals) that most anyone could use. These basic elements are what have always been and still are used in most every academic journal. Over the years some of the inconsistencies (or constraints) in Berners-Lee's original element-set have come back to haunt us. There never should have been empty elements like <br> for example; the conceptually correct element would have been <line>, which may appear in some future version; but we all labor under the burden of history.
Anyone interested in getting a better handle on the foundations of HTML might want to spend some time studying the default stylesheet underlying most browsers [w3.org]. Every browser has its own stylesheet built in -- it's what you get when you don't apply any CSS at all to an HTML document. These default styles are largely inherited from the original Mosaic-generation browsers and give you a feeling for how HTML was originally conceived.
I gradually began embracing CSS to get rid of all my <font> tags. Then I started using it to align backgrounds and list bullets. I found that the more I embraced CSS as a layout tool, the more my HTML started to look like the ideal semantic M in HTML.
I'd teach any new HTML developer to start with purely semantic XHTML and do everything in CSS from the start. Why learn bad habits that I personally had to unlearn because I fought in the browser wars?
To achieve certain visual effects, I will wrap my content in a few extra <div> tags, but that's a forgivable diversion, isn't it?
Shamefully, I still prefer <b> to <strong>
A great move, JAB. HTML is all about the document and its meaning - whereas wysiwyg editors like FrontPage, Dreamweaver, GoLive and so on, are all about how it looks.
Total nonsense, as the wysiwyg editors are all about creating the content and not messing with HTML. By your analogy I'd craft a letter or a fax cover in Rich Text Format or Adobe PostScript directly instead of using MS Word or WordPerfect.
When people get past the naive idea that hand-crafting HTML makes a real difference, as opposed to using wysiwyg editors, there will be a lot more content and a lot less debate over how to create it. The only problem I've ever encountered that required a hand edit in HTML was tweaking incompatibilities in broken browsers. This is rarely the case these days unless you're getting too complex on the bleeding edge of HTML, and even then your visitors usually couldn't care less, except in extreme cases where the technology either makes or breaks the site.
When I first got my hands on a computer in the 70s I coded programs directly in HEX, then Assembler, then C, C++, etc., and at each step up the ladder I looked back and marveled at how much time I had wasted doing it the hard way back in the day. True, the compilers added a little more garbage here and there, but without that evolution software wouldn't be nearly as advanced as it is today. Now I marvel at why the people hand-coding HTML are wasting time in the same way, when what's really needed are bigger and better tools to crank out more content instead of spinning their wheels doing nerd work tweaking HTML in Notepad or some low-level HTML tool.
For the most part nobody even cares: your visitors can't tell how you built the page, as they are only concerned with what's on the page. Can I find it in the search engine? Can I read it when I get to the site? Everything beyond that is academic.
For anyone that claims it gets better SERPs I can show you many thousands of pages ranking exceptionally well using FrontPage and Dreamweaver and those people churn out content non-stop instead of worrying about an extra tag that nobody cares about.
It's like owning a bulldozer but insisting on using a shovel because of the 'technique' when at the end of the day the guy using a bulldozer virtually always wins against the shovel.
[edited by: incrediBILL at 5:10 am (utc) on Aug. 22, 2005]
The problem with (X)HTML when you're learning it for the first time is that it is the ultimate language of the internet, and therefore so many other technologies ultimately depend upon it to properly deliver the product. You have to consider CSS, JavaScript, SEO and so many other aspects, yet all the tutorials I have seen just seem to ignore the complexity involved. That's a shame, because I can say I have a true appreciation of good work (and I don't consider my work good work, or I'd be answering a lot more threads, ha!).
I first started with FrontPage 98, and though I abandoned it eventually, I will admit it is how I started. It messed up my code and held me back, and eventually I started to learn HTML on my own.
As I see it, the best semantic use of these tags would keep all the bold rendering instructions in CSS and out of the HTML - so all you would see in HTML would be <strong> tags. This practice cleanly separates rendering from meaning, and its widespread adoption would make the <b> tag a dinosaur. Of course, given all the legacy code on the web, the <b> tag must still be obeyed by browsers.
Note that when you use the <strong> tag, it's the equivalent of having an aural browser raise its voice for the entire section -- although none of the current aural browsers can actually afford to do this today, because tag use is so inconsistent.
A similar situation exists between <i> and <em>. I believe the intended execution in an aural browser is that <strong> is louder and <em> is raised in pitch. But as I mentioned earlier -- aural browsers cannot currently afford to render instructions this way.
In real-world practice, I also still like <b> tags for some situations. One simple letter for the tag - very efficient, even if it is a slightly non-purist usage.
[w3.org...]
Put your URL into the system, and if it doesn't fail (the service tends to be a bit buggy at times) then you will be given an outline of the semantic structure of the document as seen by an XSLT Java Servlet and a copy of HTML Tidy.
The good thing about this service is that it is giving a purely machine-read interpretation of a web page rather than a visual, human one. Semantics is all about underlying structural meaning with the HTML presenting your content in the most appropriate and easy to understand way possible.
Have a look at the results for the w3.org home page [w3.org] - as you can see, it's not too bad, but not perfect either (the copyright section is outlined with "unknown titles").
On the <b> versus <strong> question: HTML carries a lot of legacy baggage in the form of a large selection of non-semantic elements. Semantically speaking HTML is not very rich, so you usually have to make various compromises in your choice of markup. This excess baggage at least has the advantage of continuity, since backwards compatibility ensures the longevity of published documents.
Here's a way to make the web a better place: the folks working on Firefox/Mozilla should create a simple button on the browser toolbar that says "Show me an outline of this page." When you click it, the page collapses to just titles, headers, and the first line of each paragraph. That would let you get a quick view of a page's contents. (You can actually set up your own user stylesheet that does something close to this: set the non-heading content elements such as p, ul, ol, table and img to display:none, leave the headings <h1> through <h6> visible, and see what happens when you apply the stylesheet.)
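Until a browser offers such a button, a script can produce a similar outline. This Python sketch (standard library only; outline and OutlineBuilder are hypothetical names) lists the title, the headings indented by level, and the first sentence of each paragraph. It treats "first sentence" as a stand-in for "first line", since where a line breaks depends on rendering.

```python
import re
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Collects the text of <title>, <h1>-<h6> and <p> elements in document order."""

    KEEP = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "p"}

    def __init__(self):
        super().__init__()
        self.current = None  # the kept element whose text is being gathered
        self.buffer = []
        self.items = []      # (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._flush()  # an unclosed <p> ends where the next kept element starts
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self._flush()

    def handle_data(self, data):
        if self.current:
            self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()

    def _flush(self):
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        if self.current and text:
            self.items.append((self.current, text))
        self.current, self.buffer = None, []


def outline(html):
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    lines, depth = [], 0
    for tag, text in builder.items:
        if tag == "title":
            lines.append(text)
        elif tag == "p":
            first = re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0]
            lines.append("  " * depth + "- " + first)
        else:
            depth = int(tag[1])
            lines.append("  " * (depth - 1) + text)
    return lines


page = ("<title>Widgets</title><h1>Blue Widgets</h1>"
        "<p>We sell blue widgets. They are great.</p>"
        "<h2>Prices</h2><p>Cheap! Really.</p>")
print("\n".join(outline(page)))
```

Inline markup such as <b> inside a paragraph is simply read through, so the outline reflects the text rather than the styling.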
I remember from a couple years back that the thing to do was to ensure that you had lots of h1 and h2 tags for the search engines.
It was to the point that some webmasters suggested putting key phrases in h1 and h2 tags and then using CSS to format it however you want.
Just Curious - How much do you use CSS to alter the default appearance of header tags?
Custodian
H1 tags can be formatted any way you want as CollyMellon has suggested.
The key seems to be: h1 tags are the main tags and should be used once per page, at the top, where they semantically make the most sense.
This has been an eye-opening discussion; I'm just not sure yet whether I'm gutsy enough to make content king and then mark it up.
Custodian
> Put your URL into the system,
You can do that here:
[w3.org...]
All pages I work on now have to pass the following validators...
1.) W3C Markup Validator
[validator.w3.org...]
2.) XACT Accessibilities Validator
[webxact.watchfire.com...]
3.) Semantic data extractor
[w3.org...]
The last of course is a manual validation done by the author and not by the site itself. Still I value these online tools the most.
The XACT validator really works better for HTML 4, as I REALLY do not like using adjective-like tags in my markup, and it complains about missing height and width attributes. It also generates a lot of obnoxious warnings EVEN IF you have a completely empty div as the only thing in your body... but I have yet to find any OTHER validator for accessibility.
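For a local check of the image warnings, a few lines of Python can list <img> elements missing width or height, plus alt, which the HTML 4.01 DTD declares as a required attribute of img. This is only a sketch (check_images is a hypothetical name) and does not reproduce what XACT actually tests.

```python
from html.parser import HTMLParser


class ImageAttributeCheck(HTMLParser):
    """Lists <img> tags that lack alt, width or height attributes."""

    REQUIRED = ("alt", "width", "height")

    def __init__(self):
        super().__init__()
        self.findings = []  # (src, [missing attribute names])

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        present = {name for name, _ in attrs}
        missing = [name for name in self.REQUIRED if name not in present]
        if missing:
            self.findings.append((dict(attrs).get("src", "?"), missing))


def check_images(html):
    checker = ImageAttributeCheck()
    checker.feed(html)
    checker.close()
    return checker.findings


print(check_images('<img src="a.gif" alt="A" width="10" height="10"><img src="b.gif" alt="">'))
# [('b.gif', ['width', 'height'])]
```

Note that an empty alt="" counts as present here, which is the usual way to mark a purely decorative image.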
[jigsaw.w3.org...]