Forum Moderators: open
I have been working with XML quite a lot recently. I am having to learn it for a new client we have.
I do know that sites can be validated via w3.org as 'XHTML 1.0'.
My main question for this thread is to ask which Google's technology prefers, or looks on more favourably .... if it does indeed favour either.
For instance, will it be slightly easier for spiders to crawl my code if it is Valid XHTML, rather than just valid HTML?
My thinking is that in theory it should make a difference because XHTML is generally stricter than HTML when it comes to closing tags etc etc.
Does anyone have an opinion on this?
What would you recommend: XHTML 1.0 or HTML 4.01? Or do you believe neither makes a hoot of a difference to search engines?
Cheers,
Webboy
If your document is just pure XHTML 1.0 (not including other markup languages) then you will not yet notice much difference from HTML. If you make the change to XHTML, you're then moving in step with the latest W3C standard. XHTML also improves functionality, for example better forms. However, as more and more XML tools become available, such as XSLT for transforming documents, you will start noticing the advantages of using XHTML.
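To illustrate that point: because well-formed XHTML is also well-formed XML, any generic XML tool can process it directly, no HTML-specific parser needed. A minimal sketch using Python's standard library (the page content here is made up for the example):

```python
import xml.etree.ElementTree as ET

# A well-formed XHTML fragment can be fed to any generic XML parser.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title></head>
  <body>
    <h1>Main heading</h1>
    <p>Some text.</p>
    <h2>Sub heading</h2>
  </body>
</html>"""

root = ET.fromstring(xhtml)

# Pull out every heading in document order, much as an XSLT
# template could transform or extract them.
heading_tags = ("{http://www.w3.org/1999/xhtml}h1",
                "{http://www.w3.org/1999/xhtml}h2")
headings = [el.text for el in root.iter() if el.tag in heading_tags]
print(headings)  # ['Main heading', 'Sub heading']
```

Try the same thing on typical "tag soup" HTML and the XML parser refuses it, which is exactly the trade-off being discussed in this thread.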
XForms for instance will allow you to edit XHTML documents (or any other sort of XML document) in simple controllable ways. Semantic Web applications will be able to take advantage of XHTML documents.
we use XHTML transitional & CSS-P on the whole of our site with some good results.
using CSS-P means that we can position our content right at the top of our pages whilst being able to position it anywhere we please in the CSS file. this is certainly possible using tables but it is a pain. I got sick of spending half a day trying to figure out exactly which table was messing up my layout.
Also, we get the benefit of single file/site wide changes to look & feel too.
we are planning to move over to XHTML strict next year. plus the use of XML & XSLT in one area of the site.
Some people worry that XHTML's self-closing syntax (e.g. <meta name="" content="" />) can trip up parsers, but this isn't the case with Googlebot AFAIK. It is more important to keep things simple markup-wise, and use semantic markup such as
<h1>, <h2>, etc. Googlebot doesn't read CSS (yet), but I believe it understands stuff like <b></b> for giving importance (don't know about <strong></strong>, though). Whatever standard you choose, you should make sure to keep it validated to avoid any nasty parsing errors which can ruin all of your good work.
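Semantic markup is also trivially easy for a program to pick out. Here is a hypothetical sketch of the kind of thing a crawler *might* do with emphasised text (nobody outside Google knows its actual weighting, so this is purely illustrative):

```python
from html.parser import HTMLParser

class EmphasisCollector(HTMLParser):
    """Collect text inside <b> and <strong> elements, roughly as a
    crawler that gives extra weight to emphasised words might do.
    Purely hypothetical behaviour, for illustration only."""
    def __init__(self):
        super().__init__()
        self._stack = []        # currently open emphasis tags
        self.emphasised = []    # text found inside them

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and data.strip():
            self.emphasised.append(data.strip())

parser = EmphasisCollector()
parser.feed("<p>Plain text with <b>important words</b> and "
            "<strong>key phrases</strong> in it.</p>")
print(parser.emphasised)  # ['important words', 'key phrases']
```

Note how simple, valid markup makes this kind of extraction reliable; once tags go unclosed or nest wrongly, a parser has to start guessing.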
There has been lots of discussion about Flash, ActiveHex, Javascript, RisibleBasic, and various other technologies seriously impacting a website's accessibility to humans, browsers, and 'bots alike. And so from that viewpoint, the non-tech-savvy webmaster is right to be leery of "new technologies" that are known to be lousy for displaying content everywhere, but are thought by their creators to be great for driving customers to new heights of expense, upgrading hardware and software to see the new stuff.
Is XML like that?
The short answer is, "No." XML is the exact opposite of those, in every possible way.
First, XML is not "another technology". It is simply another outgrowth of the ancient SGML technology, just like the original HTML. It doesn't require a different kind of parser -- an HTML parser will recognize what it understands in XML, and will even recognize (and ignore) what it doesn't understand.
Better yet, what it DOESN'T understand is guaranteed to be the UNIMPORTANT bits (semantically speaking). When you make any serious use of Flash, the viewer has to have a Flash processor to see anything. When you use XML, the viewer sees a (perhaps less prettified) version of your content.
Still better, XML is designed to be portable -- unlike Flash (which is designed to be visible wherever Adobe has ported the viewer, which is admittedly a lot of places) or RisibleBasic (which is designed to port to any other system running the exact same hardware and software configuration, but only with extreme difficulty and expense.) That means even if processes DON'T support it now, they may in the future easily be changed to provide fuller support. These "processes" would include both browsers and 'bots.
So go XML. The closer you keep to current HTML features, the better your content will display on older browsers and crippled wannabe-browsers like Microsoft Word and IE. But Mozilla does a very good job of supporting XML, and IT'S extremely portable and free -- so almost nothing you can do will shut any sane users out. (IE use is insane, as any expert will tell you, but that's a separate and easily cured condition. And XML is so close to HTML that even IE's seriously broken HTML parser will often work on XML -- in fact, the more rigorously simple syntax that "validated" XML imposes makes it more likely that broken parsers like IE will sorta work.)
Googlebot doesn't read CSS (yet), but I believe it understands stuff like <b></b> for giving importance (don't know about <strong></strong>, though).
If one were to create a site using XHTML Strict, then using <b> tags is not permitted. I'd be surprised if Google penalised people for following the specs properly.
Following the W3C specs & using good information design on your website is supposed to be the best way to keep your site squeaky clean.
I don't think this will affect my ranking now, but I guess for accessibility and general "good coding" reasons, it is not a bad thing to do.
Thanks again,
Webboy
Why is valid HTML code important?
Search engines have to parse the HTML code of your web site to find the relevant content. If your HTML code contains errors, search engines might not be able to find everything on the page.
Search engine crawler programs obey the HTML standard. They can only index your web site if it is compliant with the HTML standard. If there's a mistake in your web page code, they might stop crawling your web site, and they might lose what they've collected so far because of the error.
Although most major search engines can deal with minor errors in HTML code, a single missing bracket can be the reason why your web page cannot be found in search engines.
If you don't close some tags properly, or if some important tags are missing, search engines might ignore the complete content of that page.
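You can see the lenient-versus-strict difference for yourself: a strict XML parser, which is what XHTML is designed to satisfy, rejects an unclosed tag outright rather than guessing what you meant. A quick Python sketch:

```python
import xml.etree.ElementTree as ET

# Tag soup: the <p> opened first is never closed before </body>.
broken = "<html><body><p>First paragraph<p>Second paragraph</body></html>"

# A strict XML parser refuses the document entirely.
try:
    ET.fromstring(broken)
    well_formed = True
except ET.ParseError:
    well_formed = False

print(well_formed)  # False
```

A browser's tag-soup parser will render that page anyway by guessing where the paragraphs end, and a crawler may or may not guess the same way, which is exactly why validating removes a source of uncertainty.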
So validate your code to help with "INDEXING CONTENT":
validator.w3.org
Just to confirm .... when you say always validate to HTML, do you mean either HTML 4.01 or XHTML 1.0?
Or is either one just as good?
I guess I am going around in circles. I know where you are coming from and completely agree with what you have said.
There is just no point in changing to XHTML if HTML 4.01 would be best.
Cheers,
Webboy
If one were to create a site using XHTML Strict, then using <b> tags is not permitted.
Try it, and you'll be surprised:
<b> and <i> have not been deprecated and still exist in both XHTML 1.0 Strict and even XHTML 1.1. It is also worth noting that <strong> does not replace <b> - the former carries the semantic meaning of strong importance, whereas the latter is merely a non-semantic element which gives a typographic effect. I would be very surprised if the Google techs hadn't read the HTML 4 specs at some time in the last seven years.
Take a look at their markup ;) If they've read the specs, they haven't done much to respect them on their site...
I'd be surprised if google penalised people for following the specs properly.
They won't, but Google respects de-facto conventions more than published specs: it has to, for the simple reason that 99%+ of the current web is made up of "tag soup" pages rather than standards-compliant ones. That is not to say you shouldn't worry about standards - validation is a crucial part of building a good, easy-to-spider site - but <b> is everywhere, whereas <strong> is much rarer. I don't know precisely whether Google gives weight to <strong>, but I know that <b>, when not over-used, is given some importance.
Just to confirm .... when you say always validate to HTML, do you mean either HTML 4.01 or XHTML 1.0?
Or is either one just as good?
Assuming you are serving XHTML 1.0 pages as text/html, then there is absolutely no difference in practical terms. So yes, either one is just as good. Choose the one you prefer, and validate to that. Personally, I prefer HTML 4.01. You might prefer XHTML 1.0.