Forum Moderators: open
I didn't realise search engines cared about the HTML code as well
SEs don't index the HTML tags, but they do need to download your whole page and parse it, separating content from markup. If there are errors in your markup, this separation process might not work.
They need to parse the HTML code to know what's in h[1-6] elements, p elements, etc.
If a SE stops indexing after 100k, you'd better have lots of content in those 100k rather than lots of HTML code.
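A quick way to see the content-versus-markup point is a toy content-to-markup ratio. This is purely my own illustration, not any metric a search engine actually publishes:

```python
import re

def content_ratio(html):
    """Fraction of the page that is text rather than markup (very crude)."""
    text = re.sub(r'<[^>]+>', '', html)  # naively strip anything tag-like
    return len(text) / len(html)

lean = '<p>Hello world</p>'
bloated = ('<p style="margin:0" align="left">'
           '<font size="2">Hello world</font></p>')

print(content_ratio(lean))     # higher: more of the 100k budget is content
print(content_ratio(bloated))  # lower: markup eats into the budget
```

The same eleven characters of content cost far more of the hypothetical 100k budget when wrapped in bloated markup.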
Andreas
They do not index the code, but need clean code so that they can find your content within it. They also need to be able to find <Hx> tags, as well as title and alt attributes within tags, and so on.
So this <H1 My Really Cool Title</H1> might be rendered correctly in IE, but a search engine would probably miss the very words that you wanted it to find [spot the missing ">" on the first tag]. Only a code validator would catch that type of error.
Likewise <a href=acoolpage.htm> might be missed, as attribute values should be quoted. It should be:
<a href="acoolpage.htm"> for example.
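To see how a missed ">" can swallow the very title you wanted found, here is a toy extractor. It is a deliberately naive sketch in Python; real crawlers are more forgiving, but broken markup still risks being misparsed:

```python
import re

def extract_h1(html):
    """Naive title grab: only matches well-formed <h1 ...>text</h1> pairs."""
    return re.findall(r'<h1[^>]*>(.*?)</h1>', html, re.IGNORECASE | re.DOTALL)

good = '<h1>My Really Cool Title</h1>'
bad = '<h1 My Really Cool Title</h1>'  # missing ">" on the opening tag

print(extract_h1(good))  # ['My Really Cool Title']
print(extract_h1(bad))   # [] -- the title text is lost inside the broken tag
```

With the quote-less `<a href=acoolpage.htm>` the story is similar: some tools cope, some don't, and quoting the value costs you nothing.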
These are some of the reasons that you should validate your code [validator.w3.org]. I don't care if your code doesn't validate completely. You should use the validator to remove typos, nesting errors, and such like, but if you want to use a few IE or NN specific tags then I don't generally have a problem with that. Use a validator to ensure that your code is well formed, has logical structure, is nested correctly, and has no tags with spelling errors, and so on. However careful you are, you'll be surprised at how many silly errors a validator will find in your web site.
While you are checking your HTML code in a validator, don't forget to validate your CSS as well [jigsaw.w3.org].
There are five levels of HTML validation:
Items 1, 2, and 3 are vital to fix.
Item 4 is at your discretion, and will probably need fixing when a browser that no longer supports the non-standard tags you have used becomes widely adopted.
Item 5 is only important if you want to code to the W3C standards and stick their logo on your page.
Anyone who routinely ignores items 1, 2 or 3 (and maybe item 4 as well) isn't entitled to call themselves a "web professional".
[edited by: g1smd at 1:40 am (utc) on Nov. 13, 2002]
Just want to give some feedback about the subject of proper parsing.
Firstly, I've been dabbling in HTML for five years now and really do think it is a nice field to be involved in. No, I am not a pro; I have learnt all I know out of curiosity about HTML.
At least three years ago I stumbled upon a site where I could check my pages for proper coding. When I first looked at the parsing results and saw the errors I was very downhearted. In the beginning I couldn't grasp the meaning of the errors, but I said to myself, "this I will learn".
I kept making changes here and there and began to understand how HTML worked as a whole. I felt good about this because it was quite a challenge, and one which I mastered.
I never took a course on HTML, but learnt from looking at other pages and trying to figure out how things worked. I did use a WYSIWYG program for a short time but found it was no good, so I decided to stick with hand coding.
Later I found out about the W3C and have been a faithful believer in the system ever since. Why shouldn't I? They are the leaders, at least I think so; not IE or NN. I also learnt CSS from the internet, and now when I validate both HTML and CSS I am usually right on. I have moved on to XHTML now and like how everything just flows.
There is one subject this thread doesn't seem to address, and that is Bobby (the accessibility checker). That I will have to get into, because it too will one day become standard.
I don't mean to be preachy about this, but it is very important for all of us to adhere to the standards set forth by the W3C. In the end this will make the internet much more free-flowing.
Lastly, if you don't have one, download a validator and check your pages before uploading them; then have the W3C validator check them again after uploading. You'll feel good that you are doing the right thing.
jaybee
the purist
aims for a standard, and writes code that adheres 100%. A really puritan purist goes for a Strict DOCTYPE too.
This is probably overkill in most commercial settings
the pragmatist
wants clean code but will knowingly use the occasional deprecated element. Provided they've tested it on all versions and platforms of the mainstream browsers and it works, there is no need to do more.
Sometimes life is too short to do everything. But the pragmatist has a clear business case for the deviations from 100% validated code.
the praying
hacks together some code, tries it out on their own machine, and prays that it will work in all environments, and that it is future-proof against new mainstream browsers.
Basically they are praying that their own understanding of how badly-formed HTML should be parsed is an immutable industry standard.
That's too much faith in dodgy software companies for me. That's why I prefer to support the efforts of W3C.
Regards,
R.
Please no comments on my English, I am not a native speaker.
Romeo, we would never do that. This is a global community and English just happens to be the primary language spoken here. Most of us can read between the lines and understand what you are saying. If not, I'm sure someone will ask a question to verify if they did not understand.
P.S. Only a small number of large commercial sites would pass validation. Since valid HTML is becoming more public knowledge, maybe those companies will rethink their strategies and work towards validation at some point in the future.
I know I wouldn't want to be the one sitting there with a $100,000 website that cannot be indexed properly because the browsers and spiders are changing the rules and wanting valid code. ;)
> This is embarrassing and non-professional.
Very!
[edited by: pageoneresults at 8:32 pm (utc) on Nov. 13, 2002]
No. Not in the least.
Is HTML validation desirable?
Yes, it is.
Do visitors care?
99.9% of visitors will never look and never care if your site validates or not as long as it looks good in THEIR browser.
Do search engines care?
Except for actual errors, no. Illustrated by the fact that .txt pages index just as well as .HTML in many instances (excluding effects of H? tags and such).
Should they care?
No, they should not. Search engines exist to help searchers find content. Valid code has nothing to do with content. In fact, .txt files SHOULD index just as well as perfectly validated HTML.
Well, then, who does care?
Computer geeks (like me). ;-)
Richard Lowe
Shoot, I was probably two years into my web design career before I really understood what was meant by a "markup language".
So when I see a site with a multi-million dollar budget that's nesting block level tags inside inline tags - or leaving divs and td's unclosed, well, I can't help but feel that's just WRONG --- whether Explorer renders the page or not.
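A tag-nesting check like the one a validator performs can be sketched with Python's standard html.parser. This is a simplified illustration of the idea; real validators check a great deal more:

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Toy well-formedness check: flags end tags that don't match the stack."""
    VOID = {"br", "hr", "img", "meta", "link", "input"}  # take no closing tag

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"mismatched </{tag}>")

def check(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.stack]

print(check('<div><p>ok</p></div>'))  # []
print(check('<b><i>oops</b></i>'))    # nesting error reported
print(check('<td>never closed'))      # unclosed <td> reported
```

Even this twenty-line toy catches the unclosed td's and bad nesting that a big-budget site ships every day; there is no excuse not to run a real validator.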
Well, then, who does care?
Computer geeks (like me). ;-)
Me too... no, seriously. I just thought I'd stress the point that, as well as hackers (or geeks) disliking sites with bad code, validation is the only way to make sure your site works properly and can be understood by the whole variety of browsers, widely used and less widely used, past, present and future, and by the plethora of other tools that may read your page (search engines' crawlers, for example).
How can you expect any tool to parse a document that uses things that are not in the rules? How are the client authors supposed to know what rules you made up for yourself in relation to HTML?
Using valid HTML makes pages easier (or possible) for disabled visitors to view, makes them easier (or possible) to update or edit, and helps in loads of other ways (including SEO).
Most importantly, there is the slippery-slope argument, and the related argument that many people do not know what HTML will or won't work in common browsers (or even the one browser they use), let alone in the plethora of past, present and future clients and tools that may need to parse their code. Is it not quicker and easier to learn the rules (or use a standards-compliant editor) and know that your page will display correctly? As has been said elsewhere, you are otherwise just putting it together more or less randomly and "praying" that your HTML might work.
Joe
Just because you can get a site to validate does not mean that it will display properly across all browsers. We've all been talking about HTML and working with CSS. Now the second part of the equation comes into play, validating your CSS!
I remember when I got my first site to validate. I was totally jazzed. Then I went on a mission to view it in as many different browsers as I could. I was able to cover almost all currently used versions on both PC and Mac. The differences were somewhat shocking.
Now it was time to hunker down and learn CSS. If you have a simple site, the basics work fine. Once you get into more complex layouts, you need to know about CSS. Without it, you will not be able to validate 100% if you want certain things to happen, like zero margins, background properties, and all sorts of other stuff.
HTML Validation is just one part of the entire equation. You should take great pride in a site if you've been able to pass W3C muster on both your html and css. You are now in a small group of developers who are working towards standards.
I've also found that this has become a prime selling tool in development. Offering the client valid html and the use of css puts you a step ahead of the rest of the pack. During the process you are able to help educate the consumer of the benefits that are achieved with your valid html and css. When they see that you can change the color of their links across the entire site in less than a couple of seconds; when they see that you can apply a background image to all of their pages in less than a minute; you'll have won them over!
Here is another thing. Using CSS relies on having well-formed and valid HTML. [And I will say it again: I'm not bothered about the sort of HTML that includes a few proprietary tags and attributes like MARGINHEIGHT, BLINK, MARQUEE, or BGPROPERTIES; I'm talking about correcting code that has tag spelling typos, unquoted attributes, tag nesting errors, unclosed tag pairs, and so on.]
Well, I wasn't really defending people who use HTML incorrectly. I agree, that kind of validation is essential, but I'm talking about older code being declared invalid even when it's been used correctly.
In a nutshell:
Is it really necessary to say that "<center>Centred Text</center>" is invalid? Why? What harm does it do to anyone? Search engines don't care, visitors don't, and it renders fine in all browsers as far as I'm aware.
I know they're declared "deprecated" rather than obsolete, but this does mean they will become obsolete in the future and, in theory, will not be supported by future browsers. Why on earth should this happen?
Why should you reduce the number of sites a browser can view? It can't be about bandwidth (text takes up a minute amount of space compared to pictures, sound and video) and it can't put too much strain on browsers either.
I guess all this discussion is irrelevant as what'll happen is that loads of people will continue using any old tags that work with IE and Netscape.
I have to admit though, "Bobby" accessibility is a very serious point in favour of validation, and perhaps that alone would make compliance desirable even if a site doesn't need to be compliant to work on any standard browser. Okay, on that point alone I'm sold 100%.
I guess all this discussion is irrelevant as what'll happen is that loads of people will continue using any old tags that work with IE and Netscape.
I hear you. People will keep on using old tags, but like anything else that's obsolete, fewer and fewer will as time goes on. In the end, it gets to the point where people think you're weird for still using them.
If you look at the early Model T, or the Kitty Hawk flyer, I'm sure that if they were in excellent mechanical condition, they would both still be functional. Right?
Well, ask yourself where they are now. Perhaps in a museum somewhere in the world. Even if there are duplicates of these machines, I certainly don't see them in use. Not solely because of their monetary value as antiques, but because they would not fit in properly with new standards and codes and all the legal requirements.
No, we shouldn't look down on them as old; rather, we should be thankful for where they have brought us, and for what the future models hold in store, because they were firsts in their respective industries.
The same will happen to HTML, CSS, XHTML and all the new developments now on the drawing board for future recommendation. All things will have their moment of glory, then finally fade into history, only to remind us of where we were years before.
If you wish to remain behind, that's a choice each of us will have to make. Undoubtedly one will feel left behind if they miss the boat.
Sure, changes to the internet will not happen overnight, but the old tags will eventually fade out at some point in time. Don't think for one moment the major players aren't aware of this. The thing is, if browsers can't parse a page properly, what good is it to the user? You think the big boys won't have to change?
The almighty dollar will force all of them to comply with the standards. Oh, they do have input as far as new ideas go, but in the end it's still up to the W3C (the governing body of the WWW) to accept those ideas that are agreed on and implement them into the overall scheme of how the internet evolves. In the meantime, IE and NN are putting the cart before the horse with the new gimmicks they develop and build into their browsers. Why? Because of the competition for those little green one$. :(
Keep on keeping on, fellow websters.
jaybee