HTML Validation?
How important is it?
coosblues




msg:605337
 3:44 am on Oct 24, 2002 (gmt 0)

Just wondering how or if the tags are looked at by the search engines. I'm sure I've got some bad tags here and there, but the page looks and works fine. Am I going to get penalized for something like omitting a tag like </b>? Thanks

 

jdMorgan




msg:605338
 3:55 am on Oct 24, 2002 (gmt 0)

Maybe, maybe not - depends on them, not you.

Why take the chance?

Here's a helpful tool [validator.w3.org].

Jim

martinibuster




msg:605339
 4:18 am on Oct 24, 2002 (gmt 0)

I knew someone who had a badly coded site. It was not indexed very well. It was apparent that the spider was giving up or couldn't make sense of the site.

I made these observations by doing various kinds of searches and looking at the cache.

The site was indexed better after he cleaned up his site. It was a lesson for him and me.

fathom




msg:605340
 4:32 am on Oct 24, 2002 (gmt 0)

Tags... as in Meta Tags?

The choice is yours.

I've started removing all but robots tags (where I need "noindex" or "nofollow").

Pages seem fine... traffic is up from search engines (but I really don't know whether removing the tags actually had anything to do with this).

If you have lots of content "text" on your pages, "meta description", keywords, etc. are probably not helping at all -- just extra characters on the web page, weighing it down.

Recommendation: if you're not 100% confident that they don't help, remove some -- maybe on some obscure page that is indexed and ranked within the top 20 (but not at 1-10) on a keyword phrase that receives little usage.

If the listing goes up -- act accordingly.

If the listing goes down -- act accordingly.

If it remains the same... likely they don't help, but they don't hurt either.

Best to also let this sit for at least 2 updates, if not longer, before drawing any conclusions, remembering that search engines don't act immediately on changes we made yesterday.

Brett_Tabke




msg:605341
 6:19 am on Oct 24, 2002 (gmt 0)

Validation or near validation is very important. We just don't know how good or bad se spiders are at working with html.

What we do know is that historically se's have had various troubles with html. "Creative html" used to be one of the top seo tricks for a few years. Double title tags, double meta tags, unclosed tags, and other gimmicks worked well because spiders didn't always parse the html correctly.
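
For example (a made-up illustration of the sort of "creative html" being described, not a recommendation), a page might repeat its title in the hope that a sloppy parser would count the keywords twice:

<title>Blue Widgets</title>
<title>Blue Widgets - cheap blue widgets - widgets</title>

The gimmick only "worked" on spiders that mishandled the duplicate tag; a parser that handled the markup correctly simply ignored it.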

Given that history, it is safe to assume se's have become much better at parsing html. There is so much junk code (street html) out there on the web, that the se's have had to become more strict - not less - at parsing and extracting text from html.

That means the only safe bet is to validate your code to make sure it is visible to a search engine. When se's work up their spiders/indexers, they can use many of the same tools and test pages at the w3c that the browser developers do.

Additionally, it's just good webmastering to get your site as close to valid as possible. Most of us understand a webmaster making bandwidth or style choices that won't affect rendering in a browser (such as skipping alt tags, or unquoted tag attributes), but outright syntax errors need to be fixed.

tedster




msg:605342
 3:52 pm on Oct 24, 2002 (gmt 0)

The issue in many cases is not one of black and white - that is, "in or out" of the database. The issue is most often that only PART of the page makes it into the search engine's analysis.

If an important keyword phrase gets excluded because it sits inside a syntax error that the spider can't figure out, you've just lost an asset that you probably intended to have. One really deadly, and easy to make, error is a missing ">"

That makes the text that follows this bad tag look like the tag's attribute, rather than page content. The spider "may" be able to recover after a new, well-formed closing tag appears - and then be able to grab the rest of the page. Or, depending on the sophistication of the spider's code, maybe not.
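
To make that concrete (a made-up snippet, not from any particular site), consider a heading that loses its closing ">":

<h2 class="products" Industrial widget cleaning tips</h2>

A spider may read "Industrial widget cleaning tips" as stray attribute text still inside the <h2> tag rather than as page content, so that phrase never reaches the index. The fix is the one missing character:

<h2 class="products">Industrial widget cleaning tips</h2>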

Brett's got it straight - validate your pages. You may intentionally choose to introduce some non-standard code, but you don't want accidents to handicap the site.

g1smd




msg:605343
 7:33 pm on Oct 24, 2002 (gmt 0)

One point to make here, is that many people use the HTML validator at [validator.w3.org...] not to get their code exactly as per the W3C standards but to provide a list of problems such as:

  • Tags with typos like <TBALE> or <IMGG>
  • Missing > or < such as <TD or ...blarg.gif"/A>
  • Unescaped ampersands: & should be &amp;
  • Wrongly nested tags, tags closed in the wrong order
  • Tags opened but not closed
  • Essential elements missing, like having a table with <TD> but no <TR>
  • Block elements wrongly being contained inside Inline elements
  • Missing or wrongly formed META tags.
  • Notices of attributes without the value in quotes

There are a number of extensions to HTML that only work in Netscape, or only work in Infernal Exploder (such as <MARQUEE>). Using the validator is not a requirement to remove all such proprietary extensions to the HTML coding. Use the validator to help you write code without logic and structural errors; don't worry too much that you may use a few non-standard tags - these will be ignored in browsers that do not use or require them.
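
As a small illustration (hypothetical markup, just to show the kind of thing the validator flags), a fragment like this trips several of the checks above:

<table><td>Widgets & gadgets</font></td></table>

The validator would report the missing <tr>, the unescaped ampersand, and a </font> closing tag that was never opened. Cleaned up, it becomes:

<table><tr><td>Widgets &amp; gadgets</td></tr></table>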

lorax




    msg:605344
     7:36 pm on Oct 24, 2002 (gmt 0)

    street html

    heh, did you coin this Brett?

    gibbergibber




    msg:605345
     5:03 pm on Nov 12, 2002 (gmt 0)

    Hi,

    I was going to reply to the Alltheweb validation thread but thought my post was more appropriate here.

    -- One down. Loads to go. Why is this simple thing so difficult? --

Okay, I'm not in any way a programmer, but I found HTML quite easy to learn and have some websites that I wrote in it myself. I've never bothered with style sheets because they didn't seem to offer anything I needed (my site is very basic and works just fine!), but according to the w3c.org standards, if I don't use style sheets, lots of the HTML elements I use instead may be made obsolete in future versions of the HTML specifications.

    For example, using the BODY BGCOLOR attribute to define background colour. Why is that a problem? I can't see how it could be made more efficient. Why does that have to be labelled "Deprecated" and under threat of becoming obsolete?

In the same way that I don't get why I have to buy a new computer every two years to continue word processing, I don't get why I have to regularly update HTML to publish exactly the same material on the same site.

    I guess what I'm asking is, if I leave my HTML as it is, as the years go by will my site:

    - go lower and lower down search engine results pages

    - get dropped from search engine results

    - be unreadable by one or more browsers

    I'm just not clear on why any of these things should happen, especially browsers not being able to read it. I thought W3C was about making sites accessible rather than making them, well, inaccessible.

    I also don't get why search engines would alter my rankings. Surely they should be judging me on my content alone?

    Can anyone clarify things? I'm sure I've missed something.

    willtell




    msg:605346
     5:22 pm on Nov 12, 2002 (gmt 0)

    We just found out our site had a grey bar from Google. We couldn't find the reason as we have around 20k pages of content and no spam. We've never had complaints about our site, it's just good content. We just found a broken link and believe that it's been this way for a few weeks. That is the only reason we can see for losing our traffic from Google.

    Lots0




    msg:605347
     5:41 pm on Nov 12, 2002 (gmt 0)

    I guess what I'm asking is, if I leave my HTML as it is, as the years go by will my site:

    go lower and lower down search engine results pages
Most likely you will lose ranking as time goes on. You may not be changing and updating your content, but other webmasters are; as their updated pages move up in the SERPs, yours will of course move down.

    get dropped from search engine results
Most SE's will keep an inactive page in their index for quite some time. (I was just looking at a page that was last updated in February 1995.)

    be unreadable by one or more browsers
I do believe the trend in browsers is for more inclusion, not less. In other words, it is in a browser's best interest to be able to 'read' as many different web pages as it can. I mean, who wants a browser that can only display 1/2 or even 3/4 of the pages on the Internet? Personally, I want to see them all.

    Lots0cash

    Lots0




    msg:605348
     5:52 pm on Nov 12, 2002 (gmt 0)

    willtell said,
    we have around 20k pages of content and no spam.
    Twenty thousand pages! How many are cgi pages? Do you use a standard template across all 20 thousand pages? What kind of linking structure are you using?

    Lots0cash

    [edited by: Lots0 at 5:53 pm (utc) on Nov. 12, 2002]

    victor




    msg:605349
     5:52 pm on Nov 12, 2002 (gmt 0)

    Gibbergibber:
    I'm just not clear on why any of these things should happen, especially browsers not being able to read it. I thought W3C was about making sites accessible rather than making them, well, inaccessible.

If you prefix all your HTML with a recognised !DOCTYPE, you'll be telling any browser (including those of the future) precisely what subset of HTML your page is using. That's pretty close to future-proof.

Without a !DOCTYPE, there is a lot of dangerous guessing going on even today -- should a line of code be treated one way or another? Different browsers (and different editions of the same one) guess differently.

Here's a !DOCTYPE table:

    [gutfeldt.ch...]
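
For example (just to illustrate the idea -- check the W3C recommendation for the exact declaration matching your markup), a page written to HTML 4.01 Transitional would start with:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

With that first line in place, any browser or validator knows exactly which set of elements and attributes the rest of the page claims to follow.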

    nonprof webguy




    msg:605350
     5:54 pm on Nov 12, 2002 (gmt 0)

    For example, using the BODY BGCOLOR attribute to define background colour. Why is that a problem? I can't see how it could be made more efficient. Why does that have to be labelled "Deprecated" and under threat of becoming obsolete?

I can't answer why it's been deprecated, but I can tell you it is incredibly useful to place the attributes of a <body> tag in stylesheets rather than HTML. I'm working on serving up different stylesheets based on browser and OS platform, and I have slightly different layouts for two templates. So using unique ids on my body tags, a la <body id="front"> or <body id="inside">, to call up the "family" of attributes that are right for the page, depending on which area of the site it is in, has been a big time saver. I can make any attribute change just once in the appropriate stylesheet and not have to do a somewhat dangerous search and replace through hundreds of pages. As an aside, thanks to cascading, I can separate the attributes that need to change based on browser and OS into separate style declarations or .css files, minimizing repetition.
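
A minimal sketch of that approach (the ids, colours and filename here are only placeholders, not the poster's actual code):

<body id="front"> ... </body>      <!-- front-page template -->
<body id="inside"> ... </body>     <!-- inner-page template -->

/* shared stylesheet, e.g. site.css */
body { background-color: #ffffff; color: #000000; }
body#front { background-color: #003366; color: #ffffff; }
body#inside { background-color: #f0f0f0; }

One edit to the stylesheet then restyles every page that carries the matching id.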

    pageoneresults




    msg:605351
     5:55 pm on Nov 12, 2002 (gmt 0)

I started my trek towards validation a couple of years ago. I haven't looked back since. I can honestly say that the sites I've managed or am managing outperform many others in their industries. They load quickly, they are easy to navigate and they rank respectably.

I'm a firm believer in giving the indexing spiders what they are there to get: html in its simplest form, without errors. If I can get that spider from point a to point z in the least amount of time, with valid code, then I've achieved my goal.

But if that spider runs into invalid code along the way, it may skip points l, m and n because something was not right. What if my most important information was in points l, m and n? I just missed an opportunity, and in some cases, a large opportunity. Getting the spiders there to begin with is a task in itself. Getting them from point a to point z is the challenge.

I firmly believe that the future (and now) is valid coding, whether it's html, asp, javascript, etc. The web has enough Street HTML, as Brett coined it, and it's causing problems on the Internet. It's now time for responsible developers to clean up their act. No more relying on IE to justify "it works fine in IE". If I'm not mistaken, the IE browser is going through major changes to address this very problem.

It sure would be nice to see written documentation from authoritative resources on what happens with SE's and invalid code. I'm not sure anyone has ever done a side-by-side comparison of the same page, one with errors and one without.

    pageoneresults




    msg:605352
     5:59 pm on Nov 12, 2002 (gmt 0)

P.S. It's important enough that dmoz took the steps to validate...

    Search Engine HTML Validation Results [webmasterworld.com]

    LucyGrrl




    msg:605353
     6:25 pm on Nov 12, 2002 (gmt 0)

    Here is another great free HTML checker tool from Flanders at webpagesthatsuck.com

    [fixingyourwebsite.com...]

    andreasfriedrich




    msg:605354
     6:33 pm on Nov 12, 2002 (gmt 0)

    I don't get why I have to regularly update HTML

You don't have to. But you may want to if there are important improvements in HTML. Originally HTML was designed to mark up the structure of documents. This was good and within the spirit of SGML.

Then, for various reasons, the W3C added elements that were about style rather than structure. This went on for some time until it reached a stage where having structure and style mixed up became too burdensome.

To achieve a strict separation of content and style, CSS was specified. This was good again and within the spirit of SGML.

If you don't consider those improvements improvements, and you do not think they will make your work easier, there is no need to update.

    Andreas

    andreasfriedrich




    msg:605355
     6:46 pm on Nov 12, 2002 (gmt 0)

    great free HTML checker tool

    ...reports errors that are not there.

    I would check my pages using [validator.w3.org...] or better yet install a local version [validator.w3.org] of it.

    Andreas

    2_much




    msg:605356
     7:14 pm on Nov 12, 2002 (gmt 0)

    I guess my question is - is it realistic?

    I've been doing some research for school and for a non-profit that I volunteer with and noticed that the HTML on most of these sites is horrific.

    So given the nature of the web - to share content and find information - is it realistic to expect that everybody who wants to share information has to learn to code properly? Will they have the time/knowledge/ability to?

    andreasfriedrich




    msg:605357
     7:23 pm on Nov 12, 2002 (gmt 0)

    So given the nature of the humans - to communicate with each other - is it realistic to expect that everybody who wants to participate in this communication has to learn to speak/write properly? Will they have the time/knowledge/ability to?

Probably not everybody. But the more informed and the more intelligent the communication, the more you need to know how to speak/write properly.

It's simply a question of whether you really want to bring your point across. If you do, you had better communicate in a way that is understood by the people you want to address.

    Andreas

    willtell




    msg:605358
     7:26 pm on Nov 12, 2002 (gmt 0)

With the amount of software available, you really don't need to be able to write code. Wouldn't it make sense to use software to control your web content? You would eliminate most mistakes and be in compliance with the latest versions.

    2_much




    msg:605359
     7:29 pm on Nov 12, 2002 (gmt 0)

    I agree 100%. But I guess my question is a lot more general and not geared towards the "independent web professional".

    My point is, given the MASSIVE scale of the web and the massive amount of people using the web as a means of communication, across countries, languages, economic levels - can/should engines expect people to create good code?

    andreasfriedrich




    msg:605360
     7:36 pm on Nov 12, 2002 (gmt 0)

Watn dat fürne fraje. Ick soll anders sprechn. Nee, det mach ick nich. (Berlin dialect, roughly: "What kind of question is that? I'm supposed to speak differently? No, I won't.")

    I believe you do expect me to talk to you in a way that you can understand. The same holds true for search engines.

    Why would factors like languages or economic levels influence your ability to write valid HTML?

    Andreas

    gibbergibber




    msg:605361
     8:18 pm on Nov 12, 2002 (gmt 0)

    Okay, um, this could all get very complicated and go off at tangents so here's a summary of my problem:

    -I use certain old standard HTML on my site, but for the sake of this example let's just talk about BODY BGCOLOR.

    -It has worked fine up until now and I've used it correctly. It's not "street HTML".

    -Style sheets can't make my site any easier to load or navigate because it's a very simple site.

    Two questions that need to be answered:

    1. Why should I learn and use style sheets instead of BODY BGCOLOR? (I hear the argument about browsers and OS platforms, but surely W3C is there to unify standards, not make them proprietary? And how would BGCOLOR vary with platform?)

    2. Why should search engines and browsers prefer my site using style sheets instead of BODY BGCOLOR?

    pageoneresults




    msg:605362
     9:14 pm on Nov 12, 2002 (gmt 0)

    1. Many reasons, including but not limited to the W3C. The bgcolor attribute [w3.org] is classified as a Deprecated Element or Attribute [w3.org] and one that will most likely be supported for quite some time. It doesn't necessarily mean that browsers are not supporting it. It just means there is a newer and better way of declaring background colors.

    2. Tough question! But, think of it this way. If you have 40k of html on your page and 30k of that are presentational elements like bgcolor, font, font-color, font-size, etc., the indexing spiders need to traverse through all that code to get to your content.

    With CSS, you've now taken that 30k of html and have placed it into one external file that is cached on first visit and then referenced by each page that uses that style sheet. Indexing spiders are now presented with pure content that is not surrounded by a bunch of deprecated attributes.

    Let's use Google for example. It is stated that Google will only index up to 101k of html. If 80k of that are html attributes, then you've just missed a perfect opportunity to present the spider with pure content.

    Now, if you switched that around and had 80k of pure content and 21k of html attributes, you've now just presented the spider with much more pure content than it would have gotten when you had 80k of html. Get the picture? ;)

    Many of us combine html attributes and css together to help trim code bloat. Some are extreme and have stripped out all html attributes in favor of css. I haven't gone that far yet, but I'm close. Wouldn't you rather see this...

    <p>Now is the time for all good men and women to come to the aid of their browsers.</p>

    Instead of this...

    <p align="center"><font color="#000000" size="2">Now is the time for all good men and women to come to the aid of their browsers.</font></p>

    Here is an Index of Attributes [w3.org] which shows which have been Deprecated.
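
To tie this back to the BGCOLOR question: the stylesheet equivalent is a one-line rule (colours and the filename are only placeholders). Instead of repeating this on every page:

<body bgcolor="#ffffff" text="#000000">

you write, once, in an external stylesheet:

body { background-color: #ffffff; color: #000000; }

and each page just references it from its <head>:

<link rel="stylesheet" type="text/css" href="site.css">

The pages themselves then carry a plain <body> tag, and a site-wide colour change becomes a single edit to site.css.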

    dingman




    msg:605363
     9:18 pm on Nov 12, 2002 (gmt 0)

    In answer to (1), I see at least a few advantages.

If I decide that chartreuse text on a purple background wasn't such a great choice after all, I can set things right in one file, and it takes no further effort to propagate it to all my pages. If I had used attributes for the body color, I'd have to change every page, which is no fun and introduces many more opportunities for typos. (It might even make it too easy. I've got a complicated dynamically-generated site where I put up holiday color schemes on my wife's whim, because it's so easy. If I had to find all the places the colors were set by hand, I doubt I'd bother.)

Over several pages, a shared external style sheet could result in notable bandwidth savings. This is probably a small benefit, especially if your styling is simple, but might be noticeable on a high-traffic site. I hear people on this board talk about leaving off the quotation marks around their attributes as a bandwidth-saving measure, and I can't imagine that's more effective over multiple page views than CSS.

    As for (2), I don't see any good reason for engines and browsers to prefer HTML that complies with a newer standard over HTML that complies with an older standard. I do see a reason for them to prefer standards-compliant over non-compliant, but that's different. And if you're using, say, HTML 3.2 with elements that are part of that standard, but deprecated or absent from XHTML 1.1, you're still standards-compliant. Deprecation and removal aren't retroactive to previous versions. I'd be inclined to write anything new to be compliant with newer standards, and update sites where that's practical, but there's no requirement to do so.

There is a trade-off in browser support, of course. A couple of years ago, I tried updating my whole personal site to HTML 4.0 Strict with CSS instead of font tags, and then ditched that version because it looked great in the browsers that supported it, but not in most of the browsers I tried. Now I'm coming back to CSS for presentation because browsers have progressed enough that I can actually get the benefit of the standard.

    g1smd




    msg:605364
     9:22 pm on Nov 12, 2002 (gmt 0)

HTML 4.01 Transitional allows you to use deprecated tags, whilst HTML 4.01 Strict does not. However, let's review what deprecated actually means (in the HTML spec):

    Deprecated:

    A deprecated element or attribute is one that has been outdated by newer constructs. Deprecated elements are defined in the reference manual in appropriate locations, but are clearly marked as deprecated. Deprecated elements may become obsolete in future versions of HTML.

    User agents should continue to support deprecated elements for reasons of backward compatibility.

    Definitions of elements and attributes clearly indicate which are deprecated.

    This specification includes examples that illustrate how to avoid using deprecated elements. In most cases these depend on user agent support for style sheets. In general, authors should use style sheets to achieve stylistic and formatting effects rather than HTML presentational attributes. HTML presentational attributes have been deprecated when style sheet alternatives exist (see, for example, [CSS1]).

    Obsolete:

    An obsolete element or attribute is one for which there is no guarantee of support by a user agent. Obsolete elements are no longer defined in the specification, but are listed for historical purposes in the changes section of the reference manual.

    More at: [w3.org ]

    So, my belief is that if you add a !DOCTYPE declaration to your HTML files that tells the browser what sort of ML is coming, and you validate the code to that specification, then you will be safe for many years to come.
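
As a concrete illustration (a made-up snippet), this line validates against the HTML 4.01 Transitional DTD but fails under HTML 4.01 Strict, because <font> and the align attribute are deprecated and absent from Strict:

<p align="right"><font color="#ff0000">Sale ends Friday</font></p>

The Strict-friendly equivalent keeps the markup structural and moves the presentation into CSS (the class name is chosen just for the example):

<p class="notice">Sale ends Friday</p>
.notice { text-align: right; color: #ff0000; }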

    pageoneresults




    msg:605365
     9:27 pm on Nov 12, 2002 (gmt 0)

Wow! It's nice to get three perspectives in such a short period of time. I'm a big fan of supporting standards. I'm also a devout follower of g1smd, as he has helped me to understand the specifications a little more. He has also been responsible for me now promoting the proper date and time formats for the future.

    gibbergibber




    msg:605366
     9:52 pm on Nov 12, 2002 (gmt 0)

    -- Now, if you switched that around and had 80k of pure content and 21k of html attributes, you've now just presented the spider with much more pure content than it would have gotten when you had 80k of html. Get the picture? --

    Ah, I didn't realise search engines cared about the HTML code as well. I'd kind of assumed they only looked at the non-code text and generally ignored anything between < and >, at least as far as assessing content is concerned.

    If what you say is true, why *do* search engines catalogue HTML code? Would it make any difference if Google had this in its database:

    Now is the time for all good men and women to come to the aid of their browsers.

    instead of this:

    <p align="center"><font color="#000000" size="2">Now is the time for all good men and women to come to the aid of their browsers.</font></p>

    ?

    I can't see what the complication would be in engines just ignoring tags altogether, whether they're style sheet ones or old style HTML.

Also, if it's a problem for them, wouldn't it be easier for Google etc. to alter their cataloguing practices rather than to expect everyone else to alter their coding?

    If for some reason search engines can't ignore HTML code, then yeah, CSS and the principle of physically separating style code from content code is a good one, but again, only if your site is big and whizzy and full of variation.

    Just to make it clear I can see lots and lots of good uses for CSS, I just couldn't see any reason to make it compulsory if you want to meet W3C standards. It just seemed like removing common words from the dictionary. If people use them, why not leave them in?

    Well, whatever you think of W3C, thank goodness they didn't adopt the <blink> tag. There were some people who'd have entire pages with that on!
