Forum Moderators: open
Why take the chance?
Here's a helpful tool [validator.w3.org].
Jim
I made these observations by doing various kinds of searches and looking at the cache.
The site was indexed better after he cleaned up his site. It was a lesson for him and me.
The choice is yours.
I've started removing all meta tags except the robots tag (when I need "noindex" or "nofollow").
Pages seem fine... traffic is up from search engines (but I really don't think removing the tags actually had anything to do with this).
If you have lots of text content on your pages, the meta description, keywords, etc. are probably not helping at all; they're just extra characters on the web page, weighing it down.
Recommendation: if you're not 100% confident that they help, try removing them -- maybe on some obscure page that is indexed and ranked within the top 20 (but not at 1 - 10) on a keyword phrase that receives little usage.
If the listing goes up -- act accordingly.
If the listing goes down -- act accordingly.
If it remains the same... likely they don't help, but they don't hurt either.
Best to also let this sit for at least 2 updates, if not longer, before drawing any conclusions, remembering that search engines don't act immediately on changes we made yesterday.
What we do know is that historically se's have had various troubles with html. "Creative html" used to be one of the top seo tricks for a few years. Double title tags, double meta tags, unclosed tags, and other gimmicks worked well because spiders didn't always parse the html correctly.
Given that history, it is safe to assume se's have become much better at parsing html. There is so much junk code (street html) out there on the web, that the se's have had to become more strict - not less - at parsing and extracting text from html.
That means the only safe bet is to validate your code to make sure it is visible to a search engine. As se's work up their spiders/indexers, they can use many of the same tools and test pages at the W3C that the browser developers do.
Additionally, it's just good webmastering to get your site as close to valid as possible. Most of us understand a webmaster making bandwidth or style choices that won't affect rendering in a browser (such as skipping alt tags, or unquoted tag attributes), but bare syntax errors need to be fixed.
If an important keyword phrase gets excluded because it sits inside a syntax error that the spider can't figure out, you've just lost an asset that you probably intended to have. One really deadly, and easy to make, error is a missing ">".
That makes the text that follows this bad tag look like the tag's attribute, rather than page content. The spider "may" be able to recover after a new, well-formed closing tag appears - and then be able to grab the rest of the page. Or, depending on the sophistication of the spider's code, maybe not.
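As a hypothetical illustration of that failure mode (the markup here is made up):

```html
<!-- The <b> tag below is missing its closing ">" -->
<p>Our widgets are <b the best widgets in town.</p>

<!-- A parser may read "the", "best", "widgets", etc. as attribute
     names of the <b> tag rather than page content, dropping them
     from the index until it resynchronizes at the next
     well-formed tag -->
```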
Brett's got it straight - validate your pages. You may intentionally choose to introduce some non-standard code, but you don't want accidents to handicap the site.
There are a number of extensions to HTML that only work in Netscape, or only work in Infernal Exploder (such as <MARQUEE> ). Using the validator doesn't require you to remove all such proprietary extensions to the HTML coding. Use the validator to help you write code without logic and structural errors; don't worry too much if you use a few non-standard tags - these will be ignored by browsers that do not support them.
I was going to reply to the Alltheweb validation thread but thought my post was more appropriate here.
-- One down. Loads to go. Why is this simple thing so difficult? --
Okay, I'm not in any way a programmer but I found html quite easy to learn and have some websites that I wrote in it myself. I've never bothered with style sheets because they didn't seem to offer anything I needed (my site is very basic and works just fine!) but according to the w3c.org standards if I don't use style sheets lots of the elements of HTML I use instead may be made obsolete in future versions of the HTML specifications.
For example, using the BODY BGCOLOR attribute to define background colour. Why is that a problem? I can't see how it could be made more efficient. Why does that have to be labelled "Deprecated" and under threat of becoming obsolete?
In the same way that I don't get why I have to buy a new computer every two years to continue wordprocessing, I don't get why I have to regularly update HTML to publish exactly the same material on the same site.
I guess what I'm asking is, if I leave my HTML as it is, as the years go by will my site:
- go lower and lower down search engine results pages
- get dropped from search engine results
- be unreadable by one or more browsers
I'm just not clear on why any of these things should happen, especially browsers not being able to read it. I thought W3C was about making sites accessible rather than making them, well, inaccessible.
I also don't get why search engines would alter my rankings. Surely they should be judging me on my content alone?
Can anyone clarify things? I'm sure I've missed something.
I guess what I'm asking is, if I leave my HTML as it is, as the years go by will my site:
go lower and lower down search engine results pages

Most likely you will lose ranking as time goes on. You may not be changing and updating your content, but other webmasters are; as their updated pages move up in the SERPs, yours will of course move down.
get dropped from search engine results

Most se's will keep an inactive page in their index for quite some time. (I was just looking at a page that was last updated in February 1995.)
be unreadable by one or more browsers

I believe the trend in browsers is toward more inclusion, not less. In other words, it is in a browser's best interest to be able to 'read' as many different web pages as it can. I mean, who wants a browser that can only display ½ or even ¾ of the pages on the Internet? Personally, I want to see them all.
Lots0cash
we have around 20k pages of content and no spam.

Twenty thousand pages! How many are cgi pages? Do you use a standard template across all 20 thousand pages? What kind of linking structure are you using?
Lots0cash
[edited by: Lots0 at 5:53 pm (utc) on Nov. 12, 2002]
If you prefix all your HTML with a recognised !DOCTYPE, you'll be telling any browser (including those of the future) precisely what subset of HTML your page is using. That's pretty close to future-proof.
Without a !DOCTYPE, there is a lot of dangerous guessing going on even today -- should a line of code be treated one way or another? Different browsers (and different editions of the same one) guess differently.
Here's a !DOCTYPE table:
[gutfeldt.ch...]
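For example, this is the declaration for HTML 4.01 Transitional; it goes on the very first line of the file, before <html>:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
```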
For example, using the BODY BGCOLOR attribute to define background colour. Why is that a problem? I can't see how it could be made more efficient. Why does that have to be labelled "Deprecated" and under threat of becoming obsolete?
I can't answer why it's been deprecated, but I can tell you it is incredibly useful to place the attributes of a <body> tag in stylesheets rather than HTML. I'm working on serving up different stylesheets based on browser and OS platform, and I have slightly different layouts for two templates. I've found that using unique ids for my body tags, a la <body id="front"> or <body id="inside">, to call the "family" of attributes that are right for each area of the site is a big time saver. I can make any attribute changes just once in the appropriate stylesheet and not have to do a somewhat dangerous search and replace through hundreds of pages. As an aside, thanks to cascading, I can separate the attributes that need to change based on browser and OS into separate style declarations or .css files, minimizing repetition.
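A minimal sketch of that id-per-template approach (the ids come from the post above; the values are made up):

```css
/* One rule set per "family" of pages. The HTML just says
   <body id="front"> or <body id="inside"> and the right
   attributes come along for free. */
body#front  { background-color: #ffffff; margin: 0; }
body#inside { background-color: #f0f0f0; margin: 1em; }
```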
I'm a firm believer in giving the indexing spiders what they are there to get, html in its simplest form without errors. If I can get that spider from point a to point z in the least amount of time and valid code, then I've achieved my goal.
But, if that spider runs into invalid code along the way, it may skip points l, m and n because something was not right. What if my most important information was in points l, m and n. I just missed an opportunity, and in some cases, a large opportunity. Getting the spiders there to begin with is a task in itself. Getting them from point a to point z is the challenge.
I firmly believe that the future (and now) is valid coding, whether it's html, asp, javascript, etc. The web has enough Street HTML, as Brett coined it, and it's causing problems on the Internet. It's now time for responsible developers to clean up their act. No more relying on IE to support the justification that "it works fine in IE." If I'm not mistaken, the IE browser is going through major changes to address this very problem.
It sure would be nice to see written documentation from authoritative resources on what happens with SE's and invalid code. I'm not too sure if anyone has ever done a side by side comparison of the same page, one with errors, and one without.
Search Engine HTML Validation Results [webmasterworld.com]
I don't get why I have to regularly update HTML
You don't have to. But you may want to if there are important improvements in HTML. Originally HTML was designed to mark up the structure of documents. This was good and within the spirit of SGML.
Then, for various reasons, the W3C added elements that were about style rather than structure. This went on for some time until it reached a stage where having structure and style mixed up became too burdensome.
To achieve a strict separation of content and style, CSS was specified. This was good again and within the spirit of SGML.
If you don't consider those improvements improvements, and you do not think they will make your work easier, there is no need to update.
Andreas
great free HTML checker tool
...reports errors that are not there.
I would check my pages using [validator.w3.org...] or better yet install a local version [validator.w3.org] of it.
Andreas
I've been doing some research for school and for a non-profit that I volunteer with and noticed that the HTML on most of these sites is horrific.
So given the nature of the web - to share content and find information - is it realistic to expect that everybody who wants to share information has to learn to code properly? Will they have the time/knowledge/ability to?
Probably not everybody. But the more informed and the more intelligent the communication, the more you need to know how to speak/write properly.
It's simply a question of whether you really want to bring your point across. If you do, you'd better communicate in a way that is understood by the people you want to address.
Andreas
My point is, given the MASSIVE scale of the web and the massive amount of people using the web as a means of communication, across countries, languages, economic levels - can/should engines expect people to create good code?
-I use certain old standard HTML on my site, but for the sake of this example let's just talk about BODY BGCOLOR.
-It has worked fine up until now and I've used it correctly. It's not "street HTML".
-Style sheets can't make my site any easier to load or navigate because it's a very simple site.
Two questions that need to be answered:
1. Why should I learn and use style sheets instead of BODY BGCOLOR? (I hear the argument about browsers and OS platforms, but surely W3C is there to unify standards, not make them proprietary? And how would BGCOLOR vary with platform?)
2. Why should search engines and browsers prefer my site using style sheets instead of BODY BGCOLOR?
2. Tough question! But, think of it this way. If you have 40k of html on your page and 30k of that are presentational elements like bgcolor, font, font-color, font-size, etc., the indexing spiders need to traverse through all that code to get to your content.
With CSS, you've now taken that 30k of html and have placed it into one external file that is cached on first visit and then referenced by each page that uses that style sheet. Indexing spiders are now presented with pure content that is not surrounded by a bunch of deprecated attributes.
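As a sketch of the mechanics (the filename is hypothetical), each page then carries just one extra line in its <head>:

```html
<!-- Fetched once on the first visit, then served from the
     browser's cache on every subsequent page that references it -->
<link rel="stylesheet" type="text/css" href="/site.css">
```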
Let's use Google for example. It is stated that Google will only index up to 101k of html. If 80k of that are html attributes, then you've just missed a perfect opportunity to present the spider with pure content.
Now, if you switched that around and had 80k of pure content and 21k of html attributes, you've now just presented the spider with much more pure content than it would have gotten when you had 80k of html. Get the picture? ;)
Many of us combine html attributes and css together to help trim code bloat. Some are extreme and have stripped out all html attributes in favor of css. I haven't gone that far yet, but I'm close. Wouldn't you rather see this...
<p>Now is the time for all good men and women to come to the aid of their browsers.</p>
Instead of this...
<p align="center"><font color="#000000" size="2">Now is the time for all good men and women to come to the aid of their browsers.</font></p>
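Just as a sketch, the presentational attributes in that second snippet collapse into one stylesheet rule (the font-size mapping is approximate):

```css
/* Equivalent of align="center" and the <font> attributes above,
   written once in the stylesheet instead of on every paragraph */
p {
  text-align: center;
  color: #000000;
  font-size: small;  /* roughly what size="2" renders as */
}
```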
Here is an Index of Attributes [w3.org] which shows which have been Deprecated.
If I decide that chartreuse text on a purple background wasn't such a great choice after all, I can set things right in one file, and it takes no further effort to propagate it to all my pages. If I had used attributes for the body color, I'd have to change every page, which is no fun and introduces many more opportunities for typos. (It might even make it too easy. I've got a complicated dynamically-generated site where I put up holiday color schemes on my wife's whim, because it's so easy. If I had to find all the places the colors are set by hand, I probably wouldn't bother.)
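As a sketch, that site-wide color change is a one-line edit in the shared file (filename and values made up):

```css
/* site.css -- every page links to this one file, so changing
   the scheme here changes it everywhere at once */
body {
  background-color: purple;
  color: chartreuse;  /* the choice being reconsidered above */
}
```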
Over several pages, a shared external style sheet could result in notable bandwidth savings. This is probably a small benefit, especially if your styling is simple, but it might be noticeable on a high-traffic site. I hear people on this board talk about leaving off the quotation marks around their attributes as a bandwidth-saving measure, and I can't imagine that's more effective over multiple page views than CSS.
As for (2), I don't see any good reason for engines and browsers to prefer HTML that complies with a newer standard over HTML that complies with an older standard. I do see a reason for them to prefer standards-compliant over non-compliant, but that's different. And if you're using, say, HTML 3.2 with elements that are part of that standard, but deprecated or absent from XHTML 1.1, you're still standards-compliant. Deprecation and removal aren't retroactive to previous versions. I'd be inclined to write anything new to be compliant with newer standards, and update sites where that's practical, but there's no requirement to do so.
There is a trade-off in browser support, of course. A couple years ago, I tried updating my whole personal site to HTML 4.0 Strict with CSS instead of font tags, and then ditched that version because it looked great in the browsers that supported it, but not in most of the browsers I tried. Now I'm coming back to CSS for presentation because browsers have progressed enough that I can actually get the benefit of the standard.
Deprecated:
A deprecated element or attribute is one that has been outdated by newer constructs. Deprecated elements are defined in the reference manual in appropriate locations, but are clearly marked as deprecated. Deprecated elements may become obsolete in future versions of HTML.
User agents should continue to support deprecated elements for reasons of backward compatibility.
Definitions of elements and attributes clearly indicate which are deprecated.
This specification includes examples that illustrate how to avoid using deprecated elements. In most cases these depend on user agent support for style sheets. In general, authors should use style sheets to achieve stylistic and formatting effects rather than HTML presentational attributes. HTML presentational attributes have been deprecated when style sheet alternatives exist (see, for example, [CSS1]).
Obsolete:
An obsolete element or attribute is one for which there is no guarantee of support by a user agent. Obsolete elements are no longer defined in the specification, but are listed for historical purposes in the changes section of the reference manual.
More at: [w3.org ]
So, my belief is that if you add a !DOCTYPE declaration to your HTML files that tells the browser what sort of ML is coming, and you validate the code to that specification, then you will be safe for many years to come.
Ah, I didn't realise search engines cared about the HTML code as well. I'd kind of assumed they only looked at the non-code text and generally ignored anything between < and >, at least as far as assessing content is concerned.
If what you say is true, why *do* search engines catalogue HTML code? Would it make any difference if Google had this in its database:
Now is the time for all good men and women to come to the aid of their browsers.
instead of this:
<p align="center"><font color="#000000" size="2">Now is the time for all good men and women to come to the aid of their browsers.</font></p>
?
I can't see what the complication would be in engines just ignoring tags altogether, whether they're style sheet ones or old style HTML.
Also, if it's a problem for them, wouldn't it be easier for Google etc. to alter their cataloguing practices rather than to expect everyone else to alter their coding?
If for some reason search engines can't ignore HTML code, then yeah, CSS and the principle of physically separating style code from content code is a good one, but again, only if your site is big and whizzy and full of variation.
Just to make it clear I can see lots and lots of good uses for CSS, I just couldn't see any reason to make it compulsory if you want to meet W3C standards. It just seemed like removing common words from the dictionary. If people use them, why not leave them in?
Well, whatever you think of W3C, thank goodness they didn't adopt the <blink> tag. There were some people who'd have entire pages with that on!