Basically what these folks said. :) The only data point I'd add is Eric Brewer's '96 paper, which reported that 40% of pages contain actual errors.
Note that he is referring to a '96 paper (last century ;) ).
I was unable to find that paper in Papers by Prof. Brewer [cs.berkeley.edu].
I find it a fascinating paper from the days when the web was young. The analysis of errors does not go very far.
Read it and enjoy the nostalgia :)
From a brief reading of section 5, I can't see that Mr. Parnas gives the average number of errors per page. I would be interested to find that out.
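For anyone who wants to work that figure out from raw numbers: "percent of pages with errors" and "average errors per page" are two different statistics, and 40% invalid could mean one stray error per page or dozens. A minimal sketch, assuming you already have one error count per page (the counts below are made up, not taken from Brewer or Parnas):

```python
def error_stats(error_counts: list[int]) -> tuple[float, float]:
    """Return (percent of pages with at least one error, mean errors per page).

    The mean is taken over ALL pages, including the valid ones with 0 errors.
    """
    if not error_counts:
        raise ValueError("need at least one page")
    pages_with_errors = sum(1 for n in error_counts if n > 0)
    percent_invalid = 100.0 * pages_with_errors / len(error_counts)
    mean_errors = sum(error_counts) / len(error_counts)
    return percent_invalid, mean_errors


if __name__ == "__main__":
    sample = [0, 3, 0, 12, 1]  # hypothetical per-page error counts
    pct, mean = error_stats(sample)
    print(f"{pct:.0f}% of pages invalid, {mean:.1f} errors per page on average")
```

Here 3 of the 5 pages are invalid (60%), while the mean is 16 / 5 = 3.2 errors per page, which shows why one number doesn't give you the other.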
I plan on reading through the whole thesis as I have the time. Looks like a very interesting piece. Nice resource zaptd. :)
Jordan
Just a warning for those who have a dial-up connection (as I do): the thesis is 125 pages of PDF :( I fetched Chapter 5, "Statistics on syntactical errors", slowly over that connection, and it was well worth the wait.
Does anyone know whether a shorter version (such as a published paper) exists? A quick search failed to find one.
Huh. Not worried at all about that. Those gonifs couldn't comply with an ANSI sheet-metal-screw standard even AFTER they stole a thread-cutting device.
While we're fantasizing, suppose M$ launched a web page development tool that produced only 100% compliant pages! It would be hard on some people: the page-view-based advertising revenue for forums such as this would dry up faster than a SCO executive in the witness box.
I've been looking for a while: does anyone have statistics on how many pages on the web right now are invalid HTML/XHTML?
It's sad to admit, but right now 100% of the pages on my site are invalid HTML. 99% of those errors are missing tags (<p> without a </p>); the rest are deprecated attributes, and not a single page has a DTD. I'm fixing what I can as I work thru my keyword list for those pages. Each page gets a full visual check, and missing tags are replaced, including the ever-missing </html>. It's hard to believe we even get any orders.
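If you want a quick count of those particular problems before running each page through a validator, something like this might help. It's a rough heuristic sketch using Python's standard html.parser, not a validator: it just compares <p> and </p> counts and checks for a DOCTYPE and a closing </html>. Note that whether an omitted </p> or </html> is actually an error depends on the doctype (HTML 4.01 makes both end tags optional; XHTML requires them). The audit() helper and its result keys are made up for illustration.

```python
from html.parser import HTMLParser


class TagAudit(HTMLParser):
    """Counts <p>/</p> tags and notes whether a DOCTYPE and </html> appear."""

    def __init__(self) -> None:
        super().__init__()
        self.p_open = 0           # <p> start tags seen
        self.p_close = 0          # </p> end tags seen
        self.has_doctype = False
        self.has_html_end = False

    def handle_decl(self, decl: str) -> None:
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        if tag == "p":            # html.parser lowercases tag names
            self.p_open += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_close += 1
        elif tag == "html":
            self.has_html_end = True


def audit(html: str) -> dict:
    """Hypothetical helper: summarize the three problems for one page."""
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    return {
        "unclosed_p": max(0, parser.p_open - parser.p_close),
        "missing_html_end": not parser.has_html_end,
        "missing_doctype": not parser.has_doctype,
    }


if __name__ == "__main__":
    page = "<html><body><p>First<p>Second</p></body>"
    print(audit(page))
```

For the sample page it reports one unclosed <p>, a missing </html> and no DOCTYPE. A page that passes this check can still fail validation for other reasons (deprecated attributes, for one), so it only narrows down what to fix first.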
>> It's sad to admit, but right now 100% of the pages on my site are invalid HTML. <<
Woohoo! I'm down to 98%. I just validated my first page and learned some CSS along the way.
So I show the boss, he smiles, then I ask him if Mr. Widget, the fellow who has maintained this site for the last 3 years - and is currently coding a new site for us - will bother to validate the new pages before they go online. No answer.
As a web designer/developer building sites (which is something other than optimizing them), validation is the only accurate measure of how well you do that task: it's a simple yes/no question, and you either pass or you don't. In other words, you either do your job or you don't. It should really be adopted by more people in this industry/trade.
All other measures, say "optimized for browser xyz", are never quite what they seem, as "browser xyz" is just not the same browser xyz when it's the Japanese, Greek, English, or German version, or build 1.1.1.2 vs. 1.1.1.3, or on a small screen vs. a large screen, or on one operating system vs. another, or with this or that plugin, or whatever settings, or... <cut> this list could continue endlessly.
/claus
Knowing that the sample is from DMOZ listings, I would be surprised if a truly random sample drawn from the whole web reached a higher percentage of valid documents. I have not been able to find more recent figures, but I suppose it is possible to replicate the study given sufficient programming skills, as both the data source and the validator are publicly available.
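The core of such a replication is small: draw a random sample of URLs, run each through a validator, and report the share that pass. A minimal sketch, assuming you already have the URL list (e.g. extracted from a DMOZ/ODP dump) and some is_valid() function that fetches a page and asks a validator about it. Both are assumptions here, so the validator is passed in as a plain function and faked in the example. percent_valid() is a hypothetical name, not part of any tool.

```python
import random
from typing import Callable, Sequence


def percent_valid(
    urls: Sequence[str],
    is_valid: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> float:
    """Simple random sample (without replacement) of URLs; return the
    percentage for which is_valid() returns True."""
    if not 0 < sample_size <= len(urls):
        raise ValueError("sample_size must be between 1 and len(urls)")
    rng = random.Random(seed)                     # fixed seed: repeatable runs
    sample = rng.sample(list(urls), sample_size)  # no URL is picked twice
    passed = sum(1 for url in sample if is_valid(url))
    return 100.0 * passed / sample_size


def fake_validator(url: str) -> bool:
    # Stand-in for a real fetch + validator call: every third page "passes".
    return int(url.rsplit("page", 1)[1]) % 3 == 0


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(30)]
    print(f"{percent_valid(urls, fake_validator, sample_size=30):.1f}% valid")
```

Keep in mind the caveat above: sampling from DMOZ only tells you about DMOZ-listed pages, not the web as a whole, however well the sampling itself is done.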
/claus
... usually means that the code is still non-valid tag soup.
>> Knowing that the sample is from DMOZ listings <<
If a site fails to work in the reviewing editor's browser, it is left in unreviewed with a note. If enough people cannot access the site, it gets deleted, so the ODP may have a very small bias away from sites with very bad coding.