First, the text to HTML ratio on a site may be affected by the specifics of the HTML: is it a template page with identical navigation on every page, or is each page individual (probably somewhere in between!)? This matters when looking at the ratio in terms of possible duplicate content issues.
But it's also simpler, in that 'common sense' will usually tell you if there's enough 'text' / unique content on a page.
If there's one para and a 'more' link, then the visitor is being driven through multiple clicks to read a couple of hundred words. Why would they put up with that?
On the other hand, if a page has 2000 words in a three-inch column, chances are that the visitor is waiting four hours for the page to load, then scrolling forever to read it. Why would they put up with that?
Get the balance right, and you'll be unlikely to need to worry about the multitude of variables that set the algo's rules on text:HTML.
There may be 'borderline cases' where there is a real worry - but in my experience, it's only at the silly edges that people get problems; most often those using content management systems, where they don't have to look at the actual page they are building! :)
It's much more complicated, and much simpler, than you suggest ;)
I knew if I put the right "bait" out there I'd get some good solid feedback. ;)
First, the text to HTML ratio on a site may be affected by the specifics of the HTML; this matters when looking at the ratio in terms of possible duplicate content issues.
Interesting observation. So this statistic would help in defining what percentage of the HTML is duplicate. What would you say the average numbers might be in regards to the percentage of HTML vs Text? 80%? 85%? More? Less?
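For anyone who wants to put a rough number on the "duplicate" share, here is a minimal sketch. It assumes you already have two pages from the same site as strings, and it simply measures how much of the first page (by characters, line by line) also appears in the second. The function name `shared_percent` is made up for illustration, and a line-based comparison is only one of many ways to approximate a shared template.

```python
from difflib import SequenceMatcher


def shared_percent(page_a: str, page_b: str) -> float:
    """Percentage of page_a's characters that sit on lines also found in page_b."""
    # Compare non-empty lines with indentation stripped, so that
    # whitespace differences in the markup are ignored.
    lines_a = [ln.strip() for ln in page_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in page_b.splitlines() if ln.strip()]
    total = sum(len(ln) for ln in lines_a)
    if total == 0:
        return 0.0
    matcher = SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    shared = 0
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical lines present in both pages.
        shared += sum(len(ln) for ln in lines_a[block.a:block.a + block.size])
    return 100.0 * shared / total


page_one = "<nav>menu</nav>\n<p>Page one text</p>\n<footer>f</footer>"
page_two = "<nav>menu</nav>\n<p>Other</p>\n<footer>f</footer>"
print(round(shared_percent(page_one, page_two), 1))
```

Note that this counts any identical line, text included, so it is an estimate of the shared template rather than a strict measure of duplicate HTML.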
Get the balance right, and you'll be unlikely to need to worry about the multitude of variables that set the algo's rules on text:HTML.
Is there a balance? By default you are going to have a certain percentage of HTML. Site design and architecture are going to play a "major role" in that percentage number. From my testing to date, I'm finding that most sites have less than 15% text. Heck, the Yahoo! home page is at 2.9% text.
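For reference, a text percentage like the ones quoted above can be approximated as follows. This is only a sketch under my own assumptions: "text" means visible character data outside `<script>` and `<style>`, whitespace is collapsed, and the result is text characters divided by total HTML characters. The names `TextExtractor` and `text_to_html_ratio` are illustrative; other tools may count differently and give different figures.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_to_html_ratio(html: str) -> float:
    """Return text characters as a percentage of all characters in the HTML."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so markup indentation is not counted as text.
    text = " ".join(" ".join(parser.chunks).split())
    return 100.0 * len(text) / len(html)


page = "<html><body><p>Hello world</p></body></html>"
print(round(text_to_html_ratio(page), 1))  # 25.0 (11 text chars out of 44)
```

Whether whitespace, alt text, or title attributes should count as "text" is a judgment call, which is one more reason to be careful about comparing figures from different calculators.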
I've done enough of these Text to HTML Ratio calculations to see certain patterns. And, what I'm seeing is those with higher TtHRs "appear" to be performing above those with lower percentages. Maybe it's my "Tin Hat", ting, ting. And, I'm not only talking about SERP performance, I'm referring to load times, above the fold display, etc.
While I am sure you are right, generally speaking, and my experience (without calculations) supports this, I'd be very wary of putting any figures on it.
I mentioned a couple of variables; you added a few more ... and I'm sure even after others have topped them up, there'll be some that we are all unaware of.
My experience also says that common sense trumps calculations when the variables cannot all be accounted for.