|How Many Characters Count in HTML?|
Spiders Count How Many Characters, Words
| 6:08 pm on Aug 22, 2005 (gmt 0)|
How many characters (or words) do spiders look at in HTML code? I have heard that the beginning and end of the HTML are the most significant spots. I have also heard there should be between 200 and 800 words in total. Any more specifics on the most important parts? Where does the spider start? For instance, would <HTML> count as 6 characters?
| 1:58 am on Aug 23, 2005 (gmt 0)|
Perhaps you're thinking of the <meta> keywords tag, which is limited to 250 words, I believe. That's a maximum of 250 keywords you can proactively use to tell Google "this is what my site is about". But if you have 3,000 words on your page, and one of them is "doodledorf", and I search for that word, Google will show your site as a result.
| 2:15 am on Aug 23, 2005 (gmt 0)|
I've heard that the amount of human-readable content on a page should be around the limits you specified. If that's what you're talking about, the words you need to count are the ones that are actually visible on the front end of the page, through the browser. If I were keeping track of such things, I'd exclude navigation from the count. I'd also try to put all navigation and such as far down in the source code as possible, with ALL other page content above it.
Frankly, though, I don't consider page length that big of an issue. Minimize HTML markup as much as possible by eliminating code bloat, write clean, validating HTML, come up with a good linking structure, use CSS to lay out your page with the "meaty" content coming first in the source, use a <title> that's laser-targeted to the page content, and keep the content itself focused and relevant. Those are the main things.
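If you do want to count the visible words as described above, here is a rough sketch in Python using only the standard library. It assumes navigation is wrapped in a <nav> element and that anything inside <head>, <script>, or <style> is not visible; the class name VisibleWordCounter and the function count_visible_words are made up for illustration, and a real page may mark up its navigation differently (for example a <div> with an id), so adjust the skipped tags to suit.

```python
from html.parser import HTMLParser


class VisibleWordCounter(HTMLParser):
    """Counts words in text nodes, skipping elements assumed to be non-content."""

    # Assumption: these elements hold no visible body content worth counting.
    SKIPPED_TAGS = {"head", "script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <NAV> and <nav> both match.
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only count whitespace-separated words outside skipped elements.
        if self.skip_depth == 0:
            self.word_count += len(data.split())


def count_visible_words(html):
    """Return the number of words a visitor would plausibly see in the body."""
    parser = VisibleWordCounter()
    parser.feed(html)
    parser.close()
    return parser.word_count
```

Calling count_visible_words(page_source) gives a number to compare against whatever range you have in mind. It is only an approximation: text hidden with CSS is still counted, since the parser does not evaluate stylesheets.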
| 2:23 am on Aug 23, 2005 (gmt 0)|
Some of the big search engines used to stop reading a page after 101 KB, but now they read the whole page. Database space is cheap these days.
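For anyone curious how a page measures up against that old cutoff, here is a small Python sketch. It reads the 101 figure as 101 × 1024 bytes and assumes the page is served as UTF-8; both are assumptions, and the names LEGACY_LIMIT_BYTES, page_size_bytes, and exceeds_legacy_limit are invented for the example. Note that the size is counted in bytes of the raw source, markup included, so <HTML> contributes 6 bytes.

```python
# Assumption: the old limit is interpreted as 101 * 1024 bytes.
LEGACY_LIMIT_BYTES = 101 * 1024


def page_size_bytes(html, encoding="utf-8"):
    """Size of the raw HTML source in bytes, markup included."""
    return len(html.encode(encoding))


def exceeds_legacy_limit(html, limit=LEGACY_LIMIT_BYTES):
    """True if the source is larger than the (assumed) historical cutoff."""
    return page_size_bytes(html) > limit
```

Non-ASCII characters take more than one byte in UTF-8, so the byte size can be larger than the character count.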
| 3:26 am on Aug 23, 2005 (gmt 0)|
Yes, I agree with PM. What you've heard, Webdude, is a bit out of date - mostly from the days when search engines were focused strongly on matching the on-page text to search queries. Things have shifted a lot today, and the simpler formulas of days gone by are fading from usefulness.
The spider takes in the whole HTML document unless it's really huge. Once the search engine begins processing, it may discount things like a few attributes here and there, but essentially everything gets processed and stored for further number crunching as the algorithms try to determine relevance for different queries.
| 2:30 pm on Aug 23, 2005 (gmt 0)|
Thanks everybody for your feedback and bringing me up to speed! ;-)