Forum Moderators: brotherhood of lan & mack

Keyword density question

Why does all my code show up in keyword density?

10:15 am on Mar 13, 2009 (gmt 0)

5+ Year Member

Hoping someone can help me out... I am trying to figure out why our web page isn't showing up in organic search. Even though we have a PageRank of 4, we still don't come up. I realize this SEO stuff is really complicated, but I have been using SEO Quake to try to figure out what the problem is. When I check the keyword density, all of the code strings show up as keywords. For instance, all of the words below and more... 1919 in all!

normal false none style definitions table msonormaltable mso name tstyle rowband size colband noshow priority qformat parent padding alt 0in 4pt para margin top right bottom 0pt left line height 115 pagination widow orphan font family calibri sans serif ascii theme minor latin fareast times new roman hansi microsoftinternetexplorer4

How do I prevent Google from viewing these as keywords? Our competitor's site only shows 146.

5:03 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Welcome aboard zipp06.

Those are all proprietary "codes" generated by (I'm presuming) a Microsoft Office application such as Word. If you were to view the source of your page, you would see it's clogged up with anywhere from 50% to 75% "code" compared to content. Basically, the search engines do not understand these "codes", as they are outside the normal set of elements for HTML or even XHTML, and are indexing them as page content.

The solution? You probably won't like it . . .

Don't use MS Office to create your pages. Even Dreamweaver will create a "cleaner" version of your site with no apparent visual difference. This will expose the real keywords and content of your site to the search engines.
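
For anyone who wants to put a rough number on that "code vs. content" ratio, here is a minimal sketch in PHP. The function name `markupShare` and the sample fragment are made up for illustration, and this is only a byte-count estimate, not how any search engine measures a page.

```php
<?php
// Hypothetical helper (not from the thread): estimates what share of a
// page's bytes is markup rather than visible text. A rough sketch only.
function markupShare(string $html): float
{
    $total = strlen($html);
    if ($total === 0) {
        return 0.0;
    }
    // Drop <style> and <script> blocks first; strip_tags() alone removes
    // the tags but leaves their contents behind as if they were text.
    $visible = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);
    $visible = strip_tags($visible);
    // Decode entities such as &nbsp; and collapse runs of whitespace.
    $visible = html_entity_decode($visible, ENT_QUOTES, 'UTF-8');
    $visible = trim(preg_replace('/\s+/u', ' ', $visible));
    return 1.0 - strlen($visible) / $total;
}

// Made-up sample in the style of Word output.
$sample = '<style>p.MsoNormal{margin:0in;font-family:Calibri}</style>'
        . '<p class=MsoNormal style="line-height:115%">Hello world</p>';
printf("%.0f%% of this fragment is markup\n", markupShare($sample) * 100);
```

Running it over the saved source of a page before and after cleaning gives a quick before/after comparison.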

5:59 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Avoiding Word, Outlook, etc. is certainly the preferred method.

Sometimes it is unavoidable. I recently had to deal with this very issue and came up with a PHP solution.


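function cleanUpHTML($text)
{
    // discard unwanted tags, keeping only basic formatting and lists
    $text = strip_tags($text, '<p><b><i><ol><ul><li>');
    // strip header stuff: keep everything from the first <p onward
    $body = stristr($text, '<p');
    if ($body !== false) {
        $text = $body;
    }
    // strip all attributes (Word garbage); \w+ keeps the whole tag name,
    // so <ol ...> becomes <ol> rather than <o>
    $text = preg_replace('/<(\w+)[^>]*>/s', '<$1>', $text);
    // get rid of useless non-breaking spaces
    $text = str_replace('&nbsp;', '', $text);
    // get rid of empty p's
    $text = preg_replace('/<p>\s*<\/p>/i', '', $text);
    // convert from UTF-8 to EUC-JP (site-specific; omit if not needed)
    $text = mb_convert_encoding($text, "EUCJP-WIN", "UTF-8");
    return $text;
}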

The above function was specific to my needs and will probably need tweaking for other applications, but it is working admirably for its intended task.
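
To see what the attribute-stripping idea does in isolation, here is a reduced sketch. The name `stripWordAttributes` and the sample input are hypothetical, chosen only for illustration; it assumes UTF-8 input and leaves out the header trimming and encoding conversion.

```php
<?php
// Hypothetical, reduced version of the clean-up idea: keep a small
// whitelist of tags and throw away every attribute on them.
function stripWordAttributes(string $html): string
{
    // Keep only basic formatting and list tags.
    $html = strip_tags($html, '<p><b><i><ol><ul><li>');
    // Rewrite each opening tag as just its name. \w+ captures the whole
    // tag name, so <ol start=1> becomes <ol>, not <o>.
    $html = preg_replace('/<(\w+)[^>]*>/', '<$1>', $html);
    // Remove paragraphs that hold nothing but whitespace or &nbsp;.
    $html = preg_replace('/<p>(?:\s|&nbsp;)*<\/p>/i', '', $html);
    return $html;
}

// Made-up sample in the style of Word output.
$word = "<p class=MsoNormal>The <b>quick</b> fox</p>"
      . "<ol style='margin-top:0in' start=1 type=1>"
      . "<li class=MsoNormal>once</li></ol>"
      . "<p class=MsoNormal>&nbsp;</p>";

echo stripWordAttributes($word), "\n";
```

Closing tags are left alone because the pattern only matches a `<` followed directly by a word character.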

6:23 pm on Mar 13, 2009 (gmt 0)

5+ Year Member

Well, all of that is Greek to me. I guess I have a lot to learn. I am using Image Cafe (Network Solutions). I guess typing out my text in Word and then pasting it into the web page is where the problem stems from? What if I just type the text directly into the editor in Image Cafe... do you think that would solve it? It's just that the little editor window is so small; that's why I copy and paste from Word.

6:29 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month


Or save the Word doc as PLAIN TXT, then copy and paste FROM NOTEPAD. Or even save as SIMPLE HTML (look for it), load it locally in your browser, then copy and paste from there. Pasting directly from Word is where the problem lies.

6:49 pm on Mar 13, 2009 (gmt 0)

5+ Year Member

Thanks a bunch! I will fix it on Monday and let you know how I do. Have a peaceful and relaxing weekend!

9:26 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

or even save as SIMPLE HTML...

Do you mean "filtered" HTML, which purportedly removes Word-specific tags?

Well, it doesn't. Here is an example:

<p class=MsoNormal>The quick <b>brown</b> <i>fox</i> <u>jumps</u> over the lazy

<ol style='margin-top:0in' start=1 type=1>
<li class=MsoNormal>once</li>
<li class=MsoNormal>twice </li>
<li class=MsoNormal>thrice</li>

<p class=MsoNormal>&nbsp;</p>

<ol style='margin-top:0in' start=3 type=1>
<ul style='margin-top:0in' type=disc>
<li class=MsoNormal>amazing what we can do here</li>

Problems to be seen in the above:

1. Declaration of the class MsoNormal (multiple times), sans quotes, and quite likely not in your style sheet.
2. Failure to properly nest the unordered list within the ordered list.
3. Unwanted in-line styling
4. Empty <p> tags - OK, not empty, but a spurious &nbsp; in there
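
Three of the four problems listed above can be spotted mechanically. The sketch below is one hedged way to count them in a pasted fragment; the function name `findWordResidue` and the patterns are illustrative assumptions, and the nesting problem (item 2) is left out because checking it reliably needs a real HTML parser rather than regular expressions.

```php
<?php
// Hypothetical checker: counts Word leftovers in a pasted HTML fragment.
// The patterns are illustrative and will not catch every variation.
function findWordResidue(string $html): array
{
    return [
        // 1. class attributes starting with "Mso", quoted or not
        'mso_classes'   => preg_match_all('/\bclass\s*=\s*["\']?Mso\w*/i', $html),
        // 3. inline style attributes
        'inline_styles' => preg_match_all('/\sstyle\s*=\s*["\']/i', $html),
        // 4. paragraphs containing only whitespace or &nbsp;
        'empty_paras'   => preg_match_all('/<p\b[^>]*>(?:\s|&nbsp;)*<\/p>/i', $html),
    ];
}

// Made-up fragment modelled on the filtered-HTML example.
$pasted = "<p class=MsoNormal>The quick <b>brown</b> fox</p>\n"
        . "<ol style='margin-top:0in' start=1 type=1>\n"
        . "<li class=MsoNormal>once</li>\n"
        . "</ol>\n"
        . "<p class=MsoNormal>&nbsp;</p>\n";

print_r(findWordResidue($pasted));
```

A non-zero count in any slot is a hint that the content came through Word and still needs cleaning.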

Just what is that class MsoNormal that we can't seem to get away from?

font-family:"Times New Roman";

Save it as plain text? Now you lose all formatting. This ordered list item:

<li>A list item</li>

becomes this:

1. A list item

and this unordered list item:

<li>A list item</li>

becomes this:

* A list item

But if you cut and paste from Word into Notepad, that UL list item brings a disc with it:

•A list item

None of the above is correct markup. If one believes that correct semantic markup influences rankings (I do), then Word should be avoided at all costs.

It is absolutely crazy-making. I am associated with a site that has several content contributors, most of whom simply will not move off of Word for their content creation, no matter how easy I make it for them.

They cut and paste, then email me because the page is broken. I end up having to go in and remove all the span, font and spurious CSS garbage, then redo lists, headings and so on with proper markup.
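
The list reformatting can be partly automated as well. Below is a minimal sketch that turns plain-text bullet or numbered lines (the kind a Notepad round trip leaves behind) back into list markup. The name `linesToLists` is hypothetical, and it assumes UTF-8 text, one item per line, and no nested lists.

```php
<?php
// Hypothetical helper: rebuilds <ul>/<ol> markup from plain-text lines
// that start with a bullet ("•" or "*") or a number followed by a dot.
function linesToLists(string $text): string
{
    $out = [];
    $open = null; // 'ul', 'ol', or null when no list is open
    foreach (preg_split('/\R/u', trim($text)) as $line) {
        $line = trim($line);
        if (preg_match('/^(?:•|\*)\s*(.+)$/u', $line, $m)) {
            $type = 'ul';
        } elseif (preg_match('/^\d+\.\s+(.+)$/u', $line, $m)) {
            $type = 'ol';
        } else {
            $type = null;
        }
        // Close the current list when the kind of line changes.
        if ($open !== null && $open !== $type) {
            $out[] = "</$open>";
            $open = null;
        }
        if ($type !== null) {
            if ($open === null) {
                $out[] = "<$type>";
                $open = $type;
            }
            $out[] = '<li>' . htmlspecialchars($m[1]) . '</li>';
        } elseif ($line !== '') {
            // Anything else becomes an ordinary paragraph.
            $out[] = '<p>' . htmlspecialchars($line) . '</p>';
        }
    }
    if ($open !== null) {
        $out[] = "</$open>";
    }
    return implode("\n", $out);
}

echo linesToLists("•A list item\n•Another item"), "\n";
```

Headings and nested lists would still need a human eye.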

But Word's ubiquitousness and "ease of use" prevails no matter how I approach the problem...

Thus that bit of PHP code I shared, since it automates 90% of the cleanup.

10:49 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

This is true. But the FILTERED output (thanks for reminding me) does NOT carry the formatting code of native Word docs and won't cause the same kind of problems the OP initially noted.

10:01 am on Mar 16, 2009 (gmt 0)

5+ Year Member

I just decided to look at one of my other sites... lo and behold, I have the same problem. I realize I am a rookie, but I am willing to learn. Is it possible to use a WYSIWYG editor and NOT get this problem? I don't want to spend hours trying to fix this if the solution lies in the editor. On this website, I am using the Plesk Sitebuilder website builder. I just want to show up in search eventually, but I don't want to THINK I am doing everything correctly only to find out that all of those HTML words are counting as keywords. Also, I noticed that there is no "Robots.txt" on this site. How do you add one if you are using a site builder? Should I just get away from the site builders altogether? I just want to stop wasting my time and get this done. The site I am referring to is [snip]

[edited by: brotherhood_of_LAN at 10:30 am (utc) on Mar. 16, 2009]
[edit reason] No personal URLs as per the ToS, use generics. Thanks. [/edit]

12:35 pm on Mar 16, 2009 (gmt 0)

5+ Year Member

Just wanted to say thanks! I used the process of elimination and found where all those pesky tags were coming from. I took your advice and pasted the copy into Notepad, then re-pasted it into my site, and it worked like a charm! Sorry about putting the link in the last post, I was not aware... You guys are great!