Moderators: brotherhood of lan & mack

New To Web Development Forum

    
Keyword density question
Why does all my code show up in keyword density?
zipp06




msg:3869543
 10:15 am on Mar 13, 2009 (gmt 0)

Hoping someone can help me out... I am trying to figure out why our web page isn't showing up in organic search. Even though we have a PageRank of 4, we still don't come up. I realize this SEO stuff is really complicated, but I have been using SeoQuake to try to figure out what the problem is. When I check the keyword density, all of the code strings show up as keywords. For instance, all of the words below and more... 1,919 in all!

normal false none style definitions table msonormaltable mso name tstyle rowband size colband noshow priority qformat parent padding alt 0in 4pt para margin top right bottom 0pt left line height 115 pagination widow orphan font family calibri sans serif ascii theme minor latin fareast times new roman hansi microsoftinternetexplorer4

How do I prevent Google from treating these as keywords? Our competitor's site only shows 146.
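For anyone wondering how markup words can land in a keyword report at all, here is a minimal PHP sketch of a naive keyword counter. It is an assumption for illustration, not how SeoQuake or Google is documented to work: it removes tags with strip_tags(), lowercases what is left, and counts words. strip_tags() deletes the <style> and </style> tags themselves but keeps the text between them, so style definitions that end up in the page body get counted like content. The function name countWords() is hypothetical.

```php
<?php
// Hypothetical naive keyword counter, for illustration only.
// Real SEO tools and search engines are not documented to work this way.
function countWords(string $html): array
{
    // strip_tags() removes the tags but keeps the text inside <style> blocks.
    $text = strtolower(strip_tags($html));
    // Split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values($words);
    arsort($counts); // most frequent words first
    return $counts;
}

$page = '<style>table.MsoNormalTable {mso-style-name:"Table Normal";}</style>'
      . '<p class=MsoNormal>Blue widgets for sale</p>';
print_r(countWords($page));
```

With that sample, "table" comes out on top with two hits, ahead of every real content word.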

 

rocknbil




msg:3869808
 5:03 pm on Mar 13, 2009 (gmt 0)

Welcome aboard zipp06.

Those are all proprietary "codes" generated by (I'm presuming) a Microsoft Office application such as Word. If you view the source of your page, you will see it's clogged up with anywhere from 50% to 75% "code" compared to content. Basically, the search engines do not understand these "codes", since they are outside the normal set of elements for HTML or even XHTML, and are indexing them as page content.
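To put a rough number on the "code versus content" proportion of your own page, here is a small PHP sketch. The function name visibleTextShare() is hypothetical, and the measure (visible characters divided by total source characters, with <style> and <script> blocks discarded) is only a crude proxy, not a metric any search engine publishes.

```php
<?php
// Hypothetical helper: roughly what share of a page's source is visible text?
// Crude proxy for illustration; no search engine publishes a metric like this.
function visibleTextShare(string $html): float
{
    if ($html === '') {
        return 0.0;
    }
    // Drop <style> and <script> blocks whole; strip_tags() alone would keep their contents.
    $noBlocks = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove the remaining tags (strip_tags() also removes HTML comments).
    $text = strip_tags($noBlocks);
    // Collapse runs of whitespace so indentation is not counted as content.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    return strlen($text) / strlen($html);
}

printf("%.0f%% visible text\n", 100 * visibleTextShare('<p class="MsoNormal">Hello</p>'));
```

To check a real page, save its source and pass the whole string in; a Word-pasted page will typically score much lower than a hand-coded one with the same text.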

The solution? You probably won't like it . . .

Don't use MS Office to create your pages. Even Dreamweaver will create a "cleaner" version of your site with no apparent "visual" difference. This will expose the real keywords and content of your site to the search engines.

willybfriendly




msg:3869845
 5:59 pm on Mar 13, 2009 (gmt 0)

Avoiding Word, Outlook, etc. is certainly the preferred method.

Sometimes it is unavoidable. I recently had to deal with this very issue and came up with a PHP solution.

[webmasterworld.com...]
[webmasterworld.com...]

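function cleanUpHTML($text)
{
    //discard unwanted tags (span, font, o:p, etc.); keep basic formatting only
    $text = strip_tags($text, '<p><b><i><ol><ul><li>');
    //strip header stuff: drop everything before the first <p>, if there is one
    $start = stripos($text, '<p');
    if ($start !== false) {
        $text = substr($text, $start);
    }
    //strip all attributes (Word garbage): <p class=MsoNormal> becomes <p>, <ol style=...> becomes <ol>
    $text = preg_replace('/<(\w+)[^>]*>/', '<$1>', $text);
    //turn non-breaking spaces into ordinary spaces so words don't run together
    $text = str_replace('&nbsp;', ' ', $text);
    //get rid of empty p's, including ones left holding only whitespace
    $text = preg_replace('/<p>\s*<\/p>/i', '', $text);
    //site-specific: converts UTF-8 to EUC-JP (Windows variant); drop this line for UTF-8 pages
    $text = mb_convert_encoding($text, 'EUCJP-WIN', 'UTF-8');
    return $text;
}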

The above function was specific to my needs and will probably need tweaking for other applications, but it is working admirably for its intended task.
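If the regex needs too much tweaking for your content, a parser-based variant is another option. This is a different technique from the function above (PHP's DOM extension instead of regular expressions), sketched under the assumption that the input is a UTF-8 HTML fragment; the name stripWordAttributes() is hypothetical. It removes every attribute (class=MsoNormal, style=..., start, type) from every element inside the fragment.

```php
<?php
// Hypothetical alternative: remove all attributes with PHP's DOM extension instead of a regex.
function stripWordAttributes(string $fragment): string
{
    $doc = new DOMDocument();
    // Wrap the fragment so it can be found again; the meta tag tells libxml the input is UTF-8.
    $wrapped = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
             . '</head><body><div id="wrap">' . $fragment . '</div></body></html>';
    // Suppress warnings about Word's non-standard markup, then restore the previous setting.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//div[@id="wrap"]//*') as $el) {
        // Collect names first so the attribute list is not modified while it is iterated.
        $names = [];
        foreach ($el->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $el->removeAttribute($name);
        }
    }

    // Serialize only the children of the wrapper, not the added html/head/body.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripWordAttributes('<p class=MsoNormal>The quick <b>brown</b> fox</p>');
```

Unlike the regex version, this keeps every tag; it only removes attributes, so unwanted span and font tags would still need a separate pass.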

zipp06




msg:3869866
 6:23 pm on Mar 13, 2009 (gmt 0)

Well, all of that is Greek to me. I guess I have a lot to learn. I am using Image Café (Network Solutions). I guess the problem stems from typing my text in Word and then pasting it into the web page? What if I just type the text directly into the Image Café editor? Do you think that would solve it? It's just that the little editor window is so small, which is why I copy and paste from Word.

tangor




msg:3869868
 6:29 pm on Mar 13, 2009 (gmt 0)

Use NOTEPAD!

Or save the Word doc as PLAIN TXT, then cut and paste FROM NOTEPAD. Or even save as SIMPLE HTML (look for it), load it locally in your browser, then cut and paste from there. Pasting directly from Word is where the problem lies.

zipp06




msg:3869900
 6:49 pm on Mar 13, 2009 (gmt 0)

Thanks a bunch! I will fix it on Monday and let you know how I do. Have a peaceful and relaxing weekend!

willybfriendly




msg:3870005
 9:26 pm on Mar 13, 2009 (gmt 0)

or even save as SIMPLE HTML...

Do you mean "filtered" HTML, which purportedly removes Word-specific tags?

Well, it doesn't. Here is an example:

<p class=MsoNormal>The quick <b>brown</b> <i>fox</i> <u>jumps</u> over the lazy
dog…</p>

<ol style='margin-top:0in' start=1 type=1>
<li class=MsoNormal>once</li>
<li class=MsoNormal>twice </li>
<li class=MsoNormal>thrice</li>
</ol>

<p class=MsoNormal>&nbsp;</p>

<ol style='margin-top:0in' start=3 type=1>
<ul style='margin-top:0in' type=disc>
<li class=MsoNormal>amazing what we can do here</li>
</ul>
</ol>

Problems to be seen in the above:

1. Declaration of the class MsoNormal (multiple times), sans quotes, and quite likely not in your style sheet.
2. Failure to properly nest the unordered list within the ordered list.
3. Unwanted in-line styling.
4. Empty <p> tags. OK, not quite empty: each holds a spurious &nbsp;.
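To find these four problems in a page without reading the source by eye, here is a rough PHP sketch. The function name wordArtifactReport() and the report keys are hypothetical, and the checks are regex heuristics, not a full HTML validator; the nesting check only catches a <ul> opened directly inside an <ol>, as in the example above.

```php
<?php
// Hypothetical checker for the four Word problems listed above.
// Regex heuristics for illustration, not a full HTML validator.
function wordArtifactReport(string $html): array
{
    return [
        // 1. class=MsoNormal, class="MsoListParagraph", ... (quoted or not)
        'mso_classes'      => preg_match_all('/\bclass\s*=\s*["\']?Mso/i', $html),
        // 2. a <ul> opened directly inside an <ol> instead of inside an <li>
        'bad_list_nesting' => preg_match_all('#<ol\b[^>]*>\s*<ul\b#i', $html),
        // 3. tags carrying an inline style attribute
        'inline_styles'    => preg_match_all('/<[^>]+\sstyle\s*=/i', $html),
        // 4. paragraphs holding nothing but a non-breaking space
        'nbsp_paragraphs'  => preg_match_all('#<p\b[^>]*>\s*&nbsp;\s*</p>#i', $html),
    ];
}

print_r(wordArtifactReport("<p class=MsoNormal>&nbsp;</p>"));
```

Run against the full example above, it should flag every one of the four problems at least once.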

Just what is that class MsoNormal that we can't seem to get away from?

margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";

Save it as plain text? Now you lose all formatting. This:

<ol>
<li>A list item</li>
</ol>

becomes this:

1. A list item

and this:

<ul>
<li>A list item</li>
</ul>

becomes this:

* A list item

But if you cut and paste from Word into Notepad, that UL list item brings its bullet character with it:

•A list item
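If you do go the plain-text route, the list markup has to be rebuilt afterwards. A minimal PHP sketch, assuming each item sits on its own line and starts with a "1." style number, a "*", or the "•" bullet Word leaves behind; the function name textListsToHtml() is hypothetical, and it only handles flat (non-nested) lists.

```php
<?php
// Hypothetical helper: rebuild flat <ol>/<ul> markup from plain-text list lines.
// Assumes one item per line; nested lists are not handled.
function textListsToHtml(string $text): string
{
    $out = [];
    $open = null; // 'ol', 'ul', or null when not inside a list
    foreach (preg_split('/\R/u', $text) as $line) {
        if (preg_match('/^\s*\d+\.\s*(.+)$/u', $line, $m)) {
            $type = 'ol';              // "1. A list item"
        } elseif (preg_match('/^\s*(?:\*|•)\s*(.+)$/u', $line, $m)) {
            $type = 'ul';              // "* A list item" or "•A list item"
        } else {
            $type = null;              // ordinary text line
        }
        if ($open !== null && $type !== $open) {
            $out[] = "</$open>";       // close the list we were in
            $open = null;
        }
        if ($type === null) {
            $out[] = htmlspecialchars($line);
            continue;
        }
        if ($open === null) {
            $out[] = "<$type>";        // start a new list
            $open = $type;
        }
        $out[] = '<li>' . htmlspecialchars(trim($m[1])) . '</li>';
    }
    if ($open !== null) {
        $out[] = "</$open>";
    }
    return implode("\n", $out);
}

echo textListsToHtml("1. A list item\n* A list item\n•A list item");
```

One caveat of this simple approach: an ordinary sentence that happens to start with a number and a period will be mistaken for an ordered list item.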

None of the above is correct markup. If one believes that correct semantic markup influences rankings (I do), then Word should be avoided at all costs.

It is absolutely crazy-making. I am associated with a site that has several content contributors, most of whom simply will not move off Word for their content creation, no matter how easy I make it for them.

They cut and paste, then email me because the page is broken. I end up having to go in and remove all the span, font, and spurious CSS garbage, and redo the lists, headings, etc. with proper markup.

But Word's ubiquity and "ease of use" prevail no matter how I approach the problem...

Hence that bit of PHP code I shared: it automates 90% of the cleanup.

tangor




msg:3870051
 10:49 pm on Mar 13, 2009 (gmt 0)

This is true. But FILTERED HTML (thanks for reminding me) does NOT carry the formatting code from native Word docs, and won't cause the same kind of problems the OP initially noted.

zipp06




msg:3871263
 10:01 am on Mar 16, 2009 (gmt 0)

I just decided to look at one of my other sites... lo and behold, I have the same problem. I realize I am a rookie, but I am willing to learn. Is it possible to use a WYSIWYG editor and NOT get this problem? I don't want to spend hours trying to fix this if the solution lies in the editor. On this website, I am using the Plesk Sitebuilder website builder. I just want to show up in search eventually, but I don't want to THINK I am doing everything correctly only to find out that all of those HTML words are counting as keywords. Also, I noticed that there is no robots.txt on this site. How do you add one if you are using a site builder? Should I just get away from site builders altogether? I just want to stop wasting my time and get this done. The site I am referring to is [snip]

[edited by: brotherhood_of_LAN at 10:30 am (utc) on Mar. 16, 2009]
[edit reason] No personal URLs as per the ToS, use generics. Thanks. [/edit]

zipp06




msg:3871334
 12:35 pm on Mar 16, 2009 (gmt 0)

Just wanted to say thanks! I used the process of elimination and found where all those pesky tags were coming from. I took your advice and pasted the copy into Notepad, then re-pasted it into my site, and it worked like a charm! Sorry about putting the link in the last post; I was not aware... You guys are great!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved