Forum Moderators: not2easy

Another way to generate content - convert Word DOC to HTML?

5:37 pm on Feb 27, 2008 (gmt 0)

10+ Year Member



Recently, I've been trying to generate more content and cross-linking for my product website. I have a few Word documents (help guides) with tables of contents and tiered headings, and I'm now toying with the idea of somehow converting the documents into an HTML minisite that I can add to my existing website.

Has anyone come across software or a script that can convert a Word doc into an SEO-friendly minisite? I did some searching on the internet, but most doc2html converters are rudimentary.

Thanks.

7:10 pm on Feb 27, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I occasionally get Word docs from contributors.

I open them in Word, save them as HTML.

Then I open them in Dreamweaver, which has a menu item you can select that cleans up the Word HTML (which is crap), and it's configurable in a number of ways.

Once I've done that I have clean code that I can use in a template.
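
For anyone without Dreamweaver, the same kind of cleanup can be roughed out in a short script. The following is only a minimal sketch, not a reproduction of what Dreamweaver's cleanup command does: the function name `clean_word_html` is made up for illustration, and the patterns assume the usual residue in Word's "Save as HTML" output (conditional comments, `<o:p>`-style Office tags, `Mso...` classes and inline styles).

```python
import re

def clean_word_html(html: str) -> str:
    """Strip common Word-specific residue from 'Save as HTML' output.

    Hypothetical helper: the patterns below are assumptions about what
    Word typically emits, not an exhaustive list.
    """
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Office namespace tags such as <o:p>, </o:p>, <v:shape ...>, <w:...>
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # class="MsoNormal" and similar (quoted or unquoted)
    html = re.sub(r'\s+class="?Mso[^"\s>]*"?', "", html, flags=re.IGNORECASE)
    # Inline style attributes (double-quoted only, to keep the sketch simple)
    html = re.sub(r'\s+style="[^"]*"', "", html, flags=re.IGNORECASE)
    return html


sample = '<p class="MsoNormal" style="margin:0in">Hello<o:p></o:p></p>'
print(clean_word_html(sample))
```

Regular expressions are fragile on arbitrary HTML, so treat this as a first pass before pasting the result into a template and checking it by eye.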

6:05 pm on Feb 28, 2008 (gmt 0)

5+ Year Member



I have a bunch of Word documents with notes that I created, all preformatted. This might be interesting to try.

I wonder though how to handle equations. Most of my docs are my academic notes, and they have all sorts of equations. It will be interesting to see how Dreamweaver handles those.

3:24 am on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



If Word converts the symbols correctly when it converts to HTML (and if there are HTML character codes for all of them), then you should be OK from there. Let us know what happens!
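
As a rough illustration of the "HTML character codes" point: for symbols that survive the conversion as ordinary text characters, a script can swap each one for a named entity where HTML 4 defines one and fall back to a numeric reference otherwise. This is a sketch under that assumption only. The function name `to_char_refs` is hypothetical, and it does nothing for equations that Word exports as images.

```python
from html.entities import codepoint2name

def to_char_refs(text: str) -> str:
    """Replace non-ASCII characters with HTML character references.

    Hypothetical helper. ASCII is passed through untouched, so '&', '<'
    and '>' are NOT escaped here.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII, keep as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &alpha;
        else:
            out.append(f"&#{cp};")                # numeric fallback
    return "".join(out)


print(to_char_refs("α ≤ β"))
```

Symbols without a named entity still get a numeric reference, so "if there are HTML character codes for all of them" is less of a constraint than it sounds, as long as the reader's font has the glyph.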

3:13 am on Mar 22, 2008 (gmt 0)

5+ Year Member



Word sucks! I have found that in most cases it is more trouble to save .doc files as .html files and then clean them up than it is to just copy and paste the contents into a blank page and add HTML code to it.

6:20 pm on Mar 22, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



True. It depends on the type and amount of formatting, but the converting approach is worth trying first, before you fall back on the copy and paste technique.

7:40 pm on Mar 22, 2008 (gmt 0)

5+ Year Member



I agree...

But... for a quick workaround, you could try OpenOffice.

I just ran a very quick experiment using OOo Writer to save (as HTML) an old and rather complicated 18-page Word .doc file that had a variety of headers, regular paragraphs, tables nested within tables, bulleted lists and images.

It failed the W3C validator (as HTML 4.0 Transitional) for only FOUR reasons:

  1. The STYLE tag was missing type="text/css"
  2. All IMGs were missing alt attributes (however, as OOo assigned a NAME="theFileName" attribute to each image, this was very easy to fix manually)
  3. All UL tags nested within other ULs weren't preceded by an LI tag
  4. All TABLEs were declared with an (invalid) attribute of "BORDERCOLOR"
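
Three of those four fixes are mechanical enough to script. The sketch below is an assumption-laden illustration rather than a tested OOo post-processor: the function names are made up, the patterns assume HTML 4 style tags without a trailing slash, and item 3 (nested ULs) is left out because it really needs a proper parser rather than regular expressions.

```python
import re

def fix_style_type(html: str) -> str:
    # Item 1: add type="text/css" to any <STYLE> tag that has no type attribute.
    return re.sub(r"(<style\b)(?![^>]*\btype=)([^>]*>)",
                  r'\1 type="text/css"\2', html, flags=re.IGNORECASE)

def add_alt_from_name(html: str) -> str:
    # Item 2: give every <IMG> without alt an alt copied from its NAME attribute.
    def repl(match):
        tag = match.group(0)
        if re.search(r"\balt\s*=", tag, re.IGNORECASE):
            return tag  # already has alt, leave it alone
        name = re.search(r'\bname\s*=\s*"([^"]*)"', tag, re.IGNORECASE)
        alt = name.group(1) if name else ""
        return tag[:-1].rstrip() + f' alt="{alt}">'
    return re.sub(r"<img\b[^>]*>", repl, html, flags=re.IGNORECASE)

def drop_bordercolor(html: str) -> str:
    # Item 4: remove the invalid BORDERCOLOR attribute wherever it appears.
    return re.sub(r"""\s+bordercolor\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
                  "", html, flags=re.IGNORECASE)

def fix_ooo_html(html: str) -> str:
    # Apply the three scripted fixes in sequence.
    return drop_bordercolor(add_alt_from_name(fix_style_type(html)))


page = ('<STYLE>p{}</STYLE><TABLE BORDERCOLOR="#000000"><TR><TD>'
        '<IMG SRC="a.gif" NAME="fig1"></TD></TR></TABLE>')
print(fix_ooo_html(page))
```

An alt text copied from the file name only satisfies the validator; a human-written description is still better for readers and search engines.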

Not bad...

Not perfect either... so... I still agree with omegaman66 above

8:53 pm on Mar 22, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If you do convert such documents (especially if you use any Microsoft product to do it), it is imperative that you clean up the markup.
 
