Content, Writing and Copyright Forum

    
Another way to generate content - convert Word DOC to HTML?
touchring




msg:3586260
 5:37 pm on Feb 27, 2008 (gmt 0)

Recently, I've been trying to generate more content and cross-linking for my product website. I have a few Word documents (help guides) with tables of contents and tiered headings, and I'm now toying with the idea of converting them into an HTML minisite that I can add to my existing website.

Has anyone come across software or a script that can convert a Word doc into an SEO-friendly minisite? I did some searching on the internet, but most doc2html converters are rudimentary.

Thanks.

 

purplecape




msg:3586331
 7:10 pm on Feb 27, 2008 (gmt 0)

I occasionally get Word docs from contributors.

I open them in Word and save them as HTML.

Then I open them in Dreamweaver, which has a menu item that cleans up the Word HTML (which is crap), and the cleanup is configurable in a number of ways.

Once I've done that I have clean code that I can use in a template.
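
For anyone without Dreamweaver, the same kind of cleanup can be roughly approximated in a script. The following is only a minimal sketch, not what Dreamweaver actually does: the function name `clean_word_html` is made up for illustration, and the patterns assume the usual Word leftovers (conditional comments, `<xml>` blocks, `<o:p>` tags, `Mso...` classes, inline styles). Regular expressions are fragile on real-world HTML, so check the output by hand.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE

def clean_word_html(html: str) -> str:
    """Strip common Word-specific markup (hypothetical helper, not a full cleaner)."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=FLAGS)
    # Embedded <xml> ... </xml> blocks
    html = re.sub(r"<xml>.*?</xml>", "", html, flags=FLAGS)
    # Office namespace tags like <o:p> and </o:p>; any text between them is kept
    html = re.sub(r"</?[a-z]+:[a-z]+[^>]*>", "", html, flags=FLAGS)
    # class="MsoNormal" and similar (quoted or unquoted)
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w+)", "", html, flags=FLAGS)
    # Inline style attributes (quoted)
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html, flags=FLAGS)
    return html

sample = (
    "<p class=MsoNormal style='margin:0in'>Hello<o:p></o:p></p>"
    "<!--[if gte mso 9]><xml>junk</xml><![endif]-->"
)
print(clean_word_html(sample))
```

What comes out is plain markup that can be dropped into a template; anything the patterns miss still needs a manual pass.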

einsteinsboi




msg:3587334
 6:05 pm on Feb 28, 2008 (gmt 0)

I have a bunch of Word documents with notes that I created, all preformatted. This might be interesting to try.

I wonder though how to handle equations. Most of my docs are my academic notes, and they have all sorts of equations. It will be interesting to see how Dreamweaver handles those.

purplecape




msg:3587743
 3:24 am on Feb 29, 2008 (gmt 0)

If Word converts the symbols correctly when it converts to HTML (and if there are HTML character codes for all of them), then you should be OK from there. Let us know what happens!
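
If the symbols survive as ordinary Unicode characters, replacing them with HTML character codes can be scripted. This is a small sketch under that assumption; the function name `to_character_references` is made up here. It uses the named entity where Python's standard `html.entities` table has one and falls back to a numeric reference otherwise. It only handles individual symbols, not the layout of a full equation (fractions, stacked limits and so on).

```python
from html.entities import codepoint2name

def to_character_references(text: str) -> str:
    """Replace non-ASCII characters with HTML character references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII passes through unchanged
            out.append(ch)
        elif code in codepoint2name:
            # Named entity, e.g. &alpha;
            out.append(f"&{codepoint2name[code]};")
        else:
            # No named entity available: use a numeric reference
            out.append(f"&#{code};")
    return "".join(out)

print(to_character_references("α ≤ β"))
```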

omegaman66




msg:3607795
 3:13 am on Mar 22, 2008 (gmt 0)

Word sucks! I have found that in most cases it is more trouble to save .doc files as .html files and then clean them up than it is to just copy and paste the contents into a blank page and add HTML code to it.

purplecape




msg:3608136
 6:20 pm on Mar 22, 2008 (gmt 0)

True. Depends on the type and amount of formatting--but the converting approach is worth trying first, before you fall back on the copy and paste technique.

lavazza




msg:3608165
 7:40 pm on Mar 22, 2008 (gmt 0)

I agree...

But... for a quick workaround, you could try OpenOffice

I just ran a very quick experiment using OOo Writer to save (as HTML) an old and rather complicated 18-page Word .doc file that had a variety of headers, regular paragraphs, tables nested within tables, bulleted lists and images.

It failed the w3c validator (as HTML 4.0 Transitional) for only FOUR reasons:

  1. The STYLE tag was missing type="text/css"
  2. All IMGs were missing alt attributes (however, as OOo assigned a NAME="theFileName" attribute to each image, this was very easy to fix manually)
  3. All UL tags nested within other ULs were not wrapped in an LI tag
  4. All TABLEs were declared with an (invalid) attribute of "BORDERCOLOR"
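
Items 1, 2 and 4 are mechanical enough to script. The sketch below is an illustration only: `fix_ooo_html` is a made-up name, and the patterns assume tags shaped like the ones described in the list (HTML 4 style, no self-closing `/>`). Item 3 (nested ULs) is left out because fixing it reliably needs a real HTML parser rather than regular expressions.

```python
import re

def _add_alt(match: re.Match) -> str:
    """Give an IMG tag an alt attribute, reusing its NAME value if present."""
    tag = match.group(0)
    if re.search(r"\balt=", tag, flags=re.IGNORECASE):
        return tag  # already has alt, leave it alone
    name = re.search(r'\bname="([^"]*)"', tag, flags=re.IGNORECASE)
    alt = name.group(1) if name else ""
    return tag[:-1].rstrip() + f' alt="{alt}">'

def fix_ooo_html(html: str) -> str:
    # 1. Add type="text/css" to STYLE tags that have no type attribute
    html = re.sub(r"<(style)\b(?![^>]*\btype=)([^>]*)>",
                  r'<\1 type="text/css"\2>', html, flags=re.IGNORECASE)
    # 2. Add alt attributes to IMG tags
    html = re.sub(r"<img\b[^>]*>", _add_alt, html, flags=re.IGNORECASE)
    # 4. Drop the invalid BORDERCOLOR attribute (quoted or unquoted value)
    html = re.sub(r"\s+bordercolor=(\"[^\"]*\"|'[^']*'|[^\s>]+)", "",
                  html, flags=re.IGNORECASE)
    return html

print(fix_ooo_html('<IMG SRC="a.gif" NAME="Graphic1" WIDTH=10>'))
```

The alt text taken from NAME is only a placeholder; a descriptive alt still has to be written by hand.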

Not bad...

Not perfect either... so... I still agree with omegaman66 above

g1smd




msg:3608190
 8:53 pm on Mar 22, 2008 (gmt 0)

If you do convert such documents (especially if you use any Microsoft product to do it), it is imperative that you clean up the markup.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved