Simplifying Word HTML

Making clean code in Word

         

MisterKen

1:32 am on Apr 6, 2004 (gmt 0)

10+ Year Member



Howdy,
We have a client who does not want to copy & paste their Word documents into Contribute.

...Ok...fine...

Is there a way to force Word to output stripped-down HTML (h1, h2, p, etc.)? I went to their site and had them download an extension that supposedly cleans it up. It did make the output 'cleaner', but it is still messy.

Mohamed_E

2:41 am on Apr 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



HTML Tidy has an option that cleans up Word-generated HTML, but the result is still messy.

How about saving as text and adding markup?
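For anyone who wants to try the save-as-text route, here is a minimal sketch in Python. It assumes the client uses Word's plain-text export and that paragraphs end up separated by blank lines; the function name `text_to_html` and the optional `title` parameter are made up for illustration and are not part of any tool mentioned in this thread.

```python
import html
import re


def text_to_html(text, title=None):
    """Wrap blank-line-separated paragraphs of plain text in <p> tags."""
    # Normalize Windows line endings, then split on blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    parts = []
    if title:
        parts.append(f"<h1>{html.escape(title)}</h1>")
    for block in blocks:
        block = block.strip()
        if not block:
            continue
        # Collapse hard line breaks inside a paragraph into single spaces.
        body = " ".join(line.strip() for line in block.split("\n"))
        parts.append(f"<p>{html.escape(body)}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "Hello & welcome\nto the site.\n\nSecond para."
    print(text_to_html(sample, title="Intro"))
```

Headings, lists, and links still have to be marked up by hand (or by a convention you agree on with the client), so this only covers the boring part.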

pshea

3:02 am on Apr 6, 2004 (gmt 0)

10+ Year Member



I would pretty much say fuggedabudit to asking MS to clean up their own code. There are a few third-party programs out there which attempt to do this. A couple of years ago, I even attempted a large research report for this website to answer this question. I found one program which seemed promising, I believe it was called "Word to Web" [or close to it], but they would not release their full program for demo. I grew frustrated with the restrictions of the demo and blew the project off.

Times change quickly in cyberspace, and it would not surprise me if full releases of such programs are available today.

If it were my job assignment and important to determine an answer to this question, I would print out a list of the Fortune 100, get on the phone, call each company, ask to speak to the webmaster and ask them the question.

My thoughts on why reverting to the phone could succeed: (a) who calls the webmaster for XXX big.corp and asks them a question they can relate to? It is a low-percentage call with a high-percentage chance of a response; (b) what are the chances they have solved this problem with dogged creativity for their intranet [high, I think]; and finally (c) if you ask, they will spill their beans because you will be the first one to have expressed any interest at all in the details.

Interest in your experience with this question will be high. Best of luck and good wishes.

-pshea

MisterKen

3:47 pm on Apr 6, 2004 (gmt 0)

10+ Year Member



Well... I thought pshea might be right, so I did a bit more digging. I came up with WordCleaner [wordcleaner.com].

I downloaded the trial version and I think, with a bit of work on both the client's side and ours, it will strip out the 'garbage'.

If the client can discipline themselves to use H1, H2 tags in their file, WordCleaner takes out the font and span tags.
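To give an idea of what that kind of stripping involves, here is a rough sketch in Python using only the standard library. It is not how WordCleaner works internally (I have no idea how it does it); the class and function names are hypothetical, and the lists of tags and attributes to drop are assumptions you would tune to your own files.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists: tags to unwrap (content kept) and attributes to discard.
DROP_TAGS = {"font", "span", "o:p"}
DROP_ATTRS = {"class", "style", "lang"}


class WordTagStripper(HTMLParser):
    """Re-emit markup without font/span wrappers and styling attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Emit self-closing tags such as <br/> as a plain open tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_word_tags(markup):
    parser = WordTagStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(strip_word_tags('<h1><span style="color:red">Title</span></h1>'))
```

Comments are dropped as a side effect, because the parser's default comment handler does nothing. The headings survive only if the author used real heading styles in Word, which is why the client discipline matters.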

It's a pretty good tool from what I can see. It also allows some level of customization.

All that said, it's still a shame that Word does not give you an option to output simple code. What a waste of time and effort.

Thank you both for your help!
Cheers,
ken

pageoneresults

4:03 pm on Apr 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"All that said, it's still a shame that Word does not give you an option to output simple code."

Word is not an HTML editor; it is a word processing program. The only way, and I do mean the only way, to strip out anything from Word is using Notepad or another text editor. Even then, a visual inspection is still required to verify that all MS code has been removed. In most instances, you will end up manually removing the ghosts, usually mso stuff.
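A small script can help with that inspection by pointing at the lines that still carry Word leftovers. The sketch below is only an aid for the eyeball check, not a replacement for it; the patterns are my assumptions about what typical residue looks like (mso- style properties, Mso class names, o:p tags, conditional comments), and the function name is made up.

```python
import re

# Assumed signatures of Word residue; extend as you meet new ghosts.
MSO_PATTERNS = [
    re.compile(r"mso-[a-z-]+", re.IGNORECASE),        # mso-* style properties
    re.compile(r"class=\"?Mso\w+", re.IGNORECASE),    # class=MsoNormal etc.
    re.compile(r"</?o:p\b", re.IGNORECASE),           # Office namespace tags
    re.compile(r"<!--\[if [^\]]*\]>", re.IGNORECASE), # conditional comments
]


def find_word_residue(markup):
    """Return (line_number, matched_text) pairs for suspected Word leftovers."""
    hits = []
    for lineno, line in enumerate(markup.splitlines(), start=1):
        for pattern in MSO_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "<p class=MsoNormal>Hi<o:p></o:p></p>\n<h1>Clean</h1>"
    for lineno, text in find_word_residue(sample):
        print(f"line {lineno}: {text}")
```

An empty result does not prove the file is clean, it only means none of the listed patterns matched.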

P.S. I have a secret weapon for Word files... FrontPage! ;)

ricfink

3:13 pm on Apr 10, 2004 (gmt 0)

10+ Year Member



I've gotten acceptable results by using the HTML (Filtered) Save As option in Word 2003.

Hey, the markup isn't elegant, but it works and, for the pages I've used it on, it's cross-browser.

Looks fine in Opera and Moz.

Bondi

6:07 pm on Apr 10, 2004 (gmt 0)

10+ Year Member



I use HTML Transit [avantstar.com] to convert DOC to HTML. It can produce fairly clean code. However, it's useful mostly when one has a bunch of similar DOCs, i.e. based on a template, as HTML Transit also uses templates for the conversion. Then you can match a CSS class to a Word style, etc. It can be a headache to create and match a really good template that gives you decent HTML code, because there is a plethora of custom prefs, but once you've figured 'em out, it is a real help. :)
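To illustrate the idea of matching a Word style to a tag and CSS class (this is not HTML Transit's template format, just a sketch of the concept in Python), assume Word has written each styled paragraph as `<p class=StyleName>`. The style names in `STYLE_MAP` and the function name are hypothetical; "PullQuote" stands in for whatever custom style your template defines.

```python
import re

# Hypothetical mapping: Word style name -> (HTML tag, CSS class or None).
STYLE_MAP = {
    "MsoTitle": ("h1", None),
    "MsoNormal": ("p", None),
    "PullQuote": ("blockquote", "pull-quote"),
}

# Matches <p class=Style ...>body</p>, with or without quotes around the class.
PARAGRAPH = re.compile(
    r'<p\s+class="?([\w-]+)"?[^>]*>(.*?)</p>',
    re.IGNORECASE | re.DOTALL,
)


def apply_style_map(markup, style_map=STYLE_MAP):
    """Rewrite styled paragraphs using the tag and class from style_map."""

    def replace(match):
        style, body = match.group(1), match.group(2)
        # Unknown styles fall back to a plain paragraph.
        tag, css_class = style_map.get(style, ("p", None))
        attr = f' class="{css_class}"' if css_class else ""
        return f"<{tag}{attr}>{body.strip()}</{tag}>"

    return PARAGRAPH.sub(replace, markup)


if __name__ == "__main__":
    print(apply_style_map('<p class=MsoTitle>Report</p><p class=PullQuote>Quote</p>'))
```

As with the real tool, this only pays off when the documents share one template, so that one mapping covers the whole batch.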