Word doc to HTML quickly!

Change the Word doc to web page in a jiffy

         

infinitewoman

1:58 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



I have 18 pages of Word documents that I need to upload to a website. I do not use software (like Dreamweaver). I prefer to code by hand.

What is the quickest way to transform Word documents into HTML thus creating an uploadable page? I could just code everything by hand but if a quicker way exists out there, please let me know.

Alternative Future

2:04 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

File > Save As! If you save the file as .html (select .htm/.html from the drop-down list in Word), you should have all the HTML pages you require.

hth,

-gs

bcolflesh

2:14 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The MS Office HTML Filter 2.0 is also a handy tool:

office.microsoft.com/downloads/2000/Msohtmf2.aspx

limbo

2:20 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Alternative Future is right, but as you would expect, the HTML files output by Word are really nasty!

Dreamweaver has a 'clean up word' command - isn't that good of them ;)

Apart from that and maybe PDF'ing the docs there is nothing more you can do really.

Copying and pasting large chunks of text reduces the code bloat of the Save As HTML feature, but for larger documents it is a bit time-consuming.
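To give a feel for what a "clean up Word" pass removes, here is a rough Python sketch. It is not what Dreamweaver does internally; it only assumes that the Save As HTML output contains conditional comment blocks, Office namespace tags such as <o:p>, class names starting with "Mso", and inline style attributes. The function name strip_word_markup is made up for this example.

```python
import re

def strip_word_markup(markup):
    """Remove some typical Word-specific clutter from an HTML string."""
    # Conditional comment blocks such as <!--[if gte mso 9]> ... <![endif]-->
    markup = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", markup, flags=re.S | re.I)
    # Office namespace tags such as <o:p> and </o:p>
    markup = re.sub(r"</?\w+:\w+[^>]*>", "", markup)
    # class=MsoNormal and similar attributes, quoted or not
    markup = re.sub(r"""\s+class=["']?Mso\w+["']?""", "", markup, flags=re.I)
    # Inline style attributes in double or single quotes
    markup = re.sub(r"""\s+style=("[^"]*"|'[^']*')""", "", markup, flags=re.I)
    return markup

sample = "<p class=MsoNormal style='margin:0in'>Hi<o:p></o:p></p>"
print(strip_word_markup(sample))
```

Regular expressions are a blunt tool for HTML, so treat this as a first pass before tidying the rest by hand.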

infinitewoman

2:41 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Thanks for the input all! I have tried the "Save As" in Word but, as another kind soul mentioned, the code is bulky and tends to get messed up when you do this. If I had to go back and manually change anything, I think I'd pull out my hair!

Could you please explain "pdf"ing?

Alternative Future

2:45 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To create a PDF file for Adobe Acrobat Reader to display (which is built into some browsers now), you would need the likes of Adobe Illustrator to open the Word document, then again use File > Save As to save it as a PDF file. Googlebot picks up on PDF files now, so most visitors would be able to find it.
There might be some other packages that do this; I'm sure some other people on here will be quick to point them out ;)

And limbo came up with a good suggestion on the clean-up tool, this should help reduce the bloated code!

-gs

Mohamed_E

4:45 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A belated welcome to WebmasterWorld, infinitewoman!

Though I have not had occasion to use it, I know that HTML-Tidy has a mode specifically for cleaning up the garbage that Word puts out as "HTML".
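For anyone who wants to script this, a minimal Python sketch follows. It assumes the tidy command-line program is installed and on the PATH, and it uses Tidy's word-2000 and clean options; the function names and file names are only illustrative.

```python
import shutil
import subprocess

def build_tidy_command(src, dst):
    """Return an HTML Tidy command line for cleaning Word-exported HTML."""
    return [
        "tidy",
        "--word-2000", "yes",  # strip Word 2000 specific markup
        "--clean", "yes",      # replace presentational tags with CSS rules
        "--quiet", "yes",      # suppress the banner and summary
        "-o", dst,             # write the cleaned page to dst
        src,
    ]

def clean_with_tidy(src, dst):
    """Run Tidy; return True unless it reported errors."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on PATH")
    result = subprocess.run(build_tidy_command(src, dst))
    # Tidy exits with 0 on success, 1 on warnings, 2 on errors.
    return result.returncode < 2
```

Calling clean_with_tidy("report.htm", "report-clean.htm") would write a cleaned copy and leave the original export untouched.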

SethCall

5:07 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Infinite: are you coding actual HTML as your post suggests?

As in,

<html>
<body>
</body>
</html>

.

If so, save as "plain text" (.txt). That will remove all the fluff and cruft that Word throws in there.

Actually, for HTML editing, I would then suggest you use another application, as Word isn't really ideal. There are many good free text editors out there. I use HapEdit for HTML and TopStyle Lite for CSS.

limbo

5:14 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sethcall

That's so simple :)

Just tried it on a 53-page Word doc with all the 'fluff' and it actually broke it down into paragraphs for me. CSS will make short work of it.

infinitewoman

5:52 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



SethCall...was it the "plain text" that worked so nicely for you?

I can't use another editor because the info is forwarded to me from a source that uses Word. I'd feel pushy if I insisted they use something else. :o)

Thank you for your welcome, Mohamed_E. WebmasterWorld comes highly recommended by a lot of my fellow webmistresses, and they certainly were right about this forum!

infinitewoman

5:56 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Sethcall...

Yes, I am using pure HTML as you suggest.

The docs come to me as plain text already...straight Word docs. If I'm not mistaken, wouldn't saving as .txt be almost the same form as an original Word doc? Most of the docs I save in Notepad are saved as .txt

limbo

5:59 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Infinite Woman

Yep, that's the point, I think. Having it in its raw paragraphed format means adding HTML to the content would be easy.

Any text editor will recognise .txt files so you will have no problem adding tags as you see fit.
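If adding the tags by hand gets tedious, the paragraph tags can be added with a few lines of Python. This is only a sketch, assuming the plain-text export puts each paragraph on its own line; the function name text_to_html is made up for this example.

```python
import html

def text_to_html(text):
    """Wrap every non-empty line of plain text in <p> tags."""
    out = []
    for line in text.splitlines():
        body = line.strip()
        if body:
            # Escape &, < and > so the text cannot break the markup.
            out.append("<p>" + html.escape(body, quote=False) + "</p>")
    return "\n".join(out)

print(text_to_html("First paragraph.\n\nSecond, with R&D in it."))
```

Headings, lists, and links would still need to be marked up by hand afterwards.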

bcolflesh

6:17 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another interesting tool:

stevemiller.net/puretext/

infinitewoman

6:44 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Well, loads of options here! I know it will work out well now. You're all super! Thanks for all the help.

SethCall

6:47 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



OK, confusion setting in:

I really can't understand what's going on, so let me say this and see what you've got:

Definition:
.txt means the file name *ends* in .txt
.doc means the file name ends in .doc

In other words, I'm talking about a file extension.

If you can't see the file extension in Windows, open up a folder. Go to Tools -> Folder Options -> View and uncheck "Hide extensions for known file types".

.txt file = any editor will open it. And it's pure text. No formatting information. This is what's needed for an HTML page.

.doc file = MS Word only doc. It has extra stuff in it that will break a web page. In fact, if you actually look at what's in a .doc file, there is no way it would ever work on the web.
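One way to check which kind of file you were actually sent, whatever its name says, is to look at the first few bytes. The Python sketch below assumes the binary .doc format, which starts with the 8-byte OLE compound file signature; the function name is made up for this example.

```python
# First 8 bytes of an OLE compound file, the container used by binary .doc files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def looks_like_binary_doc(path):
    """Return True if the file starts with the OLE compound file signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == OLE_MAGIC
```

A real plain-text file starts with ordinary readable characters, so this returns False for it.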

So, if you are receiving a .doc file, you will have to open MS Word, and do the Save As .txt.

If you are receiving a .txt file, and you simply prefer to edit the HTML in MS Word, then you shouldn't have to do anything. When you save, MS Word will warn you that all formatting will be removed (which is good).

Finally, before you upload the file to your web server, you should change the file extension of your web page to .html, .htm, .php, or whatever.
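With 18 pages, changing the extension can be scripted. The Python sketch below copies each working .txt file to a file of the same name ending in .html, so the .txt version stays around for editing. The function name and the "pages" folder are hypothetical.

```python
from pathlib import Path

def publish_copy(txt_path, suffix=".html"):
    """Copy a working .txt file to the same name with a web extension."""
    src = Path(txt_path)
    dst = src.with_suffix(suffix)
    # Copy the bytes unchanged; only the file name differs.
    dst.write_bytes(src.read_bytes())
    return dst

if __name__ == "__main__":
    # "pages" is a placeholder folder holding the .txt working files.
    for page in sorted(Path("pages").glob("*.txt")):
        print("wrote", publish_copy(page))
```

Copying rather than renaming means a mistake in the upload copy never touches the file you are still editing.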

If you are *working* with .html files, I would seriously suggest not using MS Word. When you open a .html file in MS Word and then save it, Word saves the file in the "MS Word-ugly web page format". This is because when MS Word opens a .html file, it sees the .html on the end of the file name and decides it will act as an HTML editor. So if you REALLY want to use MS Word, the solution is to keep your HTML files ending in .txt. Otherwise you are playing Russian roulette with your file and must always remember to do a "Save As" plain text. If you forget even once, do a regular save, and close Word, your file will be ruined for good.

Do a search on this site, you will find links to good html text editors, that won't cause you these problems.

sagerock

6:58 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



I was having this same trouble. I tried all of the filters and add-ons, and none of them were great.
Then I tried copying it into WordPad, which cleans up the code nicely. From there, I copied it into FrontPage. I'm not sure if that is the editor you are using, but it did the trick on my end.
Best of luck!