Converting messy word doc to html table

         

SilverLining

11:41 am on Nov 17, 2006 (gmt 0)

10+ Year Member



Is there a quick and easy way to convert a Word document into HTML format?

I have a price list which needs to be in table format (at most three columns), but the problem is that the Word document is so messy. The data is separated by a mix of tabs and spaces, so Word's "convert text to table" feature does not really help much.

I also tried saving the Word doc as an HTML page, but that adds too much superfluous code.

There must be a less time-consuming way than doing it manually. I was thinking there could be a way of sorting this out with the XVI32 hex editor, but I have not had much exposure to it.

Any suggestions?

Robin_reala

12:01 pm on Nov 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If it's tab-separated I'd probably just copy the data into your text editor of choice and use find and replace. I use TextPad, so I'd do:

Find:

\t

Replace:
</td><td>

Then to finish off the rows:

Find:

\n

Replace:
</td></tr>\n<tr><td>

Obviously you'll have to add the opening and closing tags for the first and last rows by hand, and set up headers, summary, etc.
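
For anyone who would rather script the same tab and newline replacement than run it in an editor, here is a minimal Python sketch. It assumes the price list has already been pasted into plain text with one row per line and tabs between columns; the function name and the sample rows are made up for illustration.

```python
import html


def tsv_to_table(text: str) -> str:
    """Turn tab-separated lines into an HTML table, one <tr> per line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between entries
        # Each tab-separated field becomes one cell; escape &, < and >.
        cells = "".join(
            f"<td>{html.escape(cell.strip())}</td>" for cell in line.split("\t")
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the original price list.
    sample = "Widget\tSmall\t4.99\nGadget\tLarge\t12.50\n"
    print(tsv_to_table(sample))
```

Unlike the raw find and replace, this wraps the first and last rows as well, so only headers and a summary remain to be added by hand.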

SilverLining

12:37 pm on Nov 17, 2006 (gmt 0)

10+ Year Member



Thanks Robin.

I'm using Notepad++, as GoLive does not seem to support this functionality. I have replaced all the tabs, but no new lines are found when searching for \n (with "Regular expressions" ticked). What next?

[edited by: SilverLining at 12:42 pm (utc) on Nov. 17, 2006]
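
One thing worth ruling out at this step, offered as an assumption rather than a diagnosis of the editor's behaviour: text that comes out of Word on Windows normally ends each line with \r\n rather than a bare \n, which can trip up newline searches. A small Python sketch (the function name is hypothetical) that normalizes line endings before any further replacement:

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to \\n."""
    # Replace the two-character sequence first so it is not doubled.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    print(repr(normalize_newlines("row one\r\nrow two\r\n")))
```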

Robin_reala

12:51 pm on Nov 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure for Notepad++ - I've used it but not to the extent of regex replacements.

There's a FAQ on their site about new-line replacements; it suggests Ctrl+M? Probably worth a read.

SilverLining

3:07 pm on Nov 17, 2006 (gmt 0)

10+ Year Member



Have looked at the FAQs, but in the end I used TextPad. Still need to clean up quite a bit, but thanks for the info.
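
Much of the remaining clean-up in a case like this tends to come from rows where columns are separated by runs of spaces instead of tabs. A hedged Python sketch of one way to handle that: it treats a tab, or two or more consecutive spaces, as a column break and pads or folds each row to three columns. The two-space threshold, the function name, and the sample line are assumptions, not anything taken from the original document.

```python
import re

# A column break is a tab (with any spaces around it) or 2+ spaces.
SEPARATOR = re.compile(r"[ \t]*\t[ \t]*| {2,}")


def split_messy_line(line: str, columns: int = 3) -> list[str]:
    """Split one line into exactly `columns` fields."""
    parts = [p.strip() for p in SEPARATOR.split(line.strip())]
    parts = [p for p in parts if p]  # drop empty fields from doubled separators
    if len(parts) > columns:
        # Fold any overflow into the last column rather than losing it.
        parts = parts[: columns - 1] + [" ".join(parts[columns - 1:])]
    # Pad short rows so every row has the same number of cells.
    return parts + [""] * (columns - len(parts))


if __name__ == "__main__":
    # Hypothetical messy row: a tab, then a run of spaces.
    print(split_messy_line("Widget \t  Small    4.99"))
```

Single spaces are left alone so that multi-word item names stay in one cell. The trade-off is that a deliberately empty middle column gets collapsed, so rows like that would still need checking by eye.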

johnnie

11:46 pm on Nov 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Save it as HTML, grab a pond of coffee and try to fix up all the horrible junk generated by Word ;)

bill

11:52 pm on Nov 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



HTML Tidy has settings to clean up Word's HTML. FrontPage has a similar tool to optimize Word's HTML.
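
As a rough illustration of the HTML Tidy route, the Python sketch below builds a command line for the `tidy` program using its Word-related options. It assumes `tidy` is installed and on the PATH, and the option names (`word-2000`, `bare`, `clean`) are given as I recall them from Tidy's option list, so check them against your version. The function and file names are hypothetical.

```python
import subprocess


def build_tidy_command(src: str, dest: str) -> list[str]:
    """Build a tidy command that strips Word-specific markup."""
    return [
        "tidy",
        "-q",                   # quiet: suppress non-essential output
        "--word-2000", "yes",   # remove Word 2000 surplus markup
        "--bare", "yes",        # replace smart quotes, non-breaking spaces
        "--clean", "yes",       # replace presentational tags with CSS
        "-o", dest,             # write the cleaned HTML to dest
        src,
    ]


def clean_word_html(src: str, dest: str) -> int:
    """Run tidy and return its exit code (0 ok, 1 warnings, 2 errors)."""
    return subprocess.run(build_tidy_command(src, dest)).returncode
```

Calling `clean_word_html("pricelist.htm", "pricelist-clean.htm")` would then write the cleaned file alongside the Word export, leaving the original untouched.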

floriniri

6:16 am on Nov 18, 2006 (gmt 0)

10+ Year Member



I know most of the guys are using Dreamweaver. Personally, I use FrontPage. In your case, I'd simply save as HTML from within Word, then use FrontPage to get rid of the additional code that Word puts in.
As a note: Dreamweaver also adds useless code, so you end up with a bigger page.
Hope it helps.

SilverLining

4:21 pm on Nov 27, 2006 (gmt 0)

10+ Year Member



Thanks for the suggestions.

I have installed the HTML Tidy Eclipse plugin, which looks pretty helpful, although our config.xml file alone has thousands of warnings/errors, and those errors need to be fixed before HTML Tidy will generate a tidied-up version. I mostly use Eclipse for JSPs and XML files, so this is not ideal for testing standalone HTML files. Or is there a better way to use this plugin to its full potential (on a Windows box)? Do I need to set up a new project for my HTML pages?