Forum Moderators: phranque


Word (.doc) -> .txt script

         

robster124

11:30 pm on Dec 3, 2004 (gmt 0)

10+ Year Member



Right, I may have the wrong forum for this but here goes.

A site I run receives .doc files, which are uploaded to the site. I need a script (or even just an .exe that I can run locally) that will convert Word files into plain .txt files. Formatting is not an issue and is not needed. The .txt files will be used as raw data for the site search engine and for other purposes.

I'm sure some of you are thinking: why don't I just use Word and do a bit of copying and pasting? Well, tonnes of .docs are submitted every day, so I have to make this as automated as possible.

Any suggestions would be greatly appreciated.

Preferably I'm looking for something free... and open source...
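A minimal sketch of that kind of automation, assuming a Unix-like host with the free, GPL-licensed `antiword` command-line tool installed (antiword is a suggestion of mine, not something named in this thread; it prints a .doc file's text to standard output). The directory names and the `convert_folder` / `antiword_to_text` names are hypothetical. The converter is passed in as a function, so it can be swapped for another tool, or for a stub when testing.

```python
import subprocess
from pathlib import Path
from typing import Callable


def antiword_to_text(doc_path: Path) -> str:
    """Run the antiword CLI on one .doc file and return what it prints."""
    result = subprocess.run(
        ["antiword", str(doc_path)],
        capture_output=True,
        text=True,
        errors="replace",  # don't crash on odd bytes in the output
        check=True,        # raise CalledProcessError if antiword fails
    )
    return result.stdout


def convert_folder(upload_dir: Path, out_dir: Path,
                   convert: Callable[[Path], str] = antiword_to_text) -> list[Path]:
    """Convert every .doc (any case) in upload_dir into a .txt in out_dir.

    Files whose .txt is already at least as new as the .doc are skipped,
    so the job can be re-run from cron as often as you like.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(upload_dir.iterdir()):
        if not doc.is_file() or doc.suffix.lower() != ".doc":
            continue
        txt = out_dir / (doc.stem + ".txt")
        if txt.exists() and txt.stat().st_mtime >= doc.stat().st_mtime:
            continue  # already converted and up to date
        txt.write_text(convert(doc), encoding="utf-8")
        written.append(txt)
    return written
```

Typical use would be something like `convert_folder(Path("uploads"), Path("search-text"))` from a scheduled job; the returned list tells you which files are new and need re-indexing.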

kaled

1:19 am on Dec 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I may be wrong, but I believe the Word format has never been published. That said, import filters do exist, so its structure is known.
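One part of the structure that is easy to rely on: Word 97-2003 .doc files are stored in an OLE2 Compound File container, whose first eight bytes are always D0 CF 11 E0 A1 B1 1A E1, while RTF files begin with `{\rtf`. The sketch below (function names are hypothetical) sniffs an upload's header, which is useful because a file named .doc can turn out to be RTF, and the two need different handling.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 Compound File signature
RTF_MAGIC = b"{\\rtf"                            # every RTF document starts with {\rtf


def sniff_upload(head: bytes) -> str:
    """Classify an upload from its first few hundred bytes."""
    if head.startswith(OLE2_MAGIC):
        # Binary Word file (or another OLE2 file, e.g. an .xls renamed to .doc).
        return "ole2"
    if head.lstrip().startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the start of the file; there is no need to load the whole upload."""
    with open(path, "rb") as f:
        return sniff_upload(f.read(512))
```

Note that "ole2" only says the container is right; it does not prove the contents are a Word document.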

Perhaps you should consider asking people to upload in RTF format. You can view RTF files in Notepad (I think), so designing your own filter should be straightforward. Also, finding such a filter may be easier.
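To give an idea of what a home-made RTF filter involves, here is a rough sketch in Python (the thread suggests Perl; Python is used here only for illustration). It drops control words, skips font, colour and style tables plus any `{\*...}` group, maps `\par` and `\line` to newlines and `\tab` to a tab, and decodes `\uN` characters. It assumes `\'hh` escapes are Windows-1252 (the common `\ansicpg1252`) and the default `\uc1` (one fallback character after each `\uN`); the skip list is not exhaustive, and tables and embedded objects are not handled.

```python
import re

# Groups that hold metadata rather than body text (not an exhaustive list).
SKIP_DESTINATIONS = {
    "fonttbl", "colortbl", "stylesheet", "info", "pict", "object",
    "header", "headerl", "headerr", "footer", "footerl", "footerr",
    "listtable", "listoverridetable", "themedata", "generator",
}

TOKEN = re.compile(
    r"\\([a-zA-Z]{1,32})(-?\d{1,10})? ?"  # control word, optional number, swallowed space
    r"|\\'([0-9a-fA-F]{2})"              # \'hh: one byte given in hex
    r"|\\([^a-zA-Z])"                    # control symbol: \\ \{ \} \~ \* ...
    r"|([{}])"                           # start or end of a group
    r"|[\r\n]+"                          # raw line breaks carry no content in RTF
    r"|(.)"                              # any other character is literal text
)


def rtf_to_text(rtf: str) -> str:
    out = []
    stack = []         # saved 'ignorable' flag for each open group
    ignorable = False  # True while inside a group we are skipping
    skip = 0           # fallback characters still to drop after a \uN
    for m in TOKEN.finditer(rtf):
        word, arg, hexcode, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(ignorable)
        elif brace == "}":
            ignorable = stack.pop() if stack else False
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                ignorable = True
            elif ignorable:
                continue
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            elif word == "u" and arg is not None:
                out.append(chr(int(arg) % 65536))  # \uN is a signed 16-bit value
                skip = 1  # assumes \uc1: one ANSI fallback character follows
        elif symbol is not None:
            if symbol == "*":
                ignorable = True  # {\* ...} marks an optional destination
            elif ignorable:
                continue
            elif symbol in "\\{}":
                out.append(symbol)
            elif symbol == "~":
                out.append("\u00a0")  # non-breaking space
            elif symbol in "\r\n":
                out.append("\n")  # a backslash before a line break acts like \par
        elif hexcode is not None or char is not None:
            if ignorable:
                continue
            if skip:
                skip -= 1  # drop the fallback for the preceding \uN
                continue
            if hexcode is not None:
                out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
            else:
                out.append(char)
    return "".join(out)
```

For search-engine input this is usually enough; anything that needs layout (tables, footnotes in place) calls for a proper converter.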

Maybe you should repost in forum13.
Maybe you should Google "rtf perl".

Kaled.

TheDoctor

11:39 pm on Dec 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the site runs under Windows (rather than Linux), you could probably write a VBA script, or a Visual Basic program, to do this.

Whether you could get something to run under Linux, I don't know.
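For the Windows route, here is a sketch of the same idea driven from Python through COM instead of VBA. It assumes Microsoft Word and the third-party pywin32 package are installed; the function names are hypothetical. `Documents.Open`, `SaveAs` and `Close` are Word object-model calls, and 2 is Word's `wdFormatText` save format. The Word application object is passed in, so the per-file logic can be checked without Word.

```python
import os

WD_FORMAT_TEXT = 2   # Word's WdSaveFormat value wdFormatText
WD_ALERTS_NONE = 0   # Word's WdAlertLevel value wdAlertsNone


def save_as_text(word, doc_path: str, txt_path: str) -> None:
    """Open doc_path in a running Word instance and save a plain-text copy."""
    doc = word.Documents.Open(os.path.abspath(doc_path))  # Word wants absolute paths
    try:
        doc.SaveAs(os.path.abspath(txt_path), WD_FORMAT_TEXT)
    finally:
        doc.Close(False)  # close without saving any further changes


def convert_all(doc_paths):
    """Windows only: start Word via COM and convert each file next to itself."""
    import win32com.client  # imported here so this module still loads elsewhere
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = WD_ALERTS_NONE  # avoid dialogs blocking an unattended run
    try:
        for p in doc_paths:
            save_as_text(word, p, os.path.splitext(p)[0] + ".txt")
    finally:
        word.Quit()
```

Microsoft's own guidance discourages unattended, server-side automation of Office, so this is better suited to a desktop batch job than to code running inside a web request.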

vrtlw

11:49 pm on Dec 4, 2004 (gmt 0)

10+ Year Member



Good old w3.org

[w3.org...]

And the Google search [google.com] I used to find this.