Forum Moderators: coopster

converting doc to txt

huge number of files to be converted

         

Jaunty Edward

6:20 pm on Jan 13, 2007 (gmt 0)

10+ Year Member



Hi,

I think I have tried every combination of keywords on Google but could not find a solution to this task. I have to store the content of .doc files in a MySQL table, and for that I need to write a PHP function that can read a file and return its text.

Can anyone help me out with any useful information? What I know so far is that I will need a COM object. I tried one from phpclasses, but it does not solve the problem: it acts like a macro that opens Word and saves the file as XML (not even TXT). My client will convert over 1000 files every day.

I will be thankful if anyone can shed some light on a solution.

thanks
bye

henry0

8:02 pm on Jan 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do a search for
Convert DOC and RTF to TXT
grab the first line.. Done
the rest you can figure out

but if we are speaking of 1000s/day
you will really need to think about good logic and a process to speed it up

<edit>
Not a free one but very affordable
Don't know anything about it
never used it
</edit>

mcibor

8:42 am on Jan 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here you can find some classes to help you with that problem:

[phpclasses.org...]

[phpclasses.org...]

PS. Henry, the search you suggested only turned up programs that can do the conversion, but none of them are PHP related.

To search for PHP-related results, try:

php+parse+word+files [google.com]

Michal

Jaunty Edward

9:30 pm on Jan 16, 2007 (gmt 0)

10+ Year Member



Hi,

Thank you both. I forgot to tell you that I would prefer to use a Linux server, as the entire application is in PHP and MySQL, but if I don't find any other way out of this then I might consider Windows.

This means the classes that use a COM component are out. Actually I did try some of the classes, but the problem with them is that they act like a macro that opens Word and saves the file in TXT format, which in my opinion will take a hell of a lot of processing power and will surely require Word to be installed on the server.

I did some more research and found a small program called Antiword which, if installed on the server, will let us convert .doc files to text from the command line.

I am not very happy with that because I don't know how I will run a command from PHP. I think I will have to use something like exec() or system() (not sure).
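
For reference, PHP's built-in functions for running an external command include exec(), system(), shell_exec() and passthru(). Below is a minimal sketch of how the Antiword idea might look with exec(), assuming the antiword binary is installed and on the server's PATH. The function name doc_to_text() and its $binary parameter are made up for illustration; they are not part of PHP or Antiword.

```php
<?php
// Sketch only: convert a .doc file to plain text by shelling out to antiword.
// Assumption: the antiword binary is installed and reachable via PATH.
// doc_to_text() and $binary are hypothetical names chosen for this example.

function doc_to_text(string $path, string $binary = 'antiword'): string
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read file: $path");
    }

    // escapeshellarg() quotes the path so unusual file names
    // cannot be interpreted as extra shell commands.
    $command = escapeshellcmd($binary) . ' ' . escapeshellarg($path);

    // exec() appends each line of the command's standard output to $lines
    // (trailing whitespace stripped) and stores the exit code in $status.
    $lines = [];
    $status = 0;
    exec($command, $lines, $status);

    if ($status !== 0) {
        throw new RuntimeException("Conversion failed with exit code $status");
    }

    return implode("\n", $lines);
}
```

A call such as `$text = doc_to_text('/path/to/file.doc');` would then give a string that can be inserted into the MySQL table. Since each call starts a separate process, with 1000+ files a day it is probably better to run this from a cron job or queue than inside a web request.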

I am still looking for a better solution. I am surprised, as I feel this should be a common need for a lot of applications, but I have not been able to find anything on the net.

Now I know why a lot of people hate Microsoft.

Thanks again, I hope someone can help.

Thanks
Bye