
Forum Moderators: coopster & jatar k


Upload doc/txt files and view as html

Upload a doc or txt file and strip out unwanted characters and view as html

   
9:51 am on Oct 25, 2007 (gmt 0)

5+ Year Member



Hi all - I am new to posting on Webmaster World although I have been browsing the forums for years!

I am hoping someone can help me.

This is the scenario:
I use a form with multipart/form-data to upload text documents (probably Word, but I do not want to restrict it to just .doc) and add the data to the database.
The Word file is saved to the database in a BLOB field, but when I view the content it has a load of extra characters in the header and footer and does not keep its formatting.

I presume this has something to do with charsets (though I am not sure where to start if so), and I do not know whether there is anything I can do on upload to strip the extraneous characters out.

I have read many things regarding converting Word to html and pretty much all of it involves going into Word and saving as html or importing a Word doc into Dreamweaver. These are not options due to the nature of the website. The file/character manipulation MUST be done on the fly.

Any ideas would be gratefully received!
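
A quick way to see why the stored content shows "extra characters" is to check what kind of bytes were actually uploaded. The following is a minimal sketch in Python (the thread does not say which server language is in use, so this is an assumption), and `sniff_upload` is a hypothetical helper name. It relies on two well-known file signatures: legacy Word files start with the OLE2 compound-file header, and .docx files are ZIP archives. The NUL-byte check for "binary" is only a rough heuristic and would misclassify UTF-16 text.

```python
# Signature of the OLE2 compound-file container used by legacy .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP archive; .docx files are ZIP containers.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_upload(data: bytes) -> str:
    """Return 'ole2', 'zip', 'binary' or 'text' for the uploaded bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    # Heuristic only: NUL bytes in the first 4 KB suggest non-text content.
    if b"\x00" in data[:4096]:
        return "binary"
    return "text"
```

If the result is `ole2` or `zip`, the BLOB holds a binary container rather than text, so no charset setting alone will turn it into readable content.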

1:50 pm on Oct 25, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, themistral and thanks for delurking!

what http headers are you sending with the document?
you may need to send a Content-Type:application/msword header.
otherwise i would look into character encoding and/or character set-related issues.

2:16 pm on Oct 25, 2007 (gmt 0)

5+ Year Member



Thanks for the welcome phranque!

I have tried using the Content-Type:application/msword header, but it just sets the link as a Word download rather than showing the text.

I have played about with Charsets but to be honest I'm not too sure what I'm doing. I am sure someone out there must have tried to do this themselves...I hope!

5:42 pm on Oct 25, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



there is no plugin available for viewing word files within a browser.
therefore if you are sending that content type, it must necessarily be downloaded for viewing purposes in ms word.
you can try to see if there is a way to convert the word document to some usable text on upload to the server or while serving the document to the browser.
your solution will depend on your server environment.
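
For uploads that really are plain text, the "strip unwanted characters and view as html" step might look like the sketch below. It is written in Python purely for illustration, since the server environment is not stated in the thread. The function names are hypothetical, and the fallback to Windows-1252 when the bytes are not valid UTF-8 is an assumption about what the uploads are likely to contain.

```python
import html
import re

# Control characters other than tab, newline and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def decode_upload(data: bytes) -> str:
    """Decode as UTF-8 (dropping a BOM if present), else assume Windows-1252."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


def text_to_html(data: bytes) -> str:
    """Turn uploaded plain text into escaped HTML paragraphs."""
    text = decode_upload(data)
    # Normalise line endings before splitting into paragraphs.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = _CONTROL_CHARS.sub("", text)
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>\n") + "</p>"
        for p in paragraphs
    )
```

Escaping with `html.escape` matters here because the uploaded text is untrusted and will be shown inside a page.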

12:26 pm on Oct 26, 2007 (gmt 0)

5+ Year Member



I have it so that on upload the content of the file and not just the filepath is saved to the database.
So technically, it is no longer a Word file as it is just content in a database field.

However, Word being a Microsoft product means that there are lots of extra characters added to the content - they are what I need to get rid of!

Any ideas?

12:48 pm on Oct 26, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



when you send a word document through the web, it isn't a file any more - it's just a stream of data.
it doesn't matter if the source of that data stream is a file on your server or a record in your db - it's the same data to the browser.
you need to figure out how to extract useful text from the word doc and store that instead.
as i mentioned before, your solution will depend on your server environment - and i have no clue there...
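
As one illustration of "extract useful text and store that instead", here is a rough Python sketch that pulls paragraph text out of a .docx upload using only the standard library. It assumes the newer .docx format (a ZIP archive containing `word/document.xml`); the legacy binary .doc format is not handled and would need a dedicated converter on the server. `docx_to_paragraphs` is a hypothetical name, and the sketch ignores tables, headers and footnotes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_paragraphs(data: bytes) -> List[str]:
    """Return the text of each non-empty paragraph in a .docx payload."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

The returned strings can then be escaped and stored in a text column in place of the raw BLOB. Because the XML comes from an untrusted upload, a hardened parser would be a sensible choice in production.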

2:34 pm on Oct 26, 2007 (gmt 0)

5+ Year Member



No worries phranque - your info is helpful!
Thanks a lot!
 
