
PHP Server Side Scripting Forum

Upload doc/txt files and view as html
Upload a doc or txt file and strip out unwanted characters and view as html

 9:51 am on Oct 25, 2007 (gmt 0)

Hi all - I am new to posting on WebmasterWorld, although I have been browsing the forums for years!

I am hoping someone can help me.

This is the scenario:
I use a form with multipart/form-data to upload text documents (probably Word, but I do not want to restrict it to just .doc) and add the data to the database.
The Word file is saved to the database in a BLOB field - but when I view the content it has a load of extra characters at the start and end, and it does not keep its formatting.

I presume this has something to do with charsets, but I am not sure where to start if so. I also do not know whether there is anything I can do during the upload to strip the extraneous characters out.

I have read many things regarding converting Word to html and pretty much all of it involves going into Word and saving as html or importing a Word doc into Dreamweaver. These are not options due to the nature of the website. The file/character manipulation MUST be done on the fly.

Any ideas would be gratefully received!



 1:50 pm on Oct 25, 2007 (gmt 0)

welcome to WebmasterWorld, themistral and thanks for delurking!

what http headers are you sending with the document?
you may need to send a "Content-Type: application/msword" header.
otherwise i would look into character encoding and/or character set-related issues.
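
To make the header suggestion concrete, here is a minimal sketch of choosing response headers by file type. It is written in Python only to keep it short and self-contained; the same headers can be sent from any server-side language. The function name `response_headers` and the extension-based check are assumptions for illustration and are not something from this thread.

```python
def response_headers(filename, charset="utf-8"):
    """Return HTTP headers for serving a stored upload (hypothetical helper).

    Assumes `filename` has already been sanitised (no quotes or line breaks).
    """
    name = filename.lower()
    if name.endswith(".txt"):
        # Plain text can be shown inline; the charset tells the browser
        # how to decode the bytes.
        return {"Content-Type": f"text/plain; charset={charset}"}
    if name.endswith(".doc"):
        # The browser is told this is a Word file, so mark it as a download.
        return {
            "Content-Type": "application/msword",
            "Content-Disposition": f'attachment; filename="{filename}"',
        }
    # Anything else is served as generic binary data.
    return {"Content-Type": "application/octet-stream"}
```

Note that the charset parameter only helps for files that really are text. It does nothing for a binary Word file.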


 2:16 pm on Oct 25, 2007 (gmt 0)

Thanks for the welcome phranque!

I have tried using the Content-Type: application/msword header, but it just turns the link into a Word download rather than showing the text.

I have played about with Charsets but to be honest I'm not too sure what I'm doing. I am sure someone out there must have tried to do this themselves...I hope!


 5:42 pm on Oct 25, 2007 (gmt 0)

there is no plugin available for viewing word files within a browser.
therefore if you are sending that content type, the file has to be downloaded and opened in ms word to be viewed.
you could look for a way to convert the word document to usable text, either on upload to the server or while serving the document to the browser.
your solution will depend on your server environment.
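
Once usable text is available (for example from a .txt upload), turning it into HTML is the easy part. A minimal sketch in Python, assuming the text has already been decoded to a string; `text_to_html` is a hypothetical helper name, and the paragraph rule (blank line means new paragraph) is an assumption about the input:

```python
import html


def text_to_html(text):
    """Escape plain text and wrap blank-line-separated blocks in <p> tags."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    for block in text.split("\n\n"):
        block = block.strip()
        if block:
            # Escape &, <, > and quotes, then keep single line breaks visible.
            escaped = html.escape(block).replace("\n", "<br>\n")
            paragraphs.append("<p>" + escaped + "</p>")
    return "\n".join(paragraphs)
```

Escaping before output matters here because uploaded text can contain characters such as `<` that would otherwise be interpreted as markup.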


 12:26 pm on Oct 26, 2007 (gmt 0)

I have set it up so that on upload the content of the file, and not just the file path, is saved to the database.
So technically it is no longer a Word file, as it is just content in a database field.

However, because it came from Word, there are lots of extra characters in the content - they are what I need to get rid of!

Any ideas?


 12:48 pm on Oct 26, 2007 (gmt 0)

when you send a word document through the web, it isn't a file any more - it's just a stream of data.
it doesn't matter if the source of that data stream is a file on your server or a record in your db - it's the same data to the browser.
you need to figure out how to extract useful text from the word doc and store that instead.
as i mentioned before, your solution will depend on your server environment - and i have no clue there...
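
As a rough illustration of "extract useful text and store that instead", the sketch below pulls runs of printable characters out of the raw bytes, much like the Unix `strings` tool. It is a crude heuristic, not a Word parser: it assumes the text is stored either as single-byte ASCII or as UTF-16LE, it loses all formatting and any non-ASCII characters, and it can pick up stray metadata such as font names. The name `printable_runs` and the `min_len` threshold are assumptions for illustration. A proper converter installed on the server would do a far better job.

```python
import re


def printable_runs(blob, min_len=4):
    """Return runs of printable text found in raw bytes (crude heuristic)."""
    # First alternative: UTF-16LE text, i.e. a printable ASCII byte followed
    # by a zero byte, repeated. Second alternative: plain 8-bit ASCII runs.
    pattern = re.compile(
        rb"(?:[\t\x20-\x7e]\x00){%d,}|[\t\x20-\x7e]{%d,}" % (min_len, min_len)
    )
    runs = []
    for match in pattern.finditer(blob):
        raw = match.group(0)
        if b"\x00" in raw:
            # Wide run: always whole (char, zero) pairs, so this decodes cleanly.
            runs.append(raw.decode("utf-16-le"))
        else:
            runs.append(raw.decode("ascii"))
    return runs
```

The extracted runs could then be joined, escaped, and stored in a text column in place of the original binary data.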


 2:34 pm on Oct 26, 2007 (gmt 0)

No worries phranque - your info is helpful!
Thanks a lot!
