Forum Moderators: coopster

Reading the contents of a word document

         

babil

4:54 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



Hi all,
this is my first post here. I have the following problem. I want to read the contents of a Word document containing pictures and formatted text with a PHP script, and then display it in the browser window without losing any information. I tried it with fopen(), fread(), etc., but the result was not what I expected.
The question: how can I read formatted text (even HTML or PDF, but not plain text) and display it exactly as it is in the browser?
Is it possible, or am I asking too much?

My PHP knowledge is limited.

Thanks in advance

babil

Nutter

5:56 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



I don't think this comes from a lack of PHP knowledge; I think it comes from not knowing the format of a Word document. They are not simply text. From the very limited look I took into the MS .doc format, it can get fairly complex. Pulling out the text alone may prove fairly simple, but the formatting can be difficult.

If you're bound and determined, make sure you're opening the file in binary-safe mode. Look at [wotsit.org...] for information on file formats.

If these are your files, you might try RTF. You can save as RTF from Word, and RTF files are somewhat similar to HTML (although you would have to do some conversion).

- Ryan
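To make the binary-safe advice concrete: a legacy Word .doc file is an OLE2 compound file, which starts with the eight bytes D0 CF 11 E0 A1 B1 1A E1. A minimal sketch (the function name `looks_like_word_doc` is hypothetical) opens the file with fopen() in 'rb' mode and compares those bytes, which at least tells you whether you are dealing with the binary format before going any further. The signature identifies the container, not Word itself; .xls and .ppt files start with the same bytes.

```php
<?php
// Hypothetical helper: returns true if $path starts with the OLE2 compound
// file signature used by legacy .doc (and .xls/.ppt) files.
function looks_like_word_doc(string $path): bool
{
    // 'rb' = read-only, binary mode, so no newline translation happens on Windows.
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $magic = fread($fh, 8);
    fclose($fh);
    return $magic === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
}
```

For example, `looks_like_word_doc('TAE/tae21.doc')` (a hypothetical path) should return true for a document saved in the Word 97-2003 format and false for the HTML version of the same document.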

vabtz

6:01 pm on Feb 24, 2005 (gmt 0)



I thought Word was using XML now, and there were XSLT stylesheets out there already for that.
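Word 2003 can indeed save as XML (WordprocessingML), which PHP's DOM extension can read without any binary parsing. A minimal sketch, assuming the Word 2003 namespace `http://schemas.microsoft.com/office/word/2003/wordml` and ignoring formatting, tables and images: it collects the text of each `w:p` paragraph from its `w:t` runs and wraps it in an HTML `<p>`. The function name is hypothetical; an XSLT stylesheet, as suggested above, would be the fuller route.

```php
<?php
// Hypothetical helper: convert the paragraphs of a Word 2003 XML
// (WordprocessingML) document into plain HTML <p> elements.
function wordml_to_html(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return '';
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', 'http://schemas.microsoft.com/office/word/2003/wordml');

    $html = '';
    foreach ($xpath->query('//w:body//w:p') as $para) {
        $text = '';
        // A paragraph's text is split across one or more w:t elements.
        foreach ($xpath->query('.//w:t', $para) as $run) {
            $text .= $run->textContent;
        }
        $html .= '<p>' . htmlspecialchars($text) . "</p>\n";
    }
    return $html;
}
```

Bold, italics and pictures are dropped here; carrying them over means mapping elements such as `w:rPr` to HTML, which is exactly the job an XSLT stylesheet is suited for.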

babil

6:16 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



Thanks, everyone; there seems to be no easy solution to this. Is there any other way to do the same thing, i.e. keep the contents of a specific section of the site in an external file (formatted text and pics) and pass it to the browser? I'm out of ideas after spending the last two days in front of my monitor.

(I think it can be done with MySQL, but I have no time to learn how to use it.)

cheers
babil

Nutter

6:29 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



You're probably right about the XML. I'm still using Word 2k. Haven't found a good reason to upgrade :-)

I'm on the wrong computer to check, but doesn't Word have an option to save as HTML? Would that work for you? What about PDF files? How 'bout copying and pasting from Word into FrontPage?

babil

6:54 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



OK, let me explain. I had a Word doc with text and pics, and I saved it as HTML from Word. Below is the code for opening and displaying it.
When I display it, the HTML formatting tags are also visible.
How can I get rid of them?

babil

<?php
$file = 'TAE/tae21.htm';
$data = file($file) or die('Could not read file!');
foreach ($data as $line) {
    echo nl2br($line);
}
?>
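A likely culprit (an inference from the output posted further down in the thread, not something anyone confirmed) is nl2br(). It inserts `<br />` before every newline, and Word's HTML breaks long tags across lines, so a `<br />` lands inside a tag; the browser then ends the tag at the `>` of that `<br />` and prints the remaining attributes as text. The snippet below only demonstrates the effect; for an HTML file, echoing the contents unchanged (for example with readfile()) avoids it.

```php
<?php
// A start tag spread over two lines, the way Word writes <html xmlns:...>.
$wordHtml = "<html xmlns:v=\"urn:schemas-microsoft-com:vml\"\nxmlns:o=\"urn:schemas-microsoft-com:office:office\">";

// nl2br() puts "<br />" in front of the newline, i.e. inside the tag.
$broken = nl2br($wordHtml);
echo $broken, "\n";
// <html xmlns:v="urn:schemas-microsoft-com:vml"<br />
// xmlns:o="urn:schemas-microsoft-com:office:office">
```

That matches the first lines of the output below, where `xmlns:o=...` and `xmlns:w=...` appear as visible text.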

vabtz

7:04 pm on Feb 24, 2005 (gmt 0)



First thing I would do is make sure you're sending the right MIME type.

Try putting this on the first line of that code and see if it fixes it:

header('Content-Type: text/html');

babil

7:07 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



nope, it doesn't work

thanks

<?php
header('Content-Type: text/html');
$file = 'TAE/tae21.htm';
$data = file($file) or die('Could not read file!');
foreach ($data as $line) {
    echo nl2br($line);
}
?>

vabtz

7:11 pm on Feb 24, 2005 (gmt 0)



sticky me a url and I'll look at it if you like

Nutter

9:17 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



Have you looked at the raw HTML for the page? Maybe Word puts something odd in there. Can you open that file directly in your browser and have it work correctly?

vabtz

9:41 pm on Feb 24, 2005 (gmt 0)



Make sure you're not seeing a cached copy in your browser, too.

babil

9:48 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



I did open the file with Notepad. Nothing weird there,
except that it is full of style tags (due to the formatting).

babil
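Word's "Save as Web Page" output is a complete document, with its own `<html>`, `<head>` and a large `<style>` block, so dropping it into the `<p id="p">` of the template babil posts later in the thread nests one page inside another. A minimal sketch, with a hypothetical function name, that keeps only what sits between the `<body>` tags; it uses a regular expression and assumes a single body element, so it is a shortcut rather than a real HTML parser.

```php
<?php
// Hypothetical helper: return the inner HTML of <body>...</body>, or the
// whole string unchanged if no body element is found.
function body_inner_html(string $html): string
{
    // s: let "." match newlines; i: match <BODY> as well as <body lang=EL ...>.
    if (preg_match('#<body\b[^>]*>(.*)</body>#is', $html, $m)) {
        return $m[1];
    }
    return $html;
}
```

Usage would be something like `echo body_inner_html(file_get_contents('TAE/tae21.htm'));` in place of the file()/nl2br() loop. Since the `<style>` block in `<head>` is dropped, formatting that depends on Word's class rules (`MsoNormal` etc.) may change, while inline `style='...'` attributes survive.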

Nutter

10:02 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



What's it showing when you run your script?

babil

10:21 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



here it is:

xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}

[garbled Greek text] Virtual

bhma

2
17
2005-02-04T17:09:00Z
2005-02-24T18:41:00Z
2005-02-24T18:41:00Z
1
572
3265
27
7
3830
10.2625

Clean
Clean

MicrosoftInternetExplorer4

/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin:0in;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:10.0pt;
font-family:"Times New Roman";}

[garbled Greek text] style='font-size:9.0pt;color:black;mso-ansi-language:EN-US;font-weight:normal;
mso-bidi-font-weight:bold'>Virtual.

[garbled Greek text] real style='font-size:9.0pt;font-family:Arial;mso-bidi-font-family:"Times New Roman";
color:black'> style='mso-spacerun:yes'> [garbled Greek text] style='mso-spacerun:yes'> [garbled Greek text].

[garbled Greek text] ([garbled Greek text] - style='mso-spacerun:yes'> Technical and Aeronautical Exploitations Co.
Ltd.). style='mso-spacerun:yes'> [garbled Greek text] style='mso-spacerun:yes'> [garbled Greek text] 1935 [garbled Greek text] 6 [garbled Greek text] 1957, [garbled Greek text].

style='mso-spacerun:yes'> [garbled Greek text] style='mso-spacerun:yes'> [garbled Greek text] DC lang=EL style='font-size:9.0pt;font-family:Arial;mso-bidi-font-family:"Times New Roman";
color:black'>-3 [garbled Greek text]
and it goes on and on...
The text doesn't display correctly because of the encoding (it's Greek).

The whole thing is pretty frustrating.

babil
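On the Greek text: if the page that wraps this output declares `charset=iso-8859-1` (the template babil posts later in the thread does), the browser has no Greek letters to map the bytes to and shows Western characters instead. Either declare the charset the Word file actually uses, or convert the text to UTF-8 and declare UTF-8 everywhere (the meta tag in template.html and, ideally, `header('Content-Type: text/html; charset=utf-8')`). A minimal sketch of the conversion, assuming the file was saved as windows-1253 (check the meta tag Word wrote, it may differ) and that the server's iconv supports that charset; the function name is hypothetical. It also rewrites any `charset=` declaration in the converted markup so it no longer contradicts the new encoding.

```php
<?php
// Hypothetical helper: convert Word HTML from a single-byte Greek charset
// to UTF-8 and update any charset=... declaration to match.
function greek_html_to_utf8(string $html, string $from = 'WINDOWS-1253'): string
{
    $utf8 = iconv($from, 'UTF-8', $html);
    if ($utf8 === false) {
        // Leave the input alone if iconv rejects the charset or the bytes.
        return $html;
    }
    // Keep any opening quote (group 1) and swap only the charset name.
    return preg_replace('/(charset\s*=\s*["\']?)[\w-]+/i', '${1}utf-8', $utf8);
}
```

If the file turns out to be ISO-8859-7 instead, pass that as the second argument; the Greek letters sit at the same byte values in both charsets.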

gettopreacherman

10:35 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



Create a new file; if you have a header and footer, it would look something like this:

<?php

include("header.php");

include("page.html");

include("footer.php");

?>

that will output your page using php.
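One caveat with include(): it runs the file through the PHP parser, so if short_open_tag is enabled, `<?xml ...` markup (Word output can contain `<?xml:namespace ...>` declarations) is read as PHP and can cause a parse error. For a page that contains no PHP at all, readfile() copies the file to the output untouched. A minimal sketch; the function name is hypothetical and page.html is the example file from the post above.

```php
<?php
// Hypothetical helper: send a static HTML file to the browser byte-for-byte.
// Unlike include(), readfile() never interprets "<?" sequences as PHP code.
function output_static_html(string $path): bool
{
    if (!is_readable($path)) {
        return false;
    }
    return readfile($path) !== false;
}
```

In the layout above, `output_static_html("page.html");` would take the place of `include("page.html");`, while the header and footer, which do contain PHP, stay as includes.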

babil

10:57 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



This is the page layout:

INDEX.PHP
----------
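<?php
require_once("class.php");

// register_globals is off by default (and removed in PHP 5.4), so read the
// page name from the query string. Only whitelisted files may be loaded;
// adjust the list to the pages you actually have.
$allowed = array("news.php", "tae.php");
$fpage = isset($_GET['fpage']) ? $_GET['fpage'] : "news.php";
if (!in_array($fpage, $allowed, true)) {
    $fpage = "news.php";
}

$page = new Page("template.html");

$page->replace_tags(array(
    "title"     => "Hoav Website",
    "contenthd" => "Announcements",
    "content"   => $fpage,
    "navbar"    => "board2.htm",
));

$page->output();
?>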

END INDEX.PHP
-------------

BEGIN CLASS.PHP
---------------
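<?php
class Page
{
    public $page;

    // PHP 5+ constructor (a PHP 4-style constructor named after the class
    // is no longer treated as a constructor in PHP 8).
    public function __construct($template = "template.html") {
        if (file_exists($template))
            $this->page = file_get_contents($template);
        else
            die("Template file $template not found.");
    }

    // Run a PHP/HTML file and capture what it prints instead of sending it.
    public function parse($file) {
        ob_start();
        include($file);
        return ob_get_clean();
    }

    // Replace each {tag} placeholder; a value that names an existing file is
    // replaced by that file's (executed) output.
    public function replace_tags($tags = array()) {
        if (count($tags) > 0)
            foreach ($tags as $tag => $data) {
                $data = (file_exists($data)) ? $this->parse($data) : $data;
                // str_replace() instead of eregi_replace(): the ereg functions
                // were removed in PHP 7, and a fixed string needs no regex.
                $this->page = str_replace("{" . $tag . "}", $data, $this->page);
            }
        else
            die("No tags designated for replacement.");
    }

    public function output() {
        echo $this->page;
    }
}
?>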

END CLASS.PHP
-------------

BEGIN TEST.CSS
--------------

#body {margin:0px;
background-color: #336699;
overflow:auto;

}

#main {
width:100%;
vertical-align: top;
height:100%;
border: none;
overflow:scroll;

}
#head {
background-image:url('head.jpg') ;
background-repeat:repeat-x;
height:60px;
position: relative;

}
#navbar {
width: 20%;
background-color:#336699 ;
vertical-align:top;
border:solid 2px #336699;

}

#content {
width: 80%;

background-color:#003366;
vertical-align:top;
border:solid 5px #000000;
text-align: center;
color:#FFFFCC;
font-size: 35px;
font-family: serif;
overflow: scroll;
font-weight: bold;
}
#p {
color:#ffff99;
font-size: 15px;
text-align: left;
font-family: serif;
font-weight: lighter;
padding-left: 5%;
}

END TEST.CSS
-------------

BEGIN TEMPLATE.HTML
-------------------

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
<title>{title}</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1" />
<link rel="stylesheet" type="text/css" href="test.css" />
</head>
<body id="body">
<table id="main" cellpadding="0" cellspacing="0" >
<tr>
<td id="head" colspan="2">{head}</td>
</tr>
<tr>
<td id="navbar">{navbar}

</td>
<td id="content">{contenthd} <p id="p">

{content}
</p>

</td>
</tr>
</table>

</body>
</html>

END TEMPLATE.HTML
-----------------

BEGIN NEWS.PHP
--------------

<?php
$file = 'news.txt';
$data = file($file) or die('Could not read file!');
foreach ($data as $line) {
    echo nl2br($line);
}
?>

END NEWS.PHP
------------
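For news.txt, which is plain text, nl2br() is the right tool, but escaping the text first makes a `<` or `&` in the news show up as written instead of being parsed as markup. A minimal sketch with a hypothetical function name; it assumes UTF-8 by default. htmlspecialchars() has no windows-1253 option, but for a single-byte Greek file passing 'ISO-8859-1' still escapes the five special characters without touching the Greek bytes.

```php
<?php
// Hypothetical helper: turn the contents of a plain-text file into HTML.
// Escape first, then add the <br /> line breaks, so the <br /> tags
// themselves are not escaped.
function plain_text_to_html(string $text, string $charset = 'UTF-8'): string
{
    return nl2br(htmlspecialchars($text, ENT_QUOTES, $charset));
}
```

With it, the whole loop in news.php could become `echo plain_text_to_html(file_get_contents('news.txt'));`.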

BEGIN TAE.PHP
--------------

<?php
header('Content-Type: text/html');
$file = 'TAE/tae2.htm';
$data = file($file) or die('Could not read file!');
foreach ($data as $line) {
    echo nl2br($line);
}
?>
END TAE.PHP
-----------

Sorry for the long post, but now you have an overview of my project. My aim is to make a website that is easy to update: no need to write HTML code, place images, etc. Just create a Word, PDF, etc. document with the updated content plus images and then upload it to the server.

cheers
babil

hiker_jjw

12:50 am on Feb 25, 2005 (gmt 0)



Don't make it so hard, y'all.

Just get a copy of FCKeditor and install it on the server with a simple CMS of some kind. The editor lets you copy and paste from Word into the editor, and it "cleans" the HTML. It works pretty well.

My 2 cents.
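FCKeditor's Word cleanup runs in the browser and is its own code; as a rough server-side analogue (not how FCKeditor does it), the sketch below drops `<head>`/`<style>` blocks and HTML comments, keeps a short whitelist of tags with strip_tags(), and removes the `class`, `style` and `lang` attributes Word scatters everywhere. The function name and the tag whitelist are assumptions, and regular expressions like these will miss edge cases that a real HTML sanitizer handles.

```php
<?php
// Hypothetical helper: reduce Word-generated HTML to simple markup.
function clean_word_html(string $html): string
{
    // 1. Drop <head>/<style> blocks and HTML comments (Word hides its XML
    //    settings in <!--[if ...]> comments). strip_tags() alone can leave
    //    the CSS text of a <style> block behind.
    $html = preg_replace('#<(head|style)\b[^>]*>.*?</\1>#is', '', $html);
    $html = preg_replace('#<!--.*?-->#s', '', $html);

    // 2. Keep only a short whitelist of basic tags (an assumption; extend as needed).
    $html = strip_tags($html, '<p><br><b><strong><i><em><u><ul><ol><li><table><tr><td><img><a>');

    // 3. Remove the presentational attributes Word adds everywhere,
    //    e.g. class=MsoNormal, style='mso-...', lang=EL.
    $html = preg_replace('/\s(?:class|style|lang)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i', '', $html);

    return $html;
}
```

Images survive as `<img>` tags, but their `src` still points at the image folder Word created next to the .htm file, so that folder has to be uploaded too.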