Forum Library, Charter, Moderators: incrediBILL & lawman

Foo Forum

Word html documents
oh my god...

 4:36 pm on Feb 21, 2003 (gmt 0)

Just for a laugh, I decided to see what would happen if you tried to create an HTML document using MS Word.

What follows is the unedited source of a blank HTML document:

<html xmlns:o="urn:schemas-microsoft-com:office:office"

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./Document2_files/filelist.xml">
<!--[if gte mso 9]><xml>
<o:Author>My name was here</o:Author>
</xml><![endif]--><!--[if gte mso 9]><xml>
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;

<body lang=EN-GB style='tab-interval:.5in'>

<div class=Section1>

<p class=MsoNormal><span lang=EN-US><![if!supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></span></p>




Would it even work? Or am I being silly, and this is all valid stuff?



 4:39 pm on Feb 21, 2003 (gmt 0)

It works in IE at least. Scary, isn't it? Word loves XML data islands. Want some more scary stuff? Export PowerPoint to HTML. Yikes!


 4:56 pm on Feb 21, 2003 (gmt 0)

Yes, that looks pretty bad. Try putting it in Dreamweaver and using the 'Clean Up Word HTML' command!
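
For anyone without Dreamweaver, here is a rough sketch of the kind of cleanup such a command performs. It is not Dreamweaver's actual algorithm, just a few regular expressions aimed at the markup visible in the sample above (conditional comments, `<o:p>` tags, `Mso*` classes). The function name `clean_word_html` is made up for this example, and real Word output will likely need more rules than these.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip some common Word-specific markup from an HTML string.

    Hypothetical helper: a sketch only, not a complete cleaner.
    """
    # Remove conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.S | re.I)
    # Remove bare conditional markers such as <![if !supportEmptyParas]> and <![endif]>
    html = re.sub(r"<!\[(?:if[^\]]*|endif)\]>", "", html, flags=re.I)
    # Remove namespaced Office tags such as <o:p> and </o:p>
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.I)
    # Remove class attributes that point at Word's Mso* styles
    html = re.sub(r"\s+class=(\"Mso\w*\"|'Mso\w*'|Mso\w*)", "", html,
                  flags=re.I)
    return html


sample = ("<p class=MsoNormal><span lang=EN-US><![if !supportEmptyParas]>"
          "&nbsp;<![endif]><o:p></o:p></span></p>")
print(clean_word_html(sample))  # <p><span lang=EN-US>&nbsp;</span></p>
```

Regular expressions are a blunt tool for HTML, so treat this as a first pass before hand-editing rather than a replacement for a proper cleaner.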


 4:58 pm on Feb 21, 2003 (gmt 0)

I know someone who almost went to a mental institution trying to clean up Word HTML for a client's site that the client had built themselves.

It is amazing all the extra crap they put in there.


 5:02 pm on Feb 21, 2003 (gmt 0)

"Dreamweaver and use the 'clear up Word HTML' command!"

This was an experiment. I usually use Editplus2.

Just so any of you don't completely misjudge me here... ;)


 5:04 pm on Feb 21, 2003 (gmt 0)

I know of a non-profit organisation with a web presence fueled by MS Word.

Instead of creating a new page for a lengthy article, they just stack it on top of others, blog-style. After 4 years, the index.html page now prints to about 20 pages and is about 600k.


 6:29 pm on Feb 21, 2003 (gmt 0)

Please don't make fun of MS Word's HTML capabilities. I must have created and uploaded at least two websites using MS Word. Brings back lots of wonderful memories.

Before that, I was using Netscape Composer. :)

Anyway, I have now upgraded to.. ahem.. MS FrontPage 2002. Works like a charm. ;)


 6:34 pm on Feb 21, 2003 (gmt 0)

I love it when my ad clients use these "create html" quasi-utilities. It means they'll be buying traffic from me for a looooooong time.


 7:26 pm on Feb 21, 2003 (gmt 0)

Just yesterday I had to clean up a file for a new client that had been made in Word - it was horrible! Just like the code you posted, only longer! Thankfully it was only one page! It doesn't surprise me that someone could end up in a mental institution from this.

What I see as the challenge, though, is how to get through to a client that what they have is garbage. It all looks the same to them when they view it in IE. So how do you explain that you will either have to rebuild their site from scratch or edit the garbage that Word wrote, and that either way it will cost them some money? They just don't seem to get it. It's always something like: 'The site is basically there already, so it should be really easy for you to just change this or that. The friend who made the site the first time could easily add whatever, just by opening it in Word...' and so on.

(I need a better way to make money!)


 4:31 am on Feb 22, 2003 (gmt 0)

"how do you get through to a client that what they have is garbage?..."

LOL! That is SO true.

A while back a small non-profit org asked me to "help" with their website.
It had been done in a combination of FP98, FP200, FTP, Word, and a REALLY old copy of Fusion, and probably some other stuff and text editors that I did not look for. All mixed together. All done by a succession of people that obviously had no concept of what they were doing.

When I told them what needed to be done (such as removing or scaling down the 560kb graphics background) I started getting the "well, we want to keep that, and this, and don't change that.."
I finally told them it was hopeless given the limitations they gave me (and the obvious internal bickering that was going on).

All this for a site that got maybe 500 hits a month ;P


 4:53 am on Feb 22, 2003 (gmt 0)

oohhhh, so that's what that stuff is.......

A buddy of mine put a little site online and it was full of code like that, 'cept the Word identifiers were removed. He was a PhD Engineering grad, so I thought it was some obscure coding/formatting thing used in engineering......and it was Word all along........


 6:12 pm on Feb 22, 2003 (gmt 0)

Just as a matter of comparison:

OpenOffice.org blank page

<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
<META NAME="GENERATOR" CONTENT="OpenOffice.org 1.0.1 (Win32)">
<META NAME="CREATED" CONTENT="20030222;18070329">

Mozilla Composer -

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">

Dreamweaver MX 'Basic webpage' -

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">



They all appear to do quite a good job of it. Personally, I can't stand using M$ Office at all anymore. Long live Open Office!

brotherhood of LAN

 6:15 pm on Feb 22, 2003 (gmt 0)

Hmmm, I think this is why FrontPage creates a little "extra" code here and there (not as much as Word, for sure!).

All MS Office programs seem to be interoperable, i.e. you can save htm as xls, xls as csv, etc.

I guess any form of "HTML Tidy" would have a hard time ironing out all the code.


 6:23 pm on Feb 22, 2003 (gmt 0)

Just don't use Word any longer for this. I'm using StarOffice (not OpenOffice, but much the same), and its HTML is also packed with stylesheet stuff etc. I don't like it either. Maybe Dreamweaver really is a solution for converting longer documents out of a word processor.


 6:31 pm on Feb 22, 2003 (gmt 0)

The solution I use - copy all the text out of the page to a .txt file and start again.

It's so much easier.
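
If there are more than a couple of pages, the copy-the-text step can be scripted. Below is a minimal sketch using Python's standard `html.parser` module. The names `TextExtractor` and `html_to_text` are invented for this example, and it assumes you only want the visible text, one chunk per line, with anything inside `<style>`, `<script>` or `<xml>` thrown away.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <style>, <script> and <xml> contents."""

    SKIP = {"style", "script", "xml"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []          # text chunks found so far
        self._skip_depth = 0     # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p class=MsoNormal>Hello</p><p>World &amp; all</p></body></html>")
print(html_to_text(page))
# Hello
# World & all
```

Write the result to a .txt file and you have a clean starting point for rebuilding the page by hand.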


 1:28 pm on Feb 26, 2003 (gmt 0)

Alternatively you could try Dean Allen's Word HTML cleaner which (when it works) is excellent!


[edited by: lawman at 12:45 am (utc) on Feb. 27, 2003]
[edit reason] delinked [/edit]

creative craig

 1:40 pm on Feb 26, 2003 (gmt 0)

Before I started working for my current employer, the company intranet was a 600-page Word hell :(

Layout was good but the code sucked... it's all good now though :)



 3:07 pm on Mar 1, 2003 (gmt 0)

A non-Windows reply?

I copied and pasted the first-mentioned HTML into a blank *"Simpletext" document and saved it as an HTML file on my Mac. I've done this before, including when I was creating my own webpage, and it worked quite well.

Then, once it was saved as an HTML file, I tried opening it in both Netscape 4.79 and iCab 2.82.

All I get is a blank white screen in both browsers. Sorry, the code doesn't work for those of us on iconoclastic, non-Windows platforms.

(*"Simpletext" is a modestly powered word processing program that comes with all Macs. It's pretty versatile and by default it only uses 512 k of RAM (you can tweak it to whatever you want of course). It handles different text styles, different fonts, search and replace, as well as (through linking with Quicktime) being able to play movies and sounds, even speaking typed text. But as far as handling plain text is concerned, it is akin to generating plain .TXT files on a Windows system with Notepad or some other program. The only thing it won't do is read Word .DOCs. For that I use another program called "Fileview" to rip the text out of ANY file, sans formatting. It's pretty well the only word processor I think of using nowadays.)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved