Forum Moderators: incrediBILL & lawman

Word html documents

oh my god...

     
4:36 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 3, 2002
posts:1590
votes: 0


Just for a laugh I decided to see what would happen if you tried to create an HTML document using MS Word.

What follows is the unedited source of a blank HTML document:

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./Document2_files/filelist.xml">
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>My name was here</o:Author>
<o:Template>Normal</o:Template>
<o:Revision>1</o:Revision>
<o:TotalTime>0</o:TotalTime>
<o:Created>2003-02-21T16:34:00Z</o:Created>
<o:Pages>1</o:Pages>
<o:Company></o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:Version>9.2720</o:Version>
</o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
</w:WordDocument>
</xml><![endif]-->
<style>
<!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:"";
margin:0in;
margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";
mso-ansi-language:EN-US;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;
mso-header-margin:.5in;
mso-footer-margin:.5in;
mso-paper-source:0;}
div.Section1
{page:Section1;}
-->
</style>
</head>

<body lang=EN-GB style='tab-interval:.5in'>

<div class=Section1>

<p class=MsoNormal><span lang=EN-US><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></span></p>

</div>

</body>

</html>

Would it even work? Or am I being silly, and this is all valid stuff?

4:39 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member korkus2000 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 20, 2002
posts:3732
votes: 0


It works in IE at least. Scary, isn't it? Word loves XML data islands. You want some more scary stuff? Export PowerPoint to HTML. Yikes!

4:56 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member sem4u is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 18, 2002
posts:3061
votes: 0


Yes, that looks pretty bad. Try putting it in Dreamweaver and using the 'Clean Up Word HTML' command!

4:58 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 21, 2001
posts:2149
votes: 0


I know someone who almost went to a mental institution trying to clean up Word HTML for a client's site they had done themselves.

It is amazing all the extra crap they put in there.

5:02 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 3, 2002
posts:1590
votes: 0


"Dreamweaver and using the 'Clean Up Word HTML' command!"

This was an experiment. I usually use EditPlus 2.

Just so none of you completely misjudge me here... ;)

5:04 pm on Feb 21, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 16, 2002
posts:136
votes: 0


I know of a non-profit organisation with a web presence fueled by MS Word.

Instead of creating a new page for a lengthy article, they just stack it on top of others, blog-style. After 4 years, the index.html page now prints to about 20 pages and is about 600k.

6:29 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 8, 2002
posts:1157
votes: 0


Please don't make fun of MS Word's HTML capabilities. I must have created and uploaded at least two websites using MS Word. Brings back lots of wonderful memories.

Before that, I was using Netscape Composer. :)

Anyway, now I have upgraded to... ahem... MS FrontPage 2002. Works like a charm. ;)

6:34 pm on Feb 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member rcjordan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 22, 2000
posts:9138
votes: 0


I love it when my ad clients use these "create html" quasi-utilities. It means they'll be buying traffic from me for a looooooong time.
7:26 pm on Feb 21, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 12, 2002
posts:565
votes: 0


Just yesterday I had to clean up a file for a new client that had been made in Word - it was horrible! Just like the code you posted, only longer! Thankfully it was only one page! It doesn't surprise me that someone could end up in a mental institution from this.

What I see as the challenge, though, is how to get through to a client that what they have is garbage. It all looks the same to them when they view it in IE. So how do you explain that you will either have to rebuild their site from scratch or edit the garbage that Word wrote, and that either way it will cost them some money? They just don't seem to get it: 'The site is basically there already, so it should be really easy for you to just change this or that. The friend who made the site the first time could easily add whatever, just by opening it in Word,' and so on.

(I need a better way to make money!)

4:31 am on Feb 22, 2003 (gmt 0)

Full Member

10+ Year Member

joined:May 8, 2002
posts:325
votes: 0


"how do you get through to a client that what they have is garbage?..."

LOL! That is SO true.

A while back a small non-profit org asked me to "help" with their website.
It had been done in a combination of FP98, FP2000, FTP, Word, and a REALLY old copy of Fusion, and probably some other tools and text editors that I did not look for. All mixed together. All done by a succession of people who obviously had no concept of what they were doing.

When I told them what needed to be done (such as removing or scaling down the 560kb graphics background) I started getting the "well, we want to keep that, and this, and don't change that.."
I finally told them it was hopeless given the limitations they gave me (and the obvious internal bickering that was going on).

All this for a site that got maybe 500 hits a month ;P

4:53 am on Feb 22, 2003 (gmt 0)

Moderator

WebmasterWorld Administrator skibum is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 20, 2000
posts:4469
votes: 1


oohhhh, so that's what that stuff is.......

A buddy of mine put a little site online and it was full of code like that, except the Word identifiers were removed. He was a PhD Engineering grad, so I thought it was some obscure coding/formatting thing used in Engineering......and it was Word all along........

6:12 pm on Feb 22, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 11, 2001
posts:134
votes: 0


Just as a matter of comparison:

OpenOffice.org blank page

==============================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
<TITLE></TITLE>
<META NAME="GENERATOR" CONTENT="OpenOffice.org 1.0.1 (Win32)">
<META NAME="CREATED" CONTENT="20030222;18070329">
<META NAME="CHANGED" CONTENT="16010101;0">
</HEAD>
<BODY LANG="en-US">
<P><BR><BR>
</P>
</BODY>
</HTML>
=================================

Mozilla Composer -

=================================
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>Composer</title>
</head>
<body>
<br>
</body>
</html>
=================================

Dreamweaver MX 'Basic webpage' -

=================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>

</body>
</html>
=================================

They all appear to do quite a good job of it. Personally, I can't stand using M$ Office at all anymore, long live Open Office!

6:15 pm on Feb 22, 2003 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:4842
votes: 1


Hmmm, I think this is why FrontPage creates a little "extra" code here and there (not as much as Word, for sure!).

All MS Office programs seem to be interoperable, i.e. you can save htm as xls, xls as csv, etc.

I guess any form of "HTML Tidy" would have a hard time ironing out all the code.
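
To give a rough idea of what such a cleanup pass has to deal with, here is a minimal Python sketch that strips the most obvious Word-only constructs seen in the first post: conditional comments, `<o:...>` / `<w:...>` tags, `Mso*` class names and the `xmlns` declarations. The function name `strip_word_markup` is made up for illustration, the approach is plain regular expressions rather than a real HTML parser, and it is not how Dreamweaver or HTML Tidy actually work. It also leaves the `mso-*` CSS properties in the `<style>` block alone.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove a few obvious Word-only constructs (hypothetical helper)."""
    # Conditional comments: <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.S | re.I)
    # Bare markers: <![if !supportEmptyParas]> and <![endif]>
    html = re.sub(r"<!\[(?:if[^\]]*|endif)\]>", "", html, flags=re.I)
    # Namespaced Office tags such as <o:p> and </o:p>
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.I)
    # Word class names such as class=MsoNormal (quoted or unquoted)
    html = re.sub(r'\s+class=(["\']?)Mso\w+\1', "", html)
    # xmlns / xmlns:o / xmlns:w declarations on the <html> tag
    html = re.sub(r'\s+xmlns(?::\w+)?="[^"]*"', "", html)
    return html


if __name__ == "__main__":
    line = ('<p class=MsoNormal><span lang=EN-US>'
            '<![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></span></p>')
    print(strip_word_markup(line))
    # prints: <p><span lang=EN-US>&nbsp;</span></p>
```

Even on a blank document this leaves an empty paragraph and the whole stylesheet behind, which hints at why a generic tidy tool struggles with longer files.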

6:23 pm on Feb 22, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 7, 2003
posts:1230
votes: 0


Just don't use Word any longer for this. I'm using StarOffice (not OpenOffice, but much the same), and its HTML is also packed with stylesheet stuff etc. I don't like it either. Maybe Dreamweaver really is a solution for converting longer documents out of a word processor.

6:31 pm on Feb 22, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 11, 2001
posts:134
votes: 0


The solution I use - copy all the text out of the page to a .txt file and start again.

It's so much easier.
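
For anyone who would rather script the "copy the text out and start again" step than do it by hand, here is a small Python sketch using only the standard library's `html.parser`. The class name `TextOnly`, the function `html_to_text` and the list of block tags are all illustrative choices, not anything from the thread. It assumes the file has already been read into a string, and it simply drops everything inside `<style>` and `<script>`.

```python
from html.parser import HTMLParser

# Tags after which a line break is emitted (an arbitrary, illustrative list).
BLOCK_TAGS = {"p", "div", "li", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"style", "script"}


class TextOnly(HTMLParser):
    """Collect visible text, ignoring markup, comments and stylesheets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # greater than 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) per line.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = ("<html><head><style>p.MsoNormal {margin:0in;}</style></head>"
            "<body><p class=MsoNormal>Hello <b>world</b></p>"
            "<p>Second paragraph</p></body></html>")
    print(html_to_text(page))
```

The result can be written to a .txt file and marked up again from scratch in whatever editor you prefer.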

1:28 pm on Feb 26, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 26, 2003
posts:7
votes: 0


Alternatively you could try Dean Allen's Word HTML cleaner which (when it works) is excellent!

http://www.textism.com/resources/cleanwordhtml/

[edited by: lawman at 12:45 am (utc) on Feb. 27, 2003]
[edit reason] delinked [/edit]

1:40 pm on Feb 26, 2003 (gmt 0)

Senior Member from ZA 

WebmasterWorld Senior Member 10+ Year Member

joined:July 15, 2002
posts:1720
votes: 1


Before I started working for my current employer, the company intranet was a 600-page Word hell :(

Layout was good but the code sucked... it's all good now though :)

Craig

3:07 pm on Mar 1, 2003 (gmt 0)

New User

10+ Year Member

joined:Aug 18, 2002
posts:37
votes: 0


A non-Windows reply?

I copied and pasted the first-mentioned HTML into a blank *"Simpletext" document and saved it as an HTML file on my Mac. I've done this before, including when I was creating my own webpage, and it worked quite well.

Then, once it was saved as an HTML file, I tried opening it with both Netscape 4.79 and iCab 2.82.

All I get is a blank white screen in both browsers. Sorry, the code doesn't work for those of us on iconoclastic, non-Windows platforms.

(*"Simpletext" is a modestly powered word processing program that comes with all Macs. It's pretty versatile and by default it only uses 512 k of RAM (you can tweak it to whatever you want of course). It handles different text styles, different fonts, search and replace, as well as (through linking with Quicktime) being able to play movies and sounds, even speaking typed text. But as far as handling plain text is concerned, it is akin to generating plain .TXT files on a Windows system with Notepad or some other program. The only thing it won't do is read Word .DOCs. For that I use another program called "Fileview" to rip the text out of ANY file, sans formatting. It's pretty well the only word processor I think of using nowadays.)
