It works in IE at least. Scary, isn't it? Word loves XML data islands. Want some more scary stuff? Export PowerPoint to HTML. Yikes!
Yes, that looks pretty bad. Try putting it in Dreamweaver and use the 'clear up Word HTML' command!
I know someone who almost ended up in a mental institution trying to clean up the Word HTML on a site a client had built themselves.
It's amazing how much extra crap they put in there.
"Dreamweaver and use the 'clear up Word HTML' command!"
This was an experiment. I usually use Editplus2.
Just so none of you completely misjudge me here... ;)
I know of a non-profit organisation with a web presence fueled by MS Word.
Instead of creating a new page for a lengthy article, they just stack it on top of others, blog-style. After 4 years, the index.html page now prints to about 20 pages and is about 600k.
Please don't make fun of MS Word's HTML capabilities. I must have created and uploaded at least two websites using MS Word. Brings back lots of wonderful memories.
Before that, I was using Netscape Composer. :)
Anyway, I've now upgraded to... ahem... MS FrontPage 2002. Works like a charm. ;)
I love it when my ad clients use these "create html" quasi-utilities. It means they'll be buying traffic from me for a looooooong time.
Just yesterday I had to clean up a file for a new client that had been made in Word - it was horrible! Just like the code you posted, only longer! Thankfully it was only one page! It doesn't surprise me that someone could end up in a mental institution from this.
What I see as the challenge, though, is how to get through to a client that what they have is garbage. It all looks the same to them when they view it in IE. So how do you explain that you will either have to rebuild their site from scratch or edit the garbage Word wrote, and that either way it will cost them money? They just don't seem to get it: "the site is basically there already, so it should be really easy for you to just change this or that," and "their friend who made the site the first time could easily add whatever, just by opening it in Word," etc.
(I need a better way to make money!)
"how do you get through to a client that what they have is garbage?..."
LOL! That is SO true.
A while back a small non-profit org asked me to "help" with their website.
It had been done in a combination of FP98, FP2000, FTP, Word, a REALLY old copy of Fusion, and probably some other programs and text editors I didn't go looking for, all mixed together. All done by a succession of people who obviously had no concept of what they were doing.
When I told them what needed to be done (such as removing or scaling down the 560 KB background graphic), I started getting the "well, we want to keep that, and this, and don't change that..."
I finally told them it was hopeless given the limitations they gave me (and the obvious internal bickering that was going on).
All this for a site that got maybe 500 hits a month ;P
oohhhh, so that's what that stuff is.......
A buddy of mine put a little site online and it was full of code like that, except the Word identifiers had been removed. He was a PhD engineering grad, so I thought it was some obscure coding/formatting thing used in engineering... and it was Word all along...
Just as a matter of comparison:
OpenOffice.org blank page
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
<META NAME="GENERATOR" CONTENT="OpenOffice.org 1.0.1 (Win32)">
<META NAME="CREATED" CONTENT="20030222;18070329">
<META NAME="CHANGED" CONTENT="16010101;0">
Mozilla Composer -
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Dreamweaver MX 'Basic webpage' -
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
They all appear to do quite a good job of it. Personally, I can't stand using M$ Office at all anymore. Long live OpenOffice!
|brotherhood of LAN|
hmmm, I think this is why FrontPage creates a little "extra" code here and there (not as much as Word, for sure!).
All MS Office programs seem to be interoperable, i.e. you can save HTM as XLS, XLS as CSV, etc.
I guess any form of "HTML Tidy" would have a hard time ironing out all the code.
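For the curious, here's a crude sketch (Python, standard library only) of the sort of thing a "clean up Word HTML" pass has to strip: the conditional comments Word wraps its XML data islands in, Office-namespace tags like <o:p>, class=MsoNormal-style classes and mso-* style properties. The patterns are assumptions based on typical Word-generated HTML and clean_word_html is a made-up name; this is not how Dreamweaver or Tidy actually do it, and regexes are a blunt tool for HTML, which rather proves the point.

```python
import re

# Hypothetical helper: the patterns below are assumptions based on typical
# Word-generated HTML, not an exhaustive list of what Word emits.

# Conditional comments hiding Word's XML data islands, e.g.
# <!--[if gte mso 9]><xml>...</xml><![endif]-->
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
# Namespaced Office tags such as <o:p> and </o:p>
OFFICE_TAG = re.compile(r"</?[a-z]+:[a-z]+[^>]*>", re.IGNORECASE)
# class=MsoNormal, class="MsoTitle", ...
MSO_CLASS = re.compile(
    r"""\s+class=(?:"Mso[^"]*"|'Mso[^']*'|Mso\w+)""", re.IGNORECASE)
# mso-* declarations inside style attributes
MSO_STYLE_DECL = re.compile(r"""mso-[a-z-]+:[^;"']*;?""", re.IGNORECASE)
# style attributes left empty once the mso-* bits are gone
EMPTY_STYLE = re.compile(r"""\s+style=(?:"\s*"|'\s*')""", re.IGNORECASE)


def clean_word_html(html: str) -> str:
    """Strip the most common Word-specific markup, one pattern at a time."""
    for pattern in (CONDITIONAL_COMMENT, OFFICE_TAG, MSO_CLASS,
                    MSO_STYLE_DECL, EMPTY_STYLE):
        html = pattern.sub("", html)
    return html


sample = ("<p class=MsoNormal style='mso-bidi-font-size:12.0pt'>Hello<o:p></o:p></p>"
          "<!--[if gte mso 9]><xml><w:WordDocument></w:WordDocument></xml><![endif]-->")
print(clean_word_html(sample))  # <p>Hello</p>
```

Anything the patterns don't anticipate (Word's <style> blocks, VML, smart tags) passes straight through, so treat it as an illustration rather than a cleaner.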
Just don't use Word for this anymore. I'm using StarOffice (not OpenOffice, but much the same), and its HTML is also packed with stylesheet stuff etc. I don't like it either. Maybe Dreamweaver really is the solution for converting longer documents out of a word processor.
The solution I use - copy all the text out of the page to a .txt file and start again.
It's so much easier.
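Same idea in script form, for anyone who'd rather not do the copy-and-paste by hand: a rough Python sketch using only the standard library's html.parser. It keeps the text and throws away the markup (including Word's <style> blocks and the conditional comments holding its XML), starting a new line at paragraph-level tags. TextExtractor and html_to_text are made-up names for illustration, and you'd still want to read through the result before starting again.

```python
from html.parser import HTMLParser


# Hypothetical helper: a sketch of "keep the text, lose the markup".
class TextExtractor(HTMLParser):
    SKIP_TAGS = {"script", "style"}  # contents are never wanted as text
    BLOCK_TAGS = {"p", "div", "br", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &amp; etc. are decoded
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

    # Comments, including Word's <!--[if gte mso 9]>...<![endif]--> blocks,
    # go to handle_comment, which HTMLParser ignores by default.


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop blank lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


word_page = ("<html><head><!--[if gte mso 9]><xml><w:WordDocument>x</w:WordDocument>"
             "</xml><![endif]--><style>p.MsoNormal {margin:0in}</style></head>"
             "<body><p class=MsoNormal>Hello <b>there</b><o:p></o:p></p>"
             "<p>world</p></body></html>")
print(html_to_text(word_page))  # prints two lines: "Hello there" and "world"
```

Tables and lists come out flattened to plain lines, which is fine if the plan is to rebuild the page anyway.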
Alternatively, you could try Dean Allen's Word HTML cleaner, which (when it works) is excellent!
[edited by: lawman at 12:45 am (utc) on Feb. 27, 2003]
[edit reason] delinked [/edit]
Before I started working for my current employer, the company intranet was a 600-page Word hell :(
The layout was good but the code sucked. It's all good now, though :)
A non-Windows reply?
I copied and pasted the first HTML posted into a blank *"Simpletext" document and saved it as an HTML file on my Mac. I've done this before, including when I was creating my own webpage, and it worked quite well.
Then, once it was saved as an HTML file, I tried opening it in both Netscape 4.79 and iCab 2.82.
All I get is a blank white screen in both browsers. Sorry, the code doesn't work for those of us on iconoclastic, non-Windows platforms.
(*"Simpletext" is a modestly powered word processing program that comes with all Macs. It's pretty versatile, and by default it only uses 512 KB of RAM (you can tweak that to whatever you want, of course). It handles different text styles, different fonts, and search and replace, and (through linking with QuickTime) it can play movies and sounds and even speak typed text. But as far as handling plain text is concerned, it is akin to generating plain .TXT files on a Windows system with Notepad or some other program. The only thing it won't do is read Word .DOCs. For that I use another program called "Fileview" to rip the text out of ANY file, sans formatting. It's pretty well the only word processor I think of using nowadays.)