Guidance needed for loading content and eliminating bad code

     
11:47 pm on Nov 26, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member whitey is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 28, 2005
posts:3072
votes: 27


Some time ago we had thousands of pages of unique content written and loaded into our bespoke CMS for publication on our website. Fast forward 6 years, and we're going through an upgrade to a new system and need to re-enter all of that content.

Although I've opened another thread regarding WordPress: [webmasterworld.com...], this has nothing to do with WordPress this time. Our CMS [if you can call it that, since it is extremely minimal] has a WYSIWYG view and a Source Code view.

So far, we have edited and re-entered the content as follows.

The process:
Open article in MS Word
Edit it in MS Word [adding new content elements]
Cut and paste article to Notepad
Paste it back into MS Word
Strip article of any formatting [using unformatting option]
Bold paragraph section lines, e.g. this para is about xyz
Enter into WYSIWYG

The problem:
The content added to the WYSIWYG shows "Word related" code that bloats the source code
All that content needs to be either re-entered, or edited in the source code area
It's a headache and time consuming.

To all you great people out there who do this regularly: what is the best way, or what are the practical options, for transferring documents from a previous system to a new one without programming, when the source material is in Word and/or the old HTML pages?
2:03 am on Nov 27, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15869
votes: 869


I don't understand why the MS Word steps are necessary at all. I mean, ahem, in the specific context of editing HTML. Wouldn't it be simpler all around to do everything in a bare-bones text editor?
7:58 am on Nov 27, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member whitey is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 28, 2005
posts:3072
votes: 27


@lucy24 - Can you elaborate?

The source articles that are giving us the grief are in Word. Thousands of them - and they're pretty long articles. Pasting into the text editor doesn't remove the Word-related code. It may look ok in the text editor, but the source code is bloated. I hope this makes sense.
11:37 am on Nov 27, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts: 3332
votes: 140


If you've got thousands of Word documents, you could batch convert them into something else (e.g. filtered HTML, which is the 'cleanest' output from Word) using a macro. If it were me, I would then crawl all of the resulting HTML and put it in a format which could be inserted into the CMS. Then do all the editing via your WYSIWYG, not in Word.

Incidentally, many WYSIWYGs include a "paste from Word" option, and it's relatively straightforward to clean up MS Word code using scripting.
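
As a rough illustration of the scripted clean-up mentioned above, here is a minimal Python sketch using only the standard library. It is not the method anyone in this thread used. The whitelist of tags, the function name `clean_word_html`, and the class name `WordCleaner` are all assumptions chosen for the example, and real Word output may need more rules than this.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tags worth keeping in article body text.
KEEP_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li",
             "h1", "h2", "h3", "h4", "h5", "h6", "a", "blockquote",
             "table", "thead", "tbody", "tr", "th", "td"}
KEEP_ATTRS = {"a": {"href"}}          # every other attribute is dropped
SKIP_CONTENT = {"style", "script", "title", "xml"}  # drop tag and contents
VOID_TAGS = {"br"}                    # no closing tag is written for these


class WordCleaner(HTMLParser):
    """Hypothetical helper: re-emits only whitelisted tags and plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return  # span, font, o:p etc. are unwrapped; their text is kept
        allowed = KEEP_ATTRS.get(tag, set())
        parts = []
        for name, value in attrs:
            if name in allowed:
                safe = escape(value or "", quote=True)
                parts.append(f' {name}="{safe}"')
        self.out.append(f"<{tag}{''.join(parts)}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments (including Word's conditional comments) are dropped because
    # handle_comment is left as the default no-op.


def clean_word_html(html_text):
    """Return html_text with Word-specific markup stripped."""
    parser = WordCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = ('<p class=MsoNormal style="margin:0cm"><span lang=EN-GB '
          'style="font-family:Calibri">Hello <b>world</b></span><o:p></o:p></p>'
          '<!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]-->')
print(clean_word_html(sample))
```

The whitelist approach was chosen for the sketch because Word adds many kinds of markup (classes, inline styles, namespaced tags, conditional comments), and listing what to keep is shorter than listing what to remove.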
12:27 am on Nov 28, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


IMO Word and many WYSIWYG editors have shortcomings, and some actually create incorrect code which *may* look OK to you, but if the code is not standards-compliant, the desired look/function may fail in the many browsers used on the internet.

Code needs to be standards-compliant. If it isn't (a missing or malformed doctype, for example, puts the browser into Quirks Mode), the browser may do some unexpected things guessing what you intended. The single most important way to ensure your web pages look & function relatively the same across the internet is to write standards-compliant code.

I always hand code, using a bare-bones text editor, then save as HTML. I then upload to my server and the very first thing I do is check it with the W3C HTML Validator [validator.w3.org...] (there are also browser add-ons to check with W3C). The W3C are the people who write the standards that the browsers use to render page markup.

If the code fails the W3C tool, it will tell you where and give a couple of common reasons why the code is incorrect. This is a valuable learning experience and probably the most significant influence I've had in learning to code.
4:23 am on Dec 30, 2015 (gmt 0)

Junior Member

joined:Aug 3, 2013
posts: 113
votes: 32


In my experience HTML exported from Word is god-awful. While it retains its appearance very well, the source code is a nightmare.

It seems to work pretty well to paste from Word into the WYSIWYG window in Dreamweaver. You might try pasting into a text editor as RTF. I have good luck going from RTF to HTML in general. You might also try opening the Word docs in OpenOffice instead of opening them in Word to see if you get a less annoying result.

I just did a quick test & the HTML exported by OpenOffice was far more "normal" than the code from Word. The copy/paste from OpenOffice into Dreamweaver didn't produce any bad results at all. It seemed as brief and appropriate as one could expect.
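
For thousands of documents, the same export could in principle be batched rather than done by hand. The Python sketch below is an assumption-laden illustration, not something tested in this thread: it assumes LibreOffice (the OpenOffice fork) is installed with its `soffice` binary on the PATH, and relies on LibreOffice's `--headless`, `--convert-to` and `--outdir` command-line options. The function names and folder names are hypothetical, and whether Apache OpenOffice itself accepts the same options has not been checked here.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that exports one document as HTML."""
    return [soffice, "--headless", "--convert-to", "html",
            "--outdir", str(out_dir), str(doc_path)]


def convert_folder(src_dir, out_dir, soffice="soffice"):
    """Convert every .doc/.docx file in src_dir; return the expected HTML paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    for doc in sorted(Path(src_dir).glob("*.doc*")):
        # check=True raises CalledProcessError if LibreOffice reports a failure
        subprocess.run(build_convert_command(doc, out, soffice), check=True)
        converted.append(out / (doc.stem + ".html"))
    return converted
```

A call such as `convert_folder("word_articles", "html_articles")` (folder names are placeholders) would then leave one HTML file per article, which would still need checking before it goes into the CMS.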
11:27 pm on Jan 6, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Apr 26, 2012
posts:328
votes: 8


Whitey, are your articles originating in MS Word? If so, that may explain why Word has to be used here.
11:49 am on Jan 7, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member whitey is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 28, 2005
posts:3072
votes: 27


Yes they are.

I took some advice here and ran the content through the system text editor and Notepad, then back into the system. That seemed to clear the code issues quite well.