Need to Strip out HTML Comments

         

txbakers

12:44 am on Jan 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Glad to see the forums doing well!

Problem: Our web program uses the TinyMCE scripts to allow for HTML formatting of textareas. This works well for most things. We have now applied it to sending HTML emails. When a person types in our box and uses the TinyMCE buttons, everything is fine.

However, when people type in Word and format their message, and then copy/paste into ours, it includes a variant of the following:

<!-- /* Font Definitions */ @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0; mso-font-charset:2; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:0 268435456 0 0 -2147483648 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-parent:""; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman"; mso-fareast-font-family:"Times New Roman";} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in; mso-header-margin:.5in; mso-footer-margin:.5in; mso-paper-source:0;} div.Section1 {page:Section1;} /* List Definitions */ @list l0 {mso-list-id:1770925132; mso-list-type:hybrid; mso-list-template-ids:635851454 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;} @list l0:level1 {mso-level-number-format:bullet; mso-level-text:ï,·; mso-level-tab-stop:.5in; mso-level-number-position:left; text-indent:-.25in; font-family:Symbol;} ol {margin-bottom:0in;} ul {margin-bottom:0in;} -->

I tried to create a regexp to remove that, but it is not working:
output = String(output).replace(/<!--\b[^>]*>(.*?)-->/g, "");

So, I need your help!

THANKS.

Fotiman

3:33 pm on Jan 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Try this as your regexp:

/<!--(.*?)-->/g

txbakers

4:00 pm on Jan 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



will do!

txbakers

4:43 pm on Jan 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Very odd now. When I receive the email, it doesn't have the format comments. But when I forward the email, they show up immediately.

Not sure what to do here!

I'm stumped.

Fotiman

4:49 pm on Jan 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure how that's even possible.

The only other thing that looks strange to me is this:
output = String(output).replace(/<!--\b[^>]*>(.*?)-->/g, "");

Should be:
output = output.replace(/<!--(.*?)-->/g, "");
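One thing worth checking, offered as a hedged aside: in JavaScript the `.` does not match line breaks, so `/<!--(.*?)-->/g` skips any comment that spans more than one line, which pasted Word markup may well do. A minimal sketch that uses `[\s\S]*?` instead is below. The function name `stripHtmlComments` and the sample string are made up for illustration and are not from the original program.

```javascript
// Hypothetical helper: removes every <!-- ... --> block, including multi-line ones.
function stripHtmlComments(html) {
  // [\s\S] matches any character, newlines included; *? stops at the first "-->".
  return String(html).replace(/<!--[\s\S]*?-->/g, "");
}

// Made-up sample with a comment that spans three lines.
var sample = "<p>Hello</p><!-- /* Font Definitions */\n" +
             "@font-face {font-family:Wingdings;}\n" +
             "--><p>World</p>";

console.log(stripHtmlComments(sample)); // <p>Hello</p><p>World</p>
```

If the comments still turn up after this, the replace is probably running on a different string than the one that ends up in the email.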

txbakers

5:06 pm on Jan 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I did replace the old regexp with the new one and it still sent the gibberish in the HTML code.

txbakers

10:40 pm on Jan 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I put an alert on the "output" variable and it was blank, probably because of the HTML code.

But rather than sending the email, I wrote the body text to the screen and did a view source:

<p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style="font-size: small; font-family: Times New Roman;">This is text.</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style="font-size: small; font-family: Times New Roman;">This is more text.</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style="color: red;"><span style="font-size: small;"><span style="font-family: Times New Roman;">This is red text.</span></span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style="font-size: small; font-family: Times New Roman;">Here are some dots:</span></p>
<ul style="margin-top: 0in;" type="disc">
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-size: small; font-family: Times New Roman;">Dot one</span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-size: small; font-family: Times New Roman;">Dot two</span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-size: small; font-family: Times New Roman;">Dot three</span></li>
</ul>
<p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style="font-size: small; font-family: Times New Roman;">&nbsp;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style="font-size: small; font-family: Times New Roman;">No more dots.</span></p>

So that's the problem: all that extra formatting from Word needs to be stripped out, and the content needs to be reformatted.

Unless I tell people not to copy/paste from Word and just use our formatting tools....
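A rough sketch of what stripping that Word formatting could look like with plain regular expressions. This is deliberately blunt and rests on some assumptions: the function name `cleanWordHtml` is hypothetical, every inline `style` attribute is treated as disposable (which would also drop styling applied with the editor's own buttons if it is stored inline), and every `<span>` is assumed to carry presentation only. Regexes are not a real HTML parser, so treat it as a starting point.

```javascript
// Hypothetical helper: strips Word-specific markup from pasted HTML.
function cleanWordHtml(html) {
  return String(html)
    // Drop comment blocks such as Word's embedded style definitions.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop class attributes whose value starts with "Mso" (MsoNormal, etc.).
    .replace(/\s+class="Mso[^"]*"/gi, "")
    // Drop inline style attributes entirely (assumption: none are worth keeping).
    .replace(/\s+style="[^"]*"/gi, "")
    // Remove opening and closing span tags, keeping the text inside them.
    .replace(/<\/?span\b[^>]*>/gi, "");
}

var pasted = '<p class="MsoNormal" style="margin: 0in 0in 0pt;">' +
             '<span style="font-size: small; font-family: Times New Roman;">' +
             'This is text.</span></p>';

console.log(cleanWordHtml(pasted)); // <p>This is text.</p>
```

The order matters a little: the attributes are removed before the spans, so what remains is plain `<p>`, `<ul>` and `<li>` markup.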

Drag_Racer

9:09 pm on Jan 12, 2010 (gmt 0)

10+ Year Member



txbakers, you could consider replacing TinyMCE with CKEditor (I'm in the process of doing this right now). It comes with a 'Paste from Word' feature to cover these challenges.

On the regex, use /<!--[^>]*>/g, or you may grab too much, as regexes are so greedy... ;)
Also, I would not use the parentheses '(' and ')' unless you need to remember what you replace. It's just extra processor/memory overhead unless you need it.
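For comparison, a small sketch of how the patterns from this thread behave on made-up input. `[^>]*>` stops at the first `>` it meets, so it works on the Word block quoted earlier (which has no `>` before the closing `-->`) but would cut short a comment that contains one. A lazy `*?` without parentheses does not over-grab either and avoids the capture. The variable names and sample strings are illustrative only.

```javascript
var withGt = "a<!-- if x > y -->b";
var twoComments = "a<!-- one -->b<!-- two -->c";

// Stops at the first ">" inside the comment, leaving " y -->" behind.
var cutShort = withGt.replace(/<!--[^>]*>/g, "");
console.log(cutShort); // a y -->b

// Lazy match, no capturing group: removes the whole comment.
var whole = withGt.replace(/<!--[\s\S]*?-->/g, "");
console.log(whole); // ab

// The lazy quantifier ends at the first "-->", so "b" survives between comments.
var both = twoComments.replace(/<!--[\s\S]*?-->/g, "");
console.log(both); // abc
```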

txbakers

2:35 am on Jan 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



paste From Word!~! OOOOH!

THANKS!

I'll look for that. CK Editor