Forum Moderators: coopster

Removing comments from html

         

GoodMoJo

12:54 am on Dec 31, 2003 (gmt 0)

10+ Year Member



I happen to have a bunch of stuff in a database (who here doesn't, right?) and in one of the fields I have a list of html snippets that are delineated by html-style comments. I did this so the whole field could be displayed in an html page and the user would never see the delineation. However, the bossman is, for whatever reason, very paranoid about having comments in the end html.

Anyways, enough backstory - I need to write a script that will take html comments out of a given string without removing other html tags. Anyone have any really smart ideas on an algorithm to do this?

P.S. - Thanks in Advance

coopster

1:14 am on Dec 31, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You are probably going to end up using a regular expression. I would use preg_replace [php.net]. I haven't had any time to test this, but it should get you started:

$new_text = preg_replace("/<!--.*-->/Uis", "", $text_with_comments);
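For anyone who wants to try the one-liner above in isolation, here is a minimal sketch. The function name strip_html_comments and the sample string are made up for illustration; only the pattern comes from the post. It also runs the same pattern without the U flag, to show why the ungreedy modifier matters when a string holds more than one comment.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread).
function strip_html_comments(string $html): string
{
    // U = ungreedy, i = case-insensitive, s = dot also matches newlines.
    $result = preg_replace('/<!--.*-->/Uis', '', $html);
    // preg_replace() returns null on a regex error; fall back to the input.
    return $result === null ? $html : $result;
}

// Made-up sample with two comments, one of them spanning two lines.
$sample = "<p>one<!-- first --> two <!-- second\nspans lines --> three</p>";

// Ungreedy: each comment is removed on its own.
echo strip_html_comments($sample), "\n";   // <p>one two  three</p>

// Greedy (no U flag): matches from the first <!-- to the last -->,
// so the text between the two comments is lost as well.
$greedy = preg_replace('/<!--.*-->/is', '', $sample);
echo $greedy, "\n";                        // <p>one three</p>
```

Note that a regex like this knows nothing about context, so it will also strip comment-like text inside scripts or attribute values.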

ergophobe

1:34 am on Dec 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This should do it (worked for me in a quick test):

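// Lazy .*? stops at the first -->; the s flag lets . match newlines.
$pattern = "/<!--.*?-->/s";
// Split the text on every comment, then glue the remaining pieces together.
$test_array = preg_split($pattern, $test_html);
$text_without_comments = implode("", $test_array);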

Of course, keep in mind that if you enclose your javascript within comments, it will blow that away too.
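If the javascript caveat matters for your pages, one possible workaround is to leave script blocks alone and strip comments only from the text around them. The sketch below is an assumption on my part, not something posted in this thread: the function name is hypothetical, and it does not handle a comment that starts outside a script block and ends inside one.

```php
<?php
// Hypothetical helper: strips comments everywhere except inside <script> blocks.
function strip_comments_outside_scripts(string $html): string
{
    // Split on <script>...</script>, keeping the blocks in the result.
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, even indexes
    // hold ordinary markup and odd indexes hold the script blocks.
    $parts = preg_split(
        '#(<script\b.*?</script>)#is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        return $html; // regex error: return the input untouched
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Ordinary markup: remove comments (lazy match, dot matches newlines).
            $parts[$i] = preg_replace('/<!--.*?-->/s', '', $part) ?? $part;
        }
    }

    return implode('', $parts);
}

// Made-up input: two real comments plus a comment-wrapped script body.
$page = "<p>a<!-- x --></p><script><!--\nalert(1);\n//--></script><!-- y -->";
echo strip_comments_outside_scripts($page), "\n";
```

The script block comes through unchanged while the two comments outside it are dropped.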


very paranoid about having comments in the end html.

What, are there passwords in there? Does it expose the underlying file structure or something like that? Otherwise, people have the HTML, what harm is there in letting them have the comments too?

Of course, you'll save some bandwidth, but running every page through the regex engine has to eat up some resources too.

ergophobe

1:41 am on Dec 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ahh! Coopster beat me to it with a simpler and better solution - using the U (ungreedy) flag instead of splitting and putting back together.

Oh well.

By the way, I tested my version and coopster's and they behave identically (and as desired) on the following text:

<p>This is a whole bunch of
of html text with <!-- some
multi-line comments --> in it
and other difficult situations.</p>
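To rerun a comparison like this yourself, a minimal sketch follows. It feeds the sample text through a preg_replace call with the U flag and through a preg_split/implode pair. The split pattern here is written with a lazy quantifier and the s flag, which is my assumption about what a working split-based version needs, not necessarily the exact pattern tested above.

```php
<?php
// The sample text from the post, as a single string.
$test_html = "<p>This is a whole bunch of\n"
    . "of html text with <!-- some\n"
    . "multi-line comments --> in it\n"
    . "and other difficult situations.</p>";

// Approach 1: replace each comment with an empty string (ungreedy via U).
$via_replace = preg_replace('/<!--.*-->/Uis', '', $test_html);

// Approach 2: split on comments and join the remaining pieces.
$via_split = implode('', preg_split('/<!--.*?-->/s', $test_html));

var_dump($via_replace === $via_split); // bool(true)
echo $via_replace, "\n";
```

Both results keep the two spaces that surrounded the removed comment, so you may want to collapse whitespace afterwards if that matters for your output.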


GoodMoJo

5:07 am on Dec 31, 2003 (gmt 0)

10+ Year Member



Thanks very much. Who would have thought! preg_replace... and I thought this was going to be hard. To answer the earlier question: the bossman is paranoid about there being any sign that it's from a database. Why? Don't even get me started...

httpwebwitch

9:21 pm on Jan 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Funny story.

I had a boss like that once. Someone showed him how to look at the source code of an HTML page, and he became paranoid that the "unreadable code" he saw somehow magically revealed mysteries of great importance to company security. He felt that the word "code" had a cold-war-esque definition which involved espionage and subterfuge, like I could "crack the code".

He also thought that hackers were constantly trying to access our e-mail by using the magical invisible stuff hidden on our website pages. I was called to his office and grilled many times over to reveal the arcane knowledge that he was convinced I had, which would let someone who understood HTML go into our database and read all the secret stuff in there... things like product names and prices (which were publicly displayed on the site anyways).

I didn't stay at that company very long... after I left, I heard that employees in the development studio started playing practical jokes, like changing his Windows startup sound into a recording from a Russian radio program.