Regular Expression to strip comments from Javascript

littlegiant

5:35 pm on Aug 1, 2007 (gmt 0)

10+ Year Member



I'm at the end of my rope. After trying about 10 different Javascript obfuscators and finding every single one of them buggy, too aggressive (e.g., removing semicolons needed to end statements) or just plain broken, I've resorted to building my own simple obfuscation system. Part of it is a simple form that strips out one-line Javascript comments such as:

// comment here

I'm having problems with the regular expression needed to do this. Here's what I'm using:

/^[\/]{2}.*[\n]$/g

According to my notes and Javascript manuals, this should search for any string that begins with two forward slashes, followed by any number of other characters and then terminates with a line break.

The results I'm getting are:

// test [no line break] = // test (GOOD)
// test [line break] = [nothing] (GOOD)

// test [line break] keep me = // test [line break] keep me (NO GOOD)

What's wrong with my regular expression? I don't get it. I'm climbing the walls here.

Just for posterity, here's the test case:

<html>
<head>
<script type="text/javascript">
function stripComments () {
var inputCode = document.getElementById("inputCode").value;
var regex_sc1 = /^[\/]{2}.*[\n]$/g;
var noComments = inputCode.replace(regex_sc1,"");
document.getElementById("outputCode").value = noComments;
}
</script>

</head>
<body>

<form action="">
INPUT CODE:<br>
<textarea id="inputCode" rows="10" cols="50"></textarea><br>
<button type="button" onclick="stripComments();">Strip Comments</button><br>
<br>
OUTPUT CODE:<br>
<textarea id="outputCode" rows="10" cols="50"></textarea><br>
<input type="reset">
</form>

</body>
</html>
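For anyone reproducing this without a browser, here is a minimal sketch (plain JavaScript, runs in Node or a browser console) that feeds the three test inputs from above through the same regex. The second pattern, /^\/\/.*$/gm, is an assumption about a possible fix and not something from this thread. Without the m flag, ^ and $ only match at the very start and very end of the whole input, so the original pattern can match only when the entire textarea holds a single comment line followed by a single line break.

```javascript
// The regex from the question. Without the m flag, ^ and $ match only at
// the very start and very end of the WHOLE input, not at each line.
const original = /^[\/]{2}.*[\n]$/g;

// Assumed variant (not from this thread): with the m flag, ^ and $ also
// match at every line boundary, so each comment line is matched separately.
const multiline = /^\/\/.*$/gm;

// The three test inputs from the post, with \n standing in for [line break].
const inputs = ["// test", "// test\n", "// test\nkeep me"];

for (const input of inputs) {
  console.log(
    JSON.stringify(input),
    "| original ->", JSON.stringify(input.replace(original, "")),
    "| with m ->", JSON.stringify(input.replace(multiline, ""))
  );
}
```

The original pattern reproduces the three results listed in the post. The m-flag variant removes the comment text in all three cases, but leaves the line break itself in place.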

jaw76

6:07 pm on Aug 1, 2007 (gmt 0)

Hey.

Try this instead:

var regex_sc1 = /([\n][\/]{2}.*[\n]?)|(^[\/]{2}.*[\n]?)/g;

J

littlegiant

6:41 pm on Aug 1, 2007 (gmt 0)

Uhhh... well, thanks anyway, but that didn't work. The results of that were:

// test [line break] = // test [line break] (NO GOOD)
// test [line break] keep me = // test [line break] keep me (NO GOOD)

jaw76

7:20 pm on Aug 1, 2007 (gmt 0)

Ok, that wasn't the best suggestion. =)

If I've understood you correctly, you only want to remove comments at the beginning of a line.

This works fine for me:

var regex_sc1 = /(^[\/]{2}[^\n]*)|([\n]{1,}[\/]{2}[^\n]*)/g;

The second alternative matches // preceded by one or more newlines, and removes those newlines along with the comment. The first alternative takes care of the case where the very first line is a comment. In that case (and only that case), the newline after the comment remains.
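To see exactly what this alternation does, here is a small sketch that runs it (with a real pipe) over the test inputs from the first post plus two more. stripLineComments is just an illustrative name. Note the leading \n left over when the first line is a comment, and how the second alternative swallows the newline in front of each later comment.

```javascript
// jaw76's pattern, with a real pipe (|) between the two alternatives.
// Alternative 1: a comment at the very start of the input (no m flag,
//   so ^ only matches at index 0).
// Alternative 2: one or more newlines followed by a comment; those
//   newlines are removed together with the comment.
const stripLineComments = (code) =>
  code.replace(/(^[\/]{2}[^\n]*)|([\n]{1,}[\/]{2}[^\n]*)/g, "");

const samples = [
  "// test",                          // -> ""
  "// test\n",                        // -> "\n"
  "// test\nkeep me",                 // -> "\nkeep me"
  "// test\nkeep me\n// test\nand me", // -> "\nkeep me\nand me"
  "keep me\n// test",                 // -> "keep me"
];

for (const s of samples) {
  console.log(JSON.stringify(s), "->", JSON.stringify(stripLineComments(s)));
}
```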

J

littlegiant

8:03 pm on Aug 1, 2007 (gmt 0)

That doesn't work either. Same results as before. What I want to do is remove ALL instances of strings that begin with two forward slashes, have any text in between, and end with a line break (new line), which in effect is a Javascript comment line.

jaw76

8:18 pm on Aug 1, 2007 (gmt 0)

Strange.

I copied your html, changed the regex and it works:

// test [no line break] = [nothing]
// test [line break] = [line break]
// test [line break] keep me = [line break] keep me
// test [line break] keep me [line break] // test [line break] and me = [line break] keep me [line break] and me

littlegiant

8:35 pm on Aug 1, 2007 (gmt 0)

I don't know what to say. I just did the exact same thing. I copied and pasted my own HTML from this post into a text editor and saved it as a completely new .htm file. Then I swapped out var regex_sc1 for the one you supplied in your last post, saved the file and ran the tests. I'm not getting your results at all. With your regular expression, everything that goes into the INPUT CODE box gets copied verbatim to the OUTPUT CODE box, no matter how many line breaks or whatever I put in. How could this be? It couldn't be a browser or an OS issue, could it?

jaw76

8:54 pm on Aug 1, 2007 (gmt 0)

I've noticed that this site displays a pipe (|) as a broken bar (¦). If you copied my regex and didn't change the broken bar back to a pipe, that is most likely why it won't work.
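A quick way to confirm this diagnosis, as a sketch: inside a JavaScript regex the broken bar is just an ordinary literal character, not alternation. The ¦ version therefore only matches input that literally contains a ¦ between the two pieces, which ordinary code practically never does, so replace() hands the input back unchanged. The \u00A6 escape is used here only so the character survives copy and paste.

```javascript
// The same alternation written two ways: once with a real pipe (|) and
// once with the broken bar (¦, U+00A6) that the forum displayed.
const withPipe = /(^[\/]{2}[^\n]*)|([\n]{1,}[\/]{2}[^\n]*)/g;
const withBrokenBar = /(^[\/]{2}[^\n]*)\u00A6([\n]{1,}[\/]{2}[^\n]*)/g;

const input = "// test\nkeep me";

console.log(JSON.stringify(input.replace(withPipe, "")));      // "\nkeep me"
console.log(JSON.stringify(input.replace(withBrokenBar, ""))); // "// test\nkeep me"
```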

If that's not the case I don't know what to say either...

littlegiant

8:58 pm on Aug 1, 2007 (gmt 0)

That was it! Working now... (D'oh!... I should have caught that..) Much thanks for your help!

jaw76

9:03 pm on Aug 1, 2007 (gmt 0)

Glad I could help!

It still annoys me, though, that a newline is left behind when a comment on the first line is removed. Let me know if you come up with a better solution.
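One possible answer, offered as a sketch rather than anything tested in this thread: turn on the multiline m flag so that ^ matches at the start of every line, and let the pattern also swallow the line break that follows the comment. stripCommentLines is a hypothetical helper name. It assumes \n line endings and, like the regexes above, only removes comments that start in the first column. A comment on the very last line of the input still leaves the line break before it.

```javascript
// Hypothetical helper (not from the thread). With the m flag, ^ matches at
// the start of every line, so one pattern covers the first line and all
// later lines alike. The trailing \n? also consumes the line break after
// the comment, so no empty line is left behind.
function stripCommentLines(code) {
  return code.replace(/^\/\/[^\n]*\n?/gm, "");
}

console.log(JSON.stringify(stripCommentLines("// test\nkeep me")));
// "keep me"
console.log(JSON.stringify(stripCommentLines("// test\nkeep me\n// test\nand me")));
// "keep me\nand me"
console.log(JSON.stringify(stripCommentLines("keep me\n// test")));
// "keep me\n"
```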

J

littlegiant

9:12 pm on Aug 1, 2007 (gmt 0)

I'm glad it works but I'm having problems understanding why it works. Your regular expression..

/(^[\/]{2}[^\n]*)|([\n]{1,}[\/]{2}[^\n]*)/g;

..as I understand it means, search for a double forward slash at the beginning and then any number of new line characters also at the beginning (of what?) which is evaluated before the former OR search for a new line character 1 or more times followed by a double forward slash followed by any number of new line characters at the beginning of something again (huh?)... (*chuckle*)... Sheesh... works fine but what a mind boggler!

littlegiant

9:18 pm on Aug 1, 2007 (gmt 0)

The part I don't understand is how this regular expression accounts for any of the characters between the double forward slash and the new line character. How does it do that?

jaw76

9:27 pm on Aug 1, 2007 (gmt 0)

Hehe... it's probably not the most aesthetic solution. =)

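The (^[\/]{2}[^\n]*) part matches a double forward slash at the beginning of the string, followed by any characters except a newline. Placed as the first character inside a character set, ^ negates the set, i.e. [^\n] means ANY character BUT a newline. Outside a character set, ^ means something else entirely: it anchors the match to the start of the string.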

The ([\n]{1,}[\/]{2}[^\n]*) part matches one or more newlines, followed by a double forward slash, followed by any characters except a newline.
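A short sketch of the two meanings of ^ described above, and of how [^\n]* covers the comment text but stops at the line break. The variable names are made up for illustration.

```javascript
// Outside square brackets, ^ is an anchor: "start of the input"
// (or start of any line when the m flag is set).
const startsWithSlashes = /^\/\//;

// As the first character inside square brackets, ^ negates the set:
// [^\n] is "any single character except a newline", so [^\n]* runs
// from the slashes up to (but not including) the line break.
const restOfLine = /\/\/[^\n]*/;

const code = "// first line\nsecond line";

console.log(startsWithSlashes.test(code));                // true
console.log(startsWithSlashes.test("keep me\n// test"));  // false (no m flag)
console.log(JSON.stringify(code.match(restOfLine)[0]));   // "// first line"
```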

littlegiant

9:38 pm on Aug 1, 2007 (gmt 0)

Whoops... I overlooked that part about negated character sets in my Javascript manual. Okay, I get it now. That's a really interesting approach. I never would have thought of that. I think I was trying to apply too much brute force. Anywho... however aesthetic it may or may not be, it works! Thanks again!