pattern.test(), how to refer back to pattern

RegExp.$n is deprecated

         

csdude55

7:43 pm on Dec 6, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's say that I have something like this:

var patt = /pat+ern/i;
while (patt.test(a))
  a = a.replace(patt, 'replacement');

I know, I know, I hate using a while() too because of the potential infinite loop. In practice I would build in a safety net; I'm just trying to keep it simple for the post.

Originally I could have shortened it to:

while (/(pat+ern)/i.test(a))
  a = a.replace(RegExp.$1, 'replacement');

which I liked! But I see that RegExp.$n is deprecated:

[developer.mozilla.org...]

I like to avoid using unnecessary variables, so before I resort to using the "var patt" version, is there a replacement for RegExp.$n?

lucy24

8:21 pm on Dec 6, 2022 (gmt 0)

Funny, I've never used anything but the named-variable version.

But--the usual tangent--why can't you use a /g and do it all in one fell swoop? This works perfectly well* with the RegEx, as in
output = output.replace(/\./g,"\\.")
(Example picked purely because it's the first replace in my logscripts.js file.)

* At least it did two days ago when I last ran my logs, which I do in javascript because my sites are just that small.

csdude55

9:00 pm on Dec 6, 2022 (gmt 0)

I've never figured out the logic, but sometimes /g just doesn't work and I have to use a loop. Like this one to limit repeated characters to a max of 6 repeats:

var limit_match = /((?:^[^<]|>)[^<>]*?)([^<>])\2{6,}/;
while (limit_match.test(a))
  a = a.replace(limit_match, '$1$2$2$2$2$2$2');

Fotiman

2:42 pm on Dec 7, 2022 (gmt 0)

Yeah, the while loop is inefficient and shouldn't be needed, unless you're trying to prevent something like this, where the result of a replacement creates a new instance of the string you want to replace (because of the surrounding values):
Source: "bababa" -> replace /ba/g with "ab" -> "ababab" (this result string now still contains the value that we wanted to replace: "ba")
As opposed to replacing them one at a time, reevaluating the entire string with each iteration:
"bababa" -> "abbaba" -> "ababba" -> "aabbba" -> "aabbab" -> "aababb" -> "aaabbb"

But yeah, storing the RegExp in its own variable is a simple and more declarative solution that makes it more obvious what your code is doing (your first example is much clearer than the RegExp.$n example). It's not really an unnecessary variable, since you need to reference it in multiple places: both when you perform the test and when you perform the replacement. I'd just make it a `const` instead of a `var` (if you're still using the loop).

Likewise, your limit match example could be done like this:

function limitRepeatedChars(str) {
  // Match any character followed by 6 or more repeats of itself (7+ in a row)
  return str.replace(/(.)\1{6,}/g, (match) => {
    // Replace the matched sequence with its first 6 characters
    return match.substring(0, 6);
  });
}
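
To make the "bababa" comparison above concrete, here is a minimal sketch of both behaviours. The helper name replaceUntilStable and its maxPasses cap are made up for illustration (the cap is the kind of safety net mentioned in the first post); they are not from any library.

```javascript
// One pass with /g: matches are found left to right in the original string,
// and text produced by a replacement is never rescanned.
const onePass = 'bababa'.replace(/ba/g, 'ab');
console.log(onePass); // ababab

// Loop: replace one match at a time and rescan from the start each time.
// maxPasses is a hypothetical safety cap against an endless loop.
// The pattern should not carry the g flag here, because test() on a global
// regex advances lastIndex between calls.
function replaceUntilStable(str, pattern, replacement, maxPasses = 1000) {
  let passes = 0;
  while (pattern.test(str) && passes++ < maxPasses) {
    str = str.replace(pattern, replacement);
  }
  return str;
}

console.log(replaceUntilStable('bababa', /ba/, 'ab')); // aaabbb
```

The loop is only worth its cost when a replacement can create a new match, as it does here.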

csdude55

9:34 pm on Dec 7, 2022 (gmt 0)

Quoting Fotiman: "Yeah, the while loop is inefficient and shouldn't be needed, unless you're trying to prevent something like this, where the result of a replacement creates a new instance of the string you want to replace (because of the surrounding values)"

As I've been playing, I think I've figured out the necessity of the while. Another example is this, where I'm removing tracking params from A HREF values:

// the link
a = `<a href="http://www.foo.com?utm_foo=bar&fbid=junk&id=123&parm=789">foo</a>`;

// remove utm_foo, ocid, trkid, gclid, fbid, data-bar, role, cite, itxt
//
// first version, using /g and no while()
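// starts with "<a", followed by anything or nothing until it gets to "href=", followed by
// either a " or ', followed by anything that's not that matching " or ' until it gets to a ?,
// then followed by anything that's not that matching " or ' until it gets to a & (with an
// optional amp;)
// caveat: a backreference is not supported inside a character class, so [^\2] actually
// matches any character except U+0002 and does not stop at the closing quote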
var tracker_match = /(<a[^>]* href=("|')[^\2]+\?[^\2]*)(?:(?:&(?:amp;)*)?utm_\w+?|ocid|trkid|gclid|fbid|data-[\w-]+|role|cite|itxt[\w-]*)=\w+/gi;

a = a.replace(tracker_match, '$1')
  .replace(/(&(?:amp;)*)+/g, '&')
  .replace(/\?(&(?:amp;)*)+/g, '?')
  .replace(/(?:&(?:amp;)*)+("|')/g, '$1');

// result: removes fbid=junk, but not utm_foo=bar
// <a href="http://www.foo.com?utm_foo=bar&id=123&parm=789">foo</a>

//////////
// second version, using while() and no /g
var tracker_match = /(<a[^>]* href=("|')[^\2]+\?[^\2]*)(?:(?:&(?:amp;)*)?utm_\w+?|ocid|trkid|gclid|fbid|data-[\w-]+|role|cite|itxt[\w-]*)=\w+/i;

while (tracker_match.test(a))
  a = a.replace(tracker_match, '$1')
    .replace(/(&(?:amp;)*)+/g, '&')
    .replace(/\?(&(?:amp;)*)+/g, '?')
    .replace(/(?:&(?:amp;)*)+("|')/g, '$1');

// result is as expected
// <a href="http://www.foo.com?id=123&parm=789">foo</a>

But what's weird to me is that if I don't restrict it to following the <a href=, then this works without a loop:

a = a.replace(/(\?|&(?:amp;)*)(?:utm_\w+?|ocid|trkid|gclid|fbid|data-[\w-]+|role|cite|itxt[\w-]*)=\w+/gi, '$1')
  .replace(/(&(?:amp;)*)+/g, '&')
  .replace(/\?(&(?:amp;)*)+/g, '?')
  .replace(/(&(?:amp;)*)+$/gm, '');
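
A likely explanation for the difference, offered as a sketch rather than a full diagnosis: matches found with /g never overlap. In the anchored version each match swallows everything from "<a" through the parameter being removed, so a second parameter in the same link has no "<a ... href=" prefix left to match. The unanchored version matches one parameter at a time, so /g can reach them all. One way to keep the restriction to links without a loop is to match each href value once and clean it inside a replacer callback. The names stripTrackingInLinks and TRACKING, and the shortened parameter list, are hypothetical.

```javascript
// Hypothetical, shortened parameter list; extend the alternation as needed.
const TRACKING = /(\?|&(?:amp;)?)(?:utm_\w+|ocid|trkid|gclid|fbid)=\w*/gi;

function stripTrackingInLinks(html) {
  // Outer pass: one match per <a> tag, capturing the quoted href value.
  return html.replace(/(<a\b[^>]*?\shref=)(["'])(.*?)\2/gi, (match, prefix, quote, url) => {
    // Inner pass: each parameter is its own match, so /g reaches all of them.
    const cleaned = url
      .replace(TRACKING, '$1')                       // drop name=value, keep its separator
      .replace(/(&(?:amp;)?)(?:&(?:amp;)?)+/g, '$1') // collapse runs of separators
      .replace(/\?&(?:amp;)?/, '?')                  // "?&" left when the first parameter went
      .replace(/(?:\?|&(?:amp;)?)$/, '');            // dangling separator at the end
    return prefix + quote + cleaned + quote;
  });
}

const link = '<a href="http://www.foo.com?utm_foo=bar&fbid=junk&id=123&parm=789">foo</a>';
console.log(stripTrackingInLinks(link));
// <a href="http://www.foo.com?id=123&parm=789">foo</a>
```

Text outside an href value is left alone, which is the point of anchoring to the link in the first place.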


Quoting Fotiman: "Likewise, your limit match example could be done like this"

I'm specifically trying to exclude text within a tag, but this had the same problem without being in a loop:

a = a.replace(/((?:^[^<]|>)[^<>]*?)(([^<>])\3{6,})/g,
  (match, $1, $2) => {
    return $1 + $2.substring(0, 6)
  });

Using a = 'thiiiiiiiiiis aaaaaaaaaaand thatttttttttttt', it returned:

thiiiiiis aaaaaaaaaaand thatttttttttttt

So it worked on the first catch, but then didn't go on to the second or third.

Fotiman

3:21 pm on Dec 8, 2022 (gmt 0)

With your last example, the problem is within the first capture group of your regex. By comparison, limitRepeatedChars('thiiiiiiiiiis aaaaaaaaaaand thatttttttttttt') behaves correctly and returns 'thiiiiiis aaaaaand thatttttt'. The culprit is this part:

(?:^[^<]|>)

Because of the ^, this will only match at the very beginning of the string (or immediately after a >), so in plain text only the first run is ever reached.
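
If the goal is to skip text inside tags, one alternative that avoids the prefix group entirely is to let the regex match whole tags first and hand them back unchanged. This is a different technique from the one in your post, shown as a sketch; the function name limitRepeatsOutsideTags is made up.

```javascript
// Hypothetical helper: cap runs of one repeated character at 6, leaving
// anything inside <...> untouched.
function limitRepeatsOutsideTags(str) {
  // Alternative 1 matches a whole tag; alternative 2 matches a character
  // followed by 6 or more copies of itself (7+ in a row).
  return str.replace(/<[^>]*>|([^<>])\1{6,}/g, (match, ch) =>
    // ch is undefined when the tag alternative matched: return it unchanged.
    ch === undefined ? match : ch.repeat(6)
  );
}

console.log(limitRepeatsOutsideTags('thiiiiiiiiiis aaaaaaaaaaand thatttttttttttt'));
// thiiiiiis aaaaaand thatttttt
```

Because every match is either a complete tag or a complete run, /g can walk the whole string in a single pass.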

csdude55

4:49 pm on Dec 8, 2022 (gmt 0)

Awesome catch, thanks! I swear that I looked at that ^ a thousand times, but my eyes are just fried from staring at code all day :-/

You're right, simply removing that ^ fixed it. I appreciate that!

For future readers, I found a fairly simple solution to the original question about replacing RegExp.$1. This modification works fine:

while (($1 = /pat+ern/i).test(a))
  a = a.replace($1, 'replacement');

It doesn't REALLY fulfill my desire of using a built-in variable instead of creating a new one, but at least it looks similar enough to the original. And is the same number of characters, so not even a microsecond longer to load.
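
For anyone who specifically wants the captured text that RegExp.$1 used to hold, the array returned by exec() (or match()) carries the same information without the deprecated static properties. A minimal sketch; the sample string and the guard counter are made up for illustration.

```javascript
let text = 'The PATTTERN and the pattern';
let m;
let guard = 0; // hypothetical safety cap against an endless loop

// exec() returns null when nothing matches, otherwise an array:
// m[0] is the whole match and m[1] is the first capture group,
// which is what RegExp.$1 used to expose.
while ((m = /(pat+ern)/i.exec(text)) !== null && guard++ < 100) {
  text = text.replace(m[1], 'replacement');
}

console.log(text); // The replacement and the replacement
```

This still needs one declared variable (m), so it is closer in spirit to the original than in character count.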

A while back I learned how to use $_ in Perl, and realized that I'd been creating unnecessary variables for YEARS; essentially just renaming built-in variables. I started eliminating all of those and have improved my load time a bit, so it's become a habit now to minimize unnecessary variables.

Fotiman

5:56 pm on Dec 8, 2022 (gmt 0)

Note, your example there is creating a new global variable named `$1`, which is probably not desirable.

csdude55

6:21 pm on Dec 8, 2022 (gmt 0)

"Global" meaning that it would be accessible elsewhere throughout the script, right?

I originally tried while ((let $1 = ...)...) { ... }, but of course that throws a warning. I guess because it keeps trying to reinitialize it? I'm not sure, the warning was vague.

In my case this really isn't an issue, but for others I guess maybe:

let $1;
while (($1 = /pat+ern/i).test(a))
  a = a.replace($1, 'replacement');

At that point, though, it's so similar to the original that you might as well use this for readability:

let $1 = /pat+ern/i;
while ($1.test(a))
  a = a.replace($1, 'replacement');
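
On the failed while ((let $1 = ...)) attempt: let is a declaration rather than an expression, so it cannot appear inside a while condition, and the parser rejects it with a SyntaxError. A for header does accept a declaration, which keeps the variable scoped to the loop. A small sketch; the names replaceLoop, patt and guard are made up.

```javascript
function replaceLoop(a) {
  // The for header accepts a declaration, so patt and guard exist only
  // inside this loop. guard is a hypothetical safety cap.
  for (let patt = /pat+ern/i, guard = 0; patt.test(a) && guard < 100; guard++) {
    a = a.replace(patt, 'replacement');
  }
  return a;
}

console.log(replaceLoop('pattern, PATTTERN')); // replacement, replacement
console.log(typeof patt); // undefined
```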

Fotiman

7:50 pm on Dec 8, 2022 (gmt 0)

Right, global meaning you're adding something to the global scope (as opposed to a function scope or block scope). It is accessible elsewhere throughout the script, AND it could potentially affect other scripts (if they also use a global variable by that name).
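
A small sketch of the difference, for illustration only. The variable names are made up, and the functions are built with the Function constructor purely so that one body runs in sloppy mode and the other in strict mode, whatever mode the surrounding script uses.

```javascript
// Sloppy mode: assigning to an undeclared name creates a global property.
const sloppy = new Function('leakedPattern = /pat+ern/i;');
sloppy();
console.log(typeof globalThis.leakedPattern); // object

// Strict mode: the same assignment is an error instead of a silent global.
const strict = new Function('"use strict"; otherPattern = /pat+ern/i;');
let strictError = null;
try {
  strict();
} catch (err) {
  strictError = err; // a ReferenceError for the undeclared name
}
console.log(strictError instanceof ReferenceError); // true
```

Declaring the variable with const or let inside the function or block that uses it avoids both outcomes.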