var data_match = /(<[^>]+)(?:data-[\w-]+|role|cite|itxt[\w-]*)=("|')[\s\S]+?\2([^>]*>)/gim;
My instinct is to change data-quote and data-em to data%2Dquote and data%2Dem, respectively, then run the regex above, then switch the %2D back to a hyphen.
You’d be surprised at how often it really does end up simplest to go two steps forward and one back.
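A minimal sketch of that two-steps-forward, one-back approach. The function name stripDataAttributes is hypothetical, and the stripping pattern is a simplified stand-in for data_match above rather than the original regex; it assumes data-quote and data-em are the only attributes worth protecting.

```javascript
// Hypothetical helper: remove data-* attributes except data-quote and data-em.
function stripDataAttributes(html) {
  // Step 1: hide the protected attributes by encoding their hyphen.
  var work = html.replace(/\bdata-(quote|em)=/gi, 'data%2D$1=');

  // Step 2: strip the remaining data-* attributes. Each pass removes at most
  // one attribute per tag, so repeat until nothing changes.
  var attr = /(<[^>]*?)\s+data-[\w-]+=("|')[\s\S]*?\2/gi;
  var previous;
  do {
    previous = work;
    work = work.replace(attr, '$1');
  } while (work !== previous);

  // Step 3: switch the %2D back to a hyphen.
  return work.replace(/\bdata%2D(quote|em)=/gi, 'data-$1=');
}

var cleaned = stripDataAttributes(
  '<p data-foo="1" data-quote="x" data-bar=\'2\'>hi</p>'
);
console.log(cleaned);
```

The do/while loop is there because a global replace does not rescan text it has already passed, so a tag with several data-* attributes needs more than one pass.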
replace(/<a href=("|')(\S+)\1[^>]*>[\s\S]+<\/a>/gi, '$2')
var a = 'www.example.com';
var www_match = /((?:^|[^<]|>)[^<>]*?[^\b\w]*?)(www\.\S+(\b|$))/gi;
a = a.replace(www_match, 'https://$2');
var link_match = /((?:^|[^<]|>)[^<>]*?)(https?:\/\/\S+)(\b|$)/gi;
a = a.replace(link_match, '<a href="$2" target="_blank">$2</a>');
(?:^|[^<]|>)
I think an extra pipe sneaked in there. Wasn’t this the one where the object is to find strings only outside of <markup>? If so it needs to be
(?:^[^<]|>)
i.e. start each fresh search at the very beginning of your test string, but only if it doesn't start right in with <markup>, hence the ^[^<] locution, and then every time the <markup> closes.
var www_match = /((?:^|[^<]|>)[^<>]*?[^\b\w\/]+?)(www\.\S+(\b|$))/gi;
var www_match = /((?:^|[^<]|>)[^<>]*?(?<!https?:\/\/))(www\.\S+(\b|$))/gi;
Can you use ? in a lookbehind? (SubEthaEdit won’t let me: I have to say separately http:// and https:// because a lookbehind there, unlike a lookahead, has to be of fixed length. Could be worse: some flavors require everything to be of fixed length.)
(?:(?<!https)|(?<!http):\/\/)
I remember we talked before about lookbehinds in Javascript; I was under the impression you can’t use them [regular-expressions.info] (if the fragment gets eaten, scroll down to Important Notes). In practice I guess that means: be sure to try it out in MSIE < current-version to be safe.
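One hedged way to "try it out" at run time: build the lookbehind pattern with new RegExp inside try/catch, so an engine without lookbehind support falls back instead of choking on a regex literal at parse time. Engines that implement ES2018 accept lookbehind, including variable-length ones such as https?. The names supportsLookbehind and addScheme are hypothetical, and the fallback pattern with its (^|\s) prefix is an assumption, not something tested against real pasted data.

```javascript
// Returns true if this engine can compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    new RegExp('(?<!a)b');
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: prefix bare www. hosts with https://,
// leaving hosts that already carry http:// or https:// alone.
function addScheme(text) {
  if (supportsLookbehind()) {
    // Built from a string so older engines never have to parse it.
    var withLookbehind = new RegExp('(?<!https?://)\\bwww\\.\\S+', 'gi');
    return text.replace(withLookbehind, 'https://$&');
  }
  // Fallback without lookbehind: only match www. at the start or after whitespace.
  return text.replace(/(^|\s)(www\.\S+)/gi, '$1https://$2');
}

var schemed = addScheme('www.example.com and https://www.example.org');
console.log(schemed);
```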
I remain uneasy about this
(?:^|[^<]|>)
because then what happens if the very first thing in your test string is <markup>? You'd then be capturing exactly what you don't want to capture. The [^<>] is meant to protect you, but do some further experimenting to make sure it really does.
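A small experiment along those lines, comparing the (?:^|[^<]|>) prefix with the suggested (?:^[^<]|>) on three sample strings. The sample strings are made up for illustration, and the g flag is dropped so that test() carries no lastIndex state between calls.

```javascript
// The prefix under discussion, and the suggested alternative.
var original = /((?:^|[^<]|>)[^<>]*?)(https?:\/\/\S+)(\b|$)/i;
var suggested = /((?:^[^<]|>)[^<>]*?)(https?:\/\/\S+)(\b|$)/i;

var samples = [
  '<a href="https://example.com">x</a>', // starts right in with markup
  'see https://example.com',             // URL after plain text
  'https://example.com'                  // URL at the very start
];

// Record which pattern finds a match in which sample.
var results = samples.map(function (s) {
  return { input: s, original: original.test(s), suggested: suggested.test(s) };
});
console.log(results);
```

On these samples the original prefix does match the URL inside the href of the first string, which is exactly the worry. The suggested prefix avoids that, but it misses a URL that begins the string, because ^[^<] consumes the URL's first character.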
When a Regular Expression itself contains literal slashes, I sometimes find it more readable to use the “new RegExp” formulation instead. (If it contains both slashes and quotation marks, you’re SOL ;) )
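For comparison, a short sketch of the same link pattern written both ways. The sample string is invented for illustration; the point is only that the two forms behave identically.

```javascript
// Literal form: every slash in the pattern has to be escaped.
var literal = /(^|>|\s)(https?:\/\/(\S+))\b/gi;

// Constructor form: slashes are plain, but every backslash is doubled
// because the pattern is now inside a string.
var constructed = new RegExp('(^|>|\\s)(https?://(\\S+))\\b', 'gi');

var sample = 'see http://example.com/page now';
var template = '$1<a href="$2">$3</a>';

var fromLiteral = sample.replace(literal, template);
var fromConstructed = sample.replace(constructed, template);
console.log(fromLiteral === fromConstructed, fromConstructed);
```

The trade-off is slashes versus backslashes: a pattern heavy on \S, \b and \s gets noisier in the string form even as the slashes get quieter.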
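I think (\b|$) may be redundant when the URL ends in a word character, because “end of string” right after a word character itself counts as a \b. After a non-word character such as a trailing slash it does not, so the $ still matters there.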
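A quick way to check when (\b|$) matters, using two made-up URLs: one ending in a word character and one ending in a slash.

```javascript
var withBoundaryOnly = /https?:\/\/\S+\b/;
var withBoundaryOrEnd = /https?:\/\/\S+(\b|$)/;

var endsInWord = 'https://example.com';
var endsInSlash = 'https://example.com/';

var checks = {
  // End of string after "m" is a word boundary, so \b alone is enough.
  wordOnly: withBoundaryOnly.exec(endsInWord)[0],
  // End of string after "/" is not a word boundary, so \S+ backtracks
  // to the last boundary and the trailing slash is lost.
  slashOnly: withBoundaryOnly.exec(endsInSlash)[0],
  // With the $ alternative the trailing slash is kept.
  slashOrEnd: withBoundaryOrEnd.exec(endsInSlash)[0]
};
console.log(checks);
```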
// Remove all targets; I'm mainly working with pasted data that can originate from
// other sites, so I don't want to accidentally have a target that conflicts with
// something I'm already using
var target_remove_match = /(<a[^>]*) target=("|')\w*\2/gi;
a = a.replace(target_remove_match, '$1');
// Replace all links with a placeholder; it's notable that I get a warning on the
// "function", though, and while I think it can be safely ignored, it's worth knowing:
// Functions declared within loops referencing an outer scoped variable may lead to
// confusing semantics
var remove_link = /(<a[^>]+>[\s\S]+?<\/a>)/gi;
var save_link = [];
var x = 0;
while (remove_link.test(a)) {
    a = a.replace(remove_link, function($match, $1) {
        x++;
        save_link[x] = $1;
        // this could be anything unique as long as it has the x variable somewhere. Since
        // I'm not using <q> for emojis anymore, I'm thinking about using '<q>' + x + '</q>'
        // as a placeholder instead of '::chr(' + x + ')::'
        return '::chr(' + x + ')::';
    });
}
// Convert www to https://www; do I need to allow for more than ^|>|\s?
var www_match = /(^|>|\s)(www\.\S+\b)/gi;
a = a.replace(www_match, '$1https://$2');
// Add A element
var link_match = /(^|>|\s)(https?:\/\/(\S+))\b/gi;
a = a.replace(link_match, '$1<a href="$2">$3</a>');
// Reinstate links
var reinstate_link = /::chr\(([0-9]+)\)::/gi;
while (reinstate_link.test(a)) {
    a = a.replace(reinstate_link, function($match, $1) {
        return save_link[$1];
    });
}
// Add _blank target to all A elements
var target_match = /(<a [^>]*)>/gi;
a = a.replace(target_match, '$1 target="_blank">');
When a lookaround involves multiple options, put the whole thing in the same lookaround markup, and separate the options with a pipe:
(?:(?<!https)|(?<!http):\/\/)
(?<!http|https)
It looks unnerving when you're not used to it, but that's the syntax.
The concept is to replace all <a...></a> tags with something unique, and instead save them to an array.
This sounds like another version of when you throw up your hands and decide that “two steps forward, one back” works out to less trouble in the long run.
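A condensed sketch of the same stash-and-restore idea, written without the while loops: a replace with the g flag already handles every match in one call, so the loop is not needed, and dropping it also removes the "functions declared within loops" warning. The function name linkify is hypothetical, and the target handling is left out to keep the sketch short.

```javascript
// Hypothetical helper: linkify bare URLs while leaving existing <a> elements alone.
function linkify(text) {
  var saved = [];

  // 1. Stash every existing link and leave a numbered placeholder behind.
  var out = text.replace(/<a[^>]+>[\s\S]+?<\/a>/gi, function (match) {
    saved.push(match);
    return '::chr(' + (saved.length - 1) + ')::';
  });

  // 2. Convert www to https://www.
  out = out.replace(/(^|>|\s)(www\.\S+\b)/gi, '$1https://$2');

  // 3. Wrap bare URLs in an A element.
  out = out.replace(/(^|>|\s)(https?:\/\/(\S+))\b/gi, '$1<a href="$2">$3</a>');

  // 4. Reinstate the stashed links.
  return out.replace(/::chr\((\d+)\)::/g, function (match, index) {
    return saved[Number(index)];
  });
}

var linked = linkify('see www.example.com and <a href="http://a.example/x">old</a>');
console.log(linked);
```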