<div id="whatever" data-foo style="margin: 50px" data-bar="bar" class="example" data-foo-bar="this is an example">an example</div>

var a = $('#comment').html();
a = a.replace(/<([^>]*) data-[\S]+(=("|')\S+\3)*/gim, '<$1');

data-foo style="margin: 50px" data-bar="bar" class="example" data-foo-bar="this is an example"

Assuming there are other possibilities, like name=blahblah or id=blahblah, each separate one is
(data(?:-\w+)+)(?: \w+)? ?= ?"[^"]*"
which you replace with whatever-it-is. But since you don't know how many there will be in any given statement, you'll need to put them inside a "while" loop:
(( data(?:-\w+)+)(?: \w+)? ?= ?"[^"]*")+
and it really is one fell swoop. But this won't work if there can be other stuff mixed in with the data-blahblah pieces. In your example, is "style" (following on a data-thingy) part of what you're getting rid of, while "class" (not following on a data-thingy) stays behind?

<div data-foo="this is a big ol mess>this doesn't show because the tag is missing the ending double-quote</div>

// this group is captured as \1; the ?: makes the -\w+ part non-capturing. I get this
(data(?:-\w+)+)
// non-capturing the next \w, but why the space before \w or after the second
// question mark?
(?: \w+)?
// this one loses me... I don't understand what the ? before the = does. The
// second ? makes the second whitespace match 0 or 1 time, right?
?= ?
// I get this one; " followed by anything not a " until it gets to the next ". But
// mine gets more complicated, I can have " or ' so I use ("|') and then \1 to
// find the end
"[^"]*"

I should probably not have tried answering this question so close to bedtime :(
(data(?:-\w+)+)(?: \w+)? ?= ?"[^"]*"
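To see what that pattern actually grabs, here is a minimal sketch of the "while" loop idea, run against the example div from the first post. The g flag and the variable names (html, attr, found) are my additions, not part of the original pattern.

```javascript
var html = '<div id="whatever" data-foo style="margin: 50px" data-bar="bar" ' +
           'class="example" data-foo-bar="this is an example">an example</div>';

// Pattern from the post, with the g flag added so exec() resumes
// after the previous match on each pass through the loop.
var attr = /(data(?:-\w+)+)(?: \w+)? ?= ?"[^"]*"/g;

var found = [];
var m;
while ((m = attr.exec(html)) !== null) {
  found.push(m[0]); // m[1] would hold just the data-* name
}

console.log(found);
// [ 'data-foo style="margin: 50px"',
//   'data-bar="bar"',
//   'data-foo-bar="this is an example"' ]
```

Note the first hit: because (?: \w+)? allows one bare word between the data- name and the = sign, the valueless data-foo takes the following style attribute with it. Whether that is wanted is exactly the "style" question asked above.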
What you asked: why the space before \w or after the second question mark?
The first space is actually not after the question mark but before the = sign. This is a CYA kind of rule: it might be any of
I don't understand what the ? before the = does. The second ? makes the second whitespace match 0 or 1 time, right?
?= ?
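On that fragment: each ? applies only to the single character right before it, and in both cases that character is a space. So it reads "optional space, equals sign, optional space". A quick sketch to check this, assuming a made-up attribute (data-foo and "bar" are placeholders, not from the original markup):

```javascript
// " ?= ?" wrapped in a placeholder attribute and anchored at both ends,
// so test() only passes when the whole string fits.
var spacing = /^data-foo ?= ?"bar"$/;

// Made-up test strings covering the spacing variants.
var samples = [
  'data-foo="bar"',     // no spaces
  'data-foo ="bar"',    // space before =
  'data-foo= "bar"',    // space after =
  'data-foo = "bar"',   // space on both sides
  'data-foo  =  "bar"'  // two spaces on each side
];

samples.forEach(function (s) {
  console.log(JSON.stringify(s), spacing.test(s));
});
// The first four print true; the last prints false, because " ?" allows
// at most one space. \s* would be the looser choice.
```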
But mine gets more complicated, I can have " or ' so I use ("|') and then \1 to find the end
"[^"]*"
Yes, I see. So it becomes
(['"])[^"'>]*\1
assuming you can be confident the quotation marks always match even if the second one may be absent entirely. The possible missing close-quote makes an entirely different pattern, though:
(['"])[^"'>]*(?=>)
where the last thing inside the parentheses is a literal > character. But what happens if there's a missing quotation mark somewhere other than the last element inside the < > markup? It's definitely safe to say [^"'>] because then you know that, no matter what, your RegEx won't continue merrily capturing past the end of the markup.
(['"])(?:[^"'>]*(?=>)|[^"'>]*\1)
though you can certainly try it and see if your code explodes. It means: after the first quote, continue until you either hit a matching quote, or something immediately followed by a close-bracket > (which can't be captured, so you have to use a lookahead).
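If you want to try it before trusting it, here is a rough sketch that runs the combined pattern against a few tags. The data-foo= prefix, the matched() helper and all the test tags except the "big ol mess" one are made up for illustration.

```javascript
// Quote pattern from above, prefixed with a placeholder attribute name.
// Alternative 1: no closing quote, the value runs up to the > (lookahead only).
// Alternative 2: the value ends at the same quote character that opened it (\1).
var value = /data-foo=(['"])(?:[^"'>]*(?=>)|[^"'>]*\1)/;

// Hypothetical helper: returns the matched text, or null if nothing matched.
function matched(tag) {
  var m = value.exec(tag);
  return m ? m[0] : null;
}

console.log(matched('<div data-foo="bar" class="x">'));
// data-foo="bar"
console.log(matched("<div data-foo='bar' class='x'>"));
// data-foo='bar'
console.log(matched('<div data-foo="this is a big ol mess>'));
// data-foo="this is a big ol mess
console.log(matched('<div data-foo="bar class="x">'));
// data-foo="bar class="
```

The last case is the missing quote in the middle of the tag: nothing explodes, but the match runs on to the next attribute's opening quote, so a replace would take class= along with it. The [^"'>] class does keep it from running past the >.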