

Simple Regex expression

Need to break on comma, but not on those between { and }

     
3:43 pm on Feb 17, 2009 (gmt 0)

5+ Year Member



I need to split a string on commas, but not on commas that fall between { and }. For example, the string

1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5

should be returned as:
1
2
3
{3a,3b}
4
{4a, 4b, {4b, 4b2}, 4c}
5

10:57 pm on Feb 17, 2009 (gmt 0)

5+ Year Member



hmmm... a difficult one.
10:35 am on Feb 18, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Maybe I'm just lazy, but that doesn't look like a job for a regexp to me.
If you know the elements will always have the form \d[a-z], you could do something lazy: replace the commas inside the curly brackets with a character that will not appear anywhere else in the string, say a semicolon, then split on the commas and turn the semicolons back into commas.
Other than that, you'd have to go with "normal" parsing, I'm afraid. Or maybe I'm just not looking hard enough, which is also not unlikely.
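
For reference, the "normal" parsing route can be quite short. The sketch below is only one possible reading of it, not something posted in this thread: it walks the string one character at a time, keeps a nesting depth for the braces, and splits only on commas seen at depth zero. The name split_top_level is made up for illustration, and the code assumes the braces are balanced.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that sit outside every { } pair.
sub split_top_level {
    my ($str) = @_;
    my @parts;
    my $current = '';
    my $depth   = 0;    # how many braces are currently open

    for my $ch (split //, $str) {
        if ($ch eq '{') {
            $depth++;
        }
        elsif ($ch eq '}') {
            $depth-- if $depth > 0;    # ignore a stray closing brace
        }
        elsif ($ch eq ',' && $depth == 0) {
            # A top-level comma ends the current item.
            push @parts, $current;
            $current = '';
            next;
        }
        $current .= $ch;
    }
    push @parts, $current;

    # Trim the whitespace left around each item.
    s/^\s+|\s+$//g for @parts;
    return @parts;
}

my $input = '1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5';
print "$_\n" for split_top_level($input);
```

Because the depth counter handles any level of nesting, no placeholder character is needed, so nothing breaks if the data happens to contain a semicolon.
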
6:07 pm on Feb 18, 2009 (gmt 0)

5+ Year Member



You could go that route for sure, or maybe Text::Balanced could sort it out. I'm not sure though, as I am not too familiar with the module and the documentation is quite long. The "lazy" way would be to code a bit of a parser for that particular type of string.
12:05 am on Feb 19, 2009 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'm thinking a regular expression using recursion might work. Basically, you match either a brace-delimited block, which may contain further nested blocks via recursion, or a run of plain characters, and follow that with either a comma or the end of the string. You'll have to test it thoroughly, but on first run it works for me.
/({(?:(?>[^{}]+)¦(?R))*}¦[^{},]+)(?:,¦\s*$)/
4:29 am on Feb 19, 2009 (gmt 0)

5+ Year Member



What does the (?R) do?
6:08 am on Feb 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try this:

$test =~ s/({[^}]*[^{]*})*,/$1\n/g; 

[edited by: Key_Master at 6:17 am (utc) on Feb. 19, 2009]

6:59 am on Feb 19, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



/({(?:(?>[^{}]+)(?R))*}[^{},]+)(?:,\s*$)/

That's exactly why people dislike Perl: you're never sure whether that's what coopster wrote or whether the board software scrambled his input ;)

I'll have to check that out once my caffeine-level reaches operational state.

[edited by: phranque at 7:33 am (utc) on Feb. 19, 2009]
[edit reason] fix graphic smiley in the regexp :( [/edit]

12:36 pm on Feb 19, 2009 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The only thing that gets scrambled in that regular expression is the broken pipe. Every time you see a broken pipe, you can pretty much assume it is supposed to be the pipe character. If you copy/paste that regular expression to test it, rekey the pipe character.

The ?R is the recursion. I'm checking for one or more recursive patterns inside braces OR one or more of anything that is not an opening brace, closing brace, or comma. That is then followed by either a comma or zero or more space characters and the end.

?: is the instruction to not capture a subpattern.

I don't recall which version of the engine introduced recursion, but I've used it on a number of occasions. It takes a few minutes to wrap your head around, but you will certainly appreciate its usefulness once you do.
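
As a rough illustration of the explanation above, here is the pattern with the pipe characters rekeyed, run against the string from the first post. It is only a sketch under a few assumptions: it needs Perl 5.10 or later (where Perl's own engine gained (?R)), the literal braces are escaped as an extra precaution that is not in the original expression, and the whitespace trimming is likewise an addition.

```perl
use strict;
use warnings;
use 5.010;    # (?R) and say both need Perl 5.10 or later

my $input = '1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5';

# Group 1 is either a { ... } block, whose contents are runs of
# non-brace characters or the whole pattern again via (?R), or a
# plain run without braces or commas. The item must be followed by
# a comma, or by optional whitespace and the end of the string.
my @items;
while ($input =~ /(\{(?:(?>[^{}]+)|(?R))*\}|[^{},]+)(?:,|\s*$)/g) {
    my $item = $1;
    $item =~ s/^\s+|\s+$//g;    # trimming is an addition, see above
    push @items, $item;
}

say for @items;
```

One thing worth testing: (?R) re-runs the entire pattern, including the trailing comma-or-end check, so a nested block that is immediately followed by a closing brace, such as {a,{b}}, does not appear to match as a single item. Recursing into only the first group with (?1) instead of (?R) is one possible way around that.
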

6:19 am on Feb 20, 2009 (gmt 0)

5+ Year Member



The ?R is the recursion.

Ahhh..... I see now. Pretty cool. Learned something new.

 
