Forum Moderators: coopster & jatar k & phranque

Simple Regex expression

Need to break on comma, but not on those between { and }

     
3:43 pm on Feb 17, 2009 (gmt 0)

New User

5+ Year Member

joined:Feb 17, 2009
posts: 1
votes: 0


I need to split a string on commas, but not on commas that sit between { and } (the braces can be nested). For example, the string 1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5

Would return as:
1
2
3
{3a,3b}
4
{4a, 4b, {4b, 4b2}, 4c}
5

10:57 pm on Feb 17, 2009 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 20, 2008
posts:92
votes: 0


hmmm... a difficult one.
10:35 am on Feb 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 31, 2008
posts:661
votes: 0


Maybe I'm just lazy, but that doesn't look like a case for a regexp to me.
If you know the elements will always have the format \d[a-z], you could go with something lazy: replace the commas inside the curly brackets with a character that appears nowhere else in the string, say a semicolon, then split on the remaining commas and turn the semicolons back into commas.
Other than that, you'd have to go with "normal" parsing, I'm afraid. Or maybe I'm just not looking hard enough, which is also not unlikely.
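
For anyone who wants to try the replace, split, and restore idea, here is a minimal sketch. It is not code from the poster. The sub name split_with_placeholder is made up for illustration, and the sketch assumes the input never contains a semicolon. Because the braces can nest, it counts brace depth to decide which commas to mask.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread.
# Assumption: the input contains no semicolons of its own.
sub split_with_placeholder {
    my ($text) = @_;
    my $depth  = 0;
    my $masked = '';

    # Step 1: copy the string, turning commas inside braces into semicolons.
    for my $char (split //, $text) {
        $depth++ if $char eq '{';
        $depth-- if $char eq '}' && $depth > 0;
        $masked .= ($char eq ',' && $depth > 0) ? ';' : $char;
    }

    # Step 2: split on the commas that are left, which are all top-level.
    my @fields = split /\s*,\s*/, $masked;

    # Step 3: turn the semicolons back into commas.
    tr/;/,/ for @fields;
    return @fields;
}

print "$_\n" for split_with_placeholder('1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5');
```

If real data can contain semicolons, a character that cannot occur in the input would have to be chosen as the placeholder instead.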
6:07 pm on Feb 18, 2009 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 20, 2008
posts:92
votes: 0


You could go that route for sure, or maybe Text::Balanced could sort it out. I'm not sure, though, as I am not too familiar with the module and its documentation is quite long. The "lazy" way would be to code a small parser for that particular type of string.
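
A small hand-written parser for this kind of string might look like the sketch below. It is only an illustration: the sub name split_outside_braces is hypothetical, and the code simply walks the string once, tracking how deep inside braces it is and cutting a field whenever it meets a comma at depth zero.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: split on commas at brace depth 0.
sub split_outside_braces {
    my ($text) = @_;
    my @fields;
    my $current = '';
    my $depth   = 0;

    for my $char (split //, $text) {
        if ($char eq '{') {
            $depth++;
        }
        elsif ($char eq '}') {
            $depth-- if $depth > 0;     # tolerate a stray closing brace
        }

        if ($char eq ',' && $depth == 0) {
            push @fields, $current;     # a top-level comma ends the field
            $current = '';
        }
        else {
            $current .= $char;          # everything else, nested commas included
        }
    }
    push @fields, $current;             # the last field has no trailing comma

    s/^\s+|\s+$//g for @fields;         # trim the spaces around each field
    return @fields;
}

print "$_\n" for split_outside_braces('1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5');
```

Unlike a regular expression, this version makes it easy to add error handling later, for example reporting an unbalanced brace when the depth is not zero at the end.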
12:05 am on Feb 19, 2009 (gmt 0)

Administrator

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
posts:12533
votes: 0


I'm thinking a regular expression that uses recursion might work. Basically, you match either a recursive (nested) brace pattern or a run of plain characters, and I followed that with either a comma or the end of the string. You'll have to test it thoroughly, but on a first run it works for me.
/({(?:(?>[^{}]+)|(?R))*}|[^{},]+)(?:,|\s*$)/
4:29 am on Feb 19, 2009 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 20, 2008
posts:92
votes: 0


what does the (?R) do?
6:08 am on Feb 19, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


Try this:

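# Literal braces are escaped because newer Perl versions warn about or reject some unescaped ones.
$test =~ s/(\{[^}]*[^{]*\})*,/$1\n/g;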

[edited by: Key_Master at 6:17 am (utc) on Feb. 19, 2009]

6:59 am on Feb 19, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 31, 2008
posts:661
votes: 0


/({(?:(?>[^{}]+)(?R))*}[^{},]+)(?:,\s*$)/

That's exactly the reason why people dislike Perl: you're never sure whether that's what coopster wrote or whether the board software scrambled his input ;)

I'll have to check that out once my caffeine-level reaches operational state.

[edited by: phranque at 7:33 am (utc) on Feb. 19, 2009]
[edit reason] fix graphic smiley in the regexp :( [/edit]

12:36 pm on Feb 19, 2009 (gmt 0)

Administrator

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
posts:12533
votes: 0


The only thing that gets scrambled in that regular expression is the broken pipe. Every time you see a broken pipe, you can pretty much assume it is supposed to be the pipe character. If you copy/paste that regular expression to test it, rekey the pipe character.

The ?R is the recursion. I'm checking for one or more recursive patterns inside braces OR one or more of anything that is not an opening brace, closing brace, or comma. That is then followed by either a comma or zero or more space characters and the end.

?: is the instruction to not capture a subpattern.

I don't recall which version of the engine introduced recursion, but I've used it on a number of occasions. It takes a few minutes to wrap your head around, but you will certainly appreciate its usefulness once you do.
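
To see the recursion at work, here is a hedged Perl sketch (5.10 or later) built on the same idea. It is a variant, not the exact pattern posted above. The sub name split_top_level and the group names field and block are made up for illustration. It recurses into the brace group alone with (?&block) rather than into the whole pattern with (?R). As far as I can tell, (?R) also re-runs the trailing "comma or end" part, so a nested block that is directly followed by a closing brace, as in {a,{b,c}}, would not match.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: return the top-level fields.
sub split_top_level {
    my ($text) = @_;

    my $field = qr/
        (?<field>
            (?<block>               # a balanced brace block
                \{
                (?: [^{}]++         #   brace-free text, commas included
                  | (?&block)       #   or a nested block (the recursive call)
                )*+
                \}
            )
          |
            [^{},]+                 # or a plain run without braces or commas
        )
        \s* (?: , | \z )            # then a comma or the end of the string
    /x;

    my @fields;
    while ($text =~ /\G\s*$field/g) {
        my $item = $+{field};
        $item =~ s/\s+\z//;         # drop trailing spaces picked up by the plain run
        push @fields, $item;
    }
    return @fields;
}

print "$_\n" for split_top_level('1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5');
```

Run on the string from the original question, this should print the same seven fields the poster asked for. The loop stops at the first position where no field can be matched, so empty fields and unbalanced braces would need extra handling.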

6:19 am on Feb 20, 2009 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 20, 2008
posts:92
votes: 0


"The ?R is the recursion."

Ahhh..... I see now. Pretty cool. Learned something new.