Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
Simple Regex expression
Need to break on comma, but not on those between { and }
sirhall

5+ Year Member



 
Msg#: 3851521 posted 3:43 pm on Feb 17, 2009 (gmt 0)

I have a case where I need to break up a string based on commas, but not where the commas are between { and }. So for example the string 1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5

Would return as:
1
2
3
{3a,3b}
4
{4a, 4b, {4b, 4b2}, 4c}
5

 

krugs

5+ Year Member



 
Msg#: 3851521 posted 10:57 pm on Feb 17, 2009 (gmt 0)

hmmm... a difficult one.

janharders

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3851521 posted 10:35 am on Feb 18, 2009 (gmt 0)

Maybe I'm just lazy, but that doesn't look like a regexp case to me.
If you know the format of the elements will always be \d[a-z], you could go with something lazy like this: replace the commas inside the curly brackets with something else that will not be found anywhere in the string, say a semicolon. Then split on the commas and turn the semicolons back into commas.
Other than that, you'd have to go with "normal" parsing, I'm afraid. Or maybe I'm just not looking hard enough, which is also not unlikely.
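
For reference, the "normal" parsing route can be sketched with a brace-depth counter: walk the string one character at a time and only break on a comma when the depth is zero. This is a minimal sketch, not anything posted in the thread; split_top_level is a hypothetical helper name, and the trimming of whitespace around each field is an assumption based on the expected output in the first post.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on commas that sit outside all braces.
sub split_top_level {
    my ($text) = @_;
    my @fields;
    my $current = '';
    my $depth   = 0;    # how many unclosed '{' we are currently inside

    for my $char (split //, $text) {
        if ($char eq '{') {
            $depth++;
        }
        elsif ($char eq '}') {
            die "Unbalanced '}' in: $text\n" if $depth == 0;
            $depth--;
        }
        elsif ($char eq ',' && $depth == 0) {
            # Top-level comma: close the current field and start a new one.
            push @fields, $current;
            $current = '';
            next;
        }
        $current .= $char;
    }
    die "Unbalanced '{' in: $text\n" if $depth != 0;
    push @fields, $current;

    # Assumption: surrounding whitespace is not part of a field.
    s/^\s+|\s+$//g for @fields;
    return @fields;
}

print "$_\n" for split_top_level('1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5');
```

Because it counts depth rather than pattern-matching, this handles any level of nesting, and it dies on unbalanced braces instead of guessing.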

krugs

5+ Year Member



 
Msg#: 3851521 posted 6:07 pm on Feb 18, 2009 (gmt 0)

You could go that route for sure, or maybe Text::Balanced could sort it out. Not sure though as I am not too familiar with the module and the documentation is quite long. The "lazy" way would be to code a bit of a parser for that particular type of string.
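
A rough sketch of how the Text::Balanced idea might look, assuming the module's extract_bracketed function, which in list context returns the bracketed text it extracted followed by the remaining text. split_with_text_balanced is a hypothetical helper name, and the way plain fields are picked up between the bracketed groups is my own assumption, not something from the module.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: peel fields off the front of $text one at a time.
sub split_with_text_balanced {
    my ($text) = @_;
    my @fields;

    while (1) {
        $text =~ s/\A[\s,]+//;    # skip separators and surrounding whitespace
        last unless length $text;

        if ($text =~ /\A\{/) {
            # Let Text::Balanced find the matching close brace, nesting included.
            my ($group, $rest) = extract_bracketed($text, '{}');
            die "Unbalanced braces near: $text\n"
                unless defined $group && length $group;
            push @fields, $group;
            $text = $rest;
        }
        elsif ($text =~ s/\A([^,{}]+)//) {
            # A plain field runs up to the next comma or brace.
            (my $field = $1) =~ s/\s+\z//;
            push @fields, $field;
        }
        else {
            die "Unexpected character near: $text\n";
        }
    }
    return @fields;
}

print "$_\n" for split_with_text_balanced('1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5');
```

Note that this sketch silently drops empty fields (two commas in a row), which may or may not be what you want.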

coopster

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3851521 posted 12:05 am on Feb 19, 2009 (gmt 0)

I'm thinking a regular expression using recursion might work. Basically, you match one or more of a recursive (nested) pattern, followed by either a comma or the end of the string. You'll have to test it thoroughly, but on first run it works for me.
/({(?:(?>[^{}]+)¦(?R))*}¦[^{},]+)(?:,¦\s*$)/

krugs

5+ Year Member



 
Msg#: 3851521 posted 4:29 am on Feb 19, 2009 (gmt 0)

what does the (?R) do?

Key_Master

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3851521 posted 6:08 am on Feb 19, 2009 (gmt 0)

Try this:

$test =~ s/({[^}]*[^{]*})*,/$1\n/g;

[edited by: Key_Master at 6:17 am (utc) on Feb. 19, 2009]

janharders

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3851521 posted 6:59 am on Feb 19, 2009 (gmt 0)

/({(?:(?>[^{}]+)(?R))*}[^{},]+)(?:,\s*$)/

That's exactly the reason why people dislike Perl: you're never sure if that's what coopster wrote or if the board software scrambled his input ;)

I'll have to check that out once my caffeine-level reaches operational state.

[edited by: phranque at 7:33 am (utc) on Feb. 19, 2009]
[edit reason] fix graphic smiley in the regexp :( [/edit]

coopster

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3851521 posted 12:36 pm on Feb 19, 2009 (gmt 0)

The only thing that gets scrambled in that regular expression is the broken pipe. Every time you see a broken pipe, you can pretty much assume it is supposed to be the pipe character. If you copy/paste that regular expression to test it, rekey the pipe character.

The ?R is the recursion. I'm checking for one or more recursive patterns inside braces OR one or more of anything that is not an opening brace, closing brace, or comma. That is then followed by either a comma or zero or more space characters and the end.

?: is the instruction to not capture a subpattern.

I don't recall which version of the engine introduced recursion, but I've used it on a number of occasions. It takes a few minutes to wrap your head around, but you will certainly appreciate its usefulness once you do.
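
To try the recursive pattern described above in Perl, something like the following sketch should do. It assumes Perl 5.10 or later, which is where Perl itself gained (?R). The pattern is the one from the thread with the broken pipes rekeyed as | and the literal braces escaped; split_recursive is a hypothetical wrapper name, and the whitespace trimming is an addition of mine.

```perl
use strict;
use warnings;
use 5.010;    # recursive patterns such as (?R) need Perl 5.10 or later

# Hypothetical wrapper around the pattern from the thread.
sub split_recursive {
    my ($text) = @_;

    # Group 1 is either a balanced {...} group (recursing via (?R) for nested
    # braces) or a run of characters that are not braces or commas. Each field
    # must be followed by a comma or by optional whitespace and the end.
    my @fields = $text =~ /(\{(?:(?>[^{}]+)|(?R))*\}|[^{},]+)(?:,|\s*$)/g;

    # The pattern keeps the space that follows a comma, so trim each field.
    s/^\s+|\s+$//g for @fields;
    return @fields;
}

say for split_recursive('1,2,3,{3a,3b}, 4, {4a, 4b, {4b, 4b2}, 4c}, 5');
```

One thing to watch, as far as I can tell: (?R) re-enters the whole pattern, including the trailing comma-or-end part, so a nested group is only recognised when a comma follows it. A string like {a,{b}} will not come back as a single field. Recursing into just the brace group, for example with (?1), would be one way around that.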

krugs

5+ Year Member



 
Msg#: 3851521 posted 6:19 am on Feb 20, 2009 (gmt 0)

The ?R is the recursion.

Ahhh..... I see now. Pretty cool. Learned something new.
