
PHP Server Side Scripting Forum

    
group similar data in a string?
ktsirig




msg:3635470
 10:20 pm on Apr 25, 2008 (gmt 0)

Hello all
I didn't know a good way of naming my question, so I am getting straight to the point:
I have a string like the following:

-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----

Is there any way I can gather and group my information and get, for example:

6-10:M
17-21:I
26-28:M
32-35:O
39:I
43-46:M

Thank you in advance.

 

coopster




msg:3635910
 9:54 pm on Apr 26, 2008 (gmt 0)

The pattern you require will depend on what you are trying to locate in the string, but I would probably use preg_match_all [php.net] with the optional PREG_OFFSET_CAPTURE flag. It sounds like you want only runs of a single repeated letter, so although a simple [a-z]+ pattern would work on your original string, it would fail on a string where a group contains different letters. Note my addition to the subject string to illustrate:
//$subject = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----';
//$pattern = '/[a-z]+/i'; // Works fine on the original subject string
$subject = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----XXZZXX---MMMM';
//$pattern = '/[a-z]+/i'; // Fails in this case: XXZZXX would be reported as one group
// A repeated word character, or a single word character, between word boundaries
$pattern = '/\b((\w)\2+|\w)(?:\b)/';
preg_match_all(
    $pattern,
    $subject,
    $matches,
    PREG_SET_ORDER | PREG_OFFSET_CAPTURE
);
print "$subject\n";
print str_repeat('1234567890', 7) . "\n"; // column ruler for reading the offsets
print_r($matches); // each captured group is array(text, zero-based offset)
foreach ($matches as $match) {
    $length = strlen($match[0][0]);
    $start = $match[0][1] + 1; // adjust for zero-based indexing
    $end = $match[0][1] + $length;
    print "$start - $end:" . $match[0][0][0] . "\n"; // first character of the match is the letter
}

The pattern reads:
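Find a word boundary, then either a word character repeated one or more times (\2 refers back to the character captured by the inner (\w)) or a single word character, then another word boundary. Note that \w also matches digits and the underscore, not only letters. The ?: simply says not to capture the subpattern; I could have left it off in this particular case with no ill effects.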
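
As a quick check of that reading, here is a minimal sketch that runs the same pattern over two short strings. The helper name matched_groups is made up for illustration and is not part of the code above. It also shows a consequence of the two word boundaries: a mixed group such as XXZZXX is skipped entirely, because there is no word boundary between XX and ZZ.

```php
<?php
// Hypothetical helper (illustration only): returns the matched text of each hit.
function matched_groups(string $subject): array
{
    // Same pattern as in the post, with the pipe symbol keyed in.
    $pattern = '/\b((\w)\2+|\w)(?:\b)/';
    preg_match_all($pattern, $subject, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches.
    return $matches[0];
}

print_r(matched_groups('--MMM--I--'));   // finds MMM and I
print_r(matched_groups('--XXZZXX--MM')); // finds only MM; XXZZXX is skipped
```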

The other printing code in the middle was left there so you could analyze how the patterns are captured and how the offsets work. Details are on the manual pages in the link.
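
For orientation: with PREG_SET_ORDER | PREG_OFFSET_CAPTURE, each entry of $matches is one match, and each captured group in it is a two-element array holding the text and its zero-based byte offset. The sketch below wraps that in a function (group_ranges is a made-up name) and formats the result the way the original question asked, printing a single position for one-letter runs. It swaps in a different pattern, one letter followed by back-references to itself with no word boundaries, so a mixed group like XXZZXX is split into XX, ZZ, XX rather than skipped. Whether that is the desired behaviour is an assumption; the question does not say.

```php
<?php
// Hypothetical helper (illustration only): returns lines such as "6-10:M",
// one per run of a single repeated letter.
function group_ranges(string $subject): array
{
    // Assumption: runs consist of ASCII letters. ([a-z])\1* is one letter
    // followed by zero or more copies of it. Because of the i flag, the
    // back-reference also matches the other case, so "Mm" counts as one run.
    preg_match_all(
        '/([a-z])\1*/i',
        $subject,
        $matches,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );

    $lines = [];
    foreach ($matches as $match) {
        // $match[0] is [matched text, zero-based offset] for the whole pattern.
        [$text, $offset] = $match[0];
        $start = $offset + 1;             // one-based position of the first letter
        $end   = $offset + strlen($text); // one-based position of the last letter
        $range = ($start === $end) ? "$start" : "$start-$end";
        $lines[] = $range . ':' . $text[0];
    }
    return $lines;
}

$subject = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----';
print implode("\n", group_ranges($subject)) . "\n";
```

Run on the string from the question, this prints the six lines listed there (6-10:M through 43-46:M).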

Note: The forum software may display the pipe symbol (|) as a broken bar (¦). If it does, rekey it after you copy/paste the code.
