
Forum Moderators: coopster & jatar k


group similar data in a string?

     
10:20 pm on Apr 25, 2008 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 26, 2005
posts:90
votes: 0


Hello all
I didn't know a good way of naming my question, so I am getting straight to the point:
I have a string like the following:

-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----

Is there a way to group each run of letters and get its start and end positions, for example:


6-10:M
17-21:I
26-28:M
32-35:O
39:I
43-46:M

Thank you in advance.
9:54 pm on Apr 26, 2008 (gmt 0)

Administrator

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
posts:12533
votes: 0


The pattern you need depends on exactly what you are trying to locate in the string, but I would probably use preg_match_all [php.net] with the PREG_OFFSET_CAPTURE flag, which reports where each match starts. It sounded like you want only runs of a single repeated letter, so although a simple [a-z]+ pattern would work on your original string, it would fail on a string where the grouped characters are not all the same. Note the characters I appended to the subject string to illustrate this:
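<?php
//$subject = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----';
//$pattern = '/[a-z]+/i'; // Works fine on the original subject string
$subject = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----XXZZXX---MMMM';
//$pattern = '/[a-z]+/i'; // Fails in this case: it would also report XXZZXX
// A run of one repeated word character, or a single word character,
// with a word boundary on each side
$pattern = '/\b((\w)\2+|\w)(?:\b)/';
preg_match_all(
    $pattern,
    $subject,
    $matches,
    PREG_SET_ORDER | PREG_OFFSET_CAPTURE
);
print "$subject\n";
print str_repeat('1234567890', 7) . "\n"; // column ruler for reading positions
print_r($matches); // print_r() prints directly and returns true
print "\n";
foreach ($matches as $match) {
    // $match[0] is array(matched text, zero-based byte offset)
    $length = strlen($match[0][0]);
    $start = $match[0][1] + 1; // adjust for zero-based indexing
    $end = $match[0][1] + $length;
    // Print "39:I" for a single character, "6-10:M" for a run
    $range = ($start === $end) ? "$start" : "$start-$end";
    print "$range:" . $match[0][0][0] . "\n";
}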

The pattern reads:
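Find a word boundary, then either one character followed by one or more repeats of itself (the character is captured in group 2 and matched again with the backreference \2+) or a single character, then another word boundary. Keep in mind that \w matches any word character (letters, digits and the underscore), not only letters. The ?: makes the group non-capturing; I could have left it off in this particular case with no ill effects.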
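
To see what the backreference changes, here is a small comparison of the two patterns. The subject string '--AAA--B--XXZZ--' is a made-up example, not from the thread; the mixed run XXZZ is found by [a-z]+ but skipped by the stricter pattern, because there is no word boundary between the X and Z characters.

```php
<?php
// Made-up subject for illustration: two clean runs and one mixed run
$subject = '--AAA--B--XXZZ--';

// Simple pattern: any run of letters, whether or not they are all the same
preg_match_all('/[a-z]+/i', $subject, $simple);

// Stricter pattern from the post: a run of one repeated word character,
// or a single word character, with a word boundary on each side
preg_match_all('/\b((\w)\2+|\w)(?:\b)/', $subject, $strict);

print_r($simple[0]); // contains AAA, B, XXZZ
print_r($strict[0]); // contains AAA, B (XXZZ is skipped)
```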

The extra print statements in the middle are there so you can inspect how the subpatterns are captured and how the offsets work. The details are on the manual pages at the link.
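
As a rough guide to reading that print_r output, here is the shape of $matches for a tiny made-up subject, '--AA-B'. With PREG_SET_ORDER there is one entry per match, and with PREG_OFFSET_CAPTURE every captured item is a pair of the text and its zero-based byte offset, so adding 1 gives the 1-based position the original question asks for.

```php
<?php
// Made-up subject: a run "AA" at offset 2 and a single "B" at offset 5
$subject = '--AA-B';
preg_match_all(
    '/\b((\w)\2+|\w)(?:\b)/',
    $subject,
    $matches,
    PREG_SET_ORDER | PREG_OFFSET_CAPTURE
);

// $matches[0] holds ['AA', 2] (whole match), ['AA', 2] (group 1), ['A', 2] (group 2)
// $matches[1] holds ['B', 5] for the whole match and for group 1;
// group 2 did not take part, because "B" matched the single-\w alternative
foreach ($matches as $m) {
    [$text, $offset] = $m[0];
    // Convert the zero-based offset to a 1-based start and end
    printf("%d-%d:%s\n", $offset + 1, $offset + strlen($text), $text[0]);
}
```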

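Note: The forum may display the pipe symbol | as a broken bar ¦, so if you copy/paste the code, retype any ¦ as | (it is used for the alternation in the pattern and to combine the two flags).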
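
If a regular expression feels like overkill, the same grouping can be sketched as a single pass over the string. This is an alternative technique, not the approach from the reply above: group_runs() is a hypothetical helper, and it assumes '-' is the only filler character. It groups every run of identical characters, so unlike the pattern above it would report XXZZXX as three groups (XX, ZZ, XX) rather than skipping it.

```php
<?php
// Hypothetical helper: collect runs of identical characters in one pass,
// skipping runs of the $gap filler. Positions are 1-based.
function group_runs(string $subject, string $gap = '-'): array
{
    $groups = [];
    $len = strlen($subject);
    $i = 0;
    while ($i < $len) {
        $char = $subject[$i];
        $j = $i;
        // Advance $j to the last character of the current run
        while ($j + 1 < $len && $subject[$j + 1] === $char) {
            $j++;
        }
        if ($char !== $gap) {
            $start = $i + 1;
            $end = $j + 1;
            // "39:I" for a single character, "6-10:M" for a longer run
            $groups[] = ($start === $end ? "$start" : "$start-$end") . ":$char";
        }
        $i = $j + 1; // continue after the run
    }
    return $groups;
}

print implode("\n", group_runs('-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----')) . "\n";
```

On the string from the original question this produces exactly the six lines that were asked for, from 6-10:M through 43-46:M.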
