preg_match_all(): repeating matches with results in an assoc array

csdude55

3:21 pm on Jan 23, 2020 (gmt 0)

This is a 2-part question...

I have a string like this:

<data type="forecast">
(blah blah blah...)
<time-layout time-coordinate="local" summarization="12hourly">
<layout-key>k-p12h-n14-1</layout-key>
<start-valid-time period-name="Thursday">2020-01-23T06:00:00-05:00</start-valid-time>
<start-valid-time period-name="Thursday Night">2020-01-23T18:00:00-05:00</start-valid-time>


The actual list of <start-valid-time> is the focus, and it can have anywhere from 2 to 20 results.

As an end result, I want an array that looks like:

$arr['2020-01-23T06:00:00-05:00'] = 'Thursday';
$arr['2020-01-23T18:00:00-05:00'] = 'Thursday Night';


************
So the first part of the question is about capturing the repeated matches. I tried this, but preg_match_all() only captures the last match:

// Was expecting an array like:
// $matches = array('Thursday', '2020-01-23T06:00:00-05:00', 'Thursday Night', '2020-01-23T18:00:00-05:00');

preg_match_all('#
<data.type="forecast">
.*?
<layout-key>k-p12h-n14-1</layout-key>
.*?
((<start-valid-time.period-name="([^"]+)">([^"]+)</start-valid-time>)+.*?)
#msix',
$contents,
$matches);


I did a ton of research and found several suggested hacks using \G or a negative lookbehind, but nothing gave the results that I'm looking for. Can you guys and gals suggest a way to capture all of the matches in a single regex?

************
The second part of the question...

Once I figure out how to get that numeric array, is there an easier way to map it to an associative array than to loop through and manually build it?

I hacked this together and it works, but I'm concerned that there are a lot of moving parts, it's hard to read, and it will be a much slower process than if I could figure out a better preg_match_all() and mapping option:

$arr = array();

// remove everything except for the <start-valid-time...> elements
$thisContents = preg_replace('#
.*?
<data.type="forecast">
.*?
<layout-key>k-p12h-n14-1</layout-key>
.*?
(
(<start-valid-time.period-name="([^"]+)">([^"]+)</start-valid-time>)+
.*?
)
</time-layout>.*
#msix',
'$1',
$contents);

// remove the opening <start-valid-time...> tag
$thisContents = preg_replace('#\s*<start-valid-time period-name="#i',
'',
$thisContents);

// explode on the ending tag to create
// array('Thursday">2020-01-23T06:00:00-05:00', 'Thursday Night">2020-01-23T18:00:00-05:00')
$matches = explode('</start-valid-time>', $thisContents);

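// loop through $matches, explode on ">, then create $arr value
foreach ($matches as $key) {
    // skip the leftover piece after the last closing tag (it has no "> in it)
    if (strpos($key, '">') === false) continue;
    list($thisKey, $thisVal) = explode('">', $key);
    $arr[$thisVal] = $thisKey;
}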

print_r($arr);

w3dk

2:02 am on Jan 24, 2020 (gmt 0)

Extracting a specific "nugget" of data from an "XML" string using regex is one thing, but what you are trying to do here is essentially parse (a section of) the XML using regex, which is not recommended (complex and hugely prone to error).

If this is valid XML (as suggested in your other thread [webmasterworld.com]) then it's recommended to use an XML parser (e.g. SimpleXML).
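
As a rough illustration of the parser route, here is a minimal SimpleXML sketch. It assumes the feed is well-formed XML; the sample string is a hypothetical, trimmed-down version of the snippet in the question, and its closing tags are assumed rather than taken from the real feed.

```php
<?php
// Hypothetical, trimmed-down feed: the real document has more elements, and
// the closing tags here are assumed from the snippet in the question.
$contents = <<<XML
<data type="forecast">
  <time-layout time-coordinate="local" summarization="12hourly">
    <layout-key>k-p12h-n14-1</layout-key>
    <start-valid-time period-name="Thursday">2020-01-23T06:00:00-05:00</start-valid-time>
    <start-valid-time period-name="Thursday Night">2020-01-23T18:00:00-05:00</start-valid-time>
  </time-layout>
</data>
XML;

$xml = simplexml_load_string($contents);
if ($xml === false) {
    exit("Could not parse the feed as XML\n");
}

$arr = array();
// Select the <start-valid-time> children of the layout with the wanted key
$nodes = $xml->xpath('//time-layout[layout-key="k-p12h-n14-1"]/start-valid-time');
foreach ($nodes as $node) {
    // Element text becomes the array key, the attribute becomes the value
    $arr[(string) $node] = (string) $node['period-name'];
}

print_r($arr);
```

The XPath filter on layout-key takes the place of the long leading part of the regex, so the loop only ever sees the repeating elements.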

As an academic exercise...

From the subset of data you posted I think you would need to focus solely on the repeating data, i.e. the "start-valid-time" elements, ignoring the parent elements (if these elements are unique in the file). Or do it in two steps...

1. Using regex... extract the data block you want to "parse" (inside containers "data", "layout-key", etc.), i.e. all the "start-valid-time" elements.
2. Using regex... preg_match_all() on the repeating "start-valid-time" elements only, extracted in the first step.

preg_match_all('#
<data.type="forecast">
.*?
<layout-key>k-p12h-n14-1</layout-key>
.*?
((<start-valid-time.period-name="([^"]+)">([^"]+)</start-valid-time>)+.*?)
#msix',
$contents,
$matches);


The immediate problem I see with this (by doing it in one step) is that "data" and "layout-key" only occur once in the "sample" data, so you are only going to get one match at most.
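
The two-step idea described above might look something like the following. This is only a sketch: the sample string is hypothetical, the closing </time-layout> tag is taken from the preg_replace() pattern in the first post, and array_combine() is one possible way to build the associative array without a manual loop.

```php
<?php
// Hypothetical sample built from the snippet in the question; the closing
// </time-layout> tag is taken from the preg_replace() pattern in the first post.
$contents = <<<XML
<data type="forecast">
<time-layout time-coordinate="local" summarization="12hourly">
<layout-key>k-p12h-n14-1</layout-key>
<start-valid-time period-name="Thursday">2020-01-23T06:00:00-05:00</start-valid-time>
<start-valid-time period-name="Thursday Night">2020-01-23T18:00:00-05:00</start-valid-time>
</time-layout>
</data>
XML;

$arr = array();

// Step 1: isolate the block that follows the wanted layout key
$found = preg_match(
    '#<layout-key>k-p12h-n14-1</layout-key>(.*?)</time-layout>#s',
    $contents,
    $block
);

if ($found === 1) {
    // Step 2: match every repeating element inside that block only.
    // $m[1] holds all period names, $m[2] all timestamps, in document order.
    preg_match_all(
        '#<start-valid-time\s+period-name="([^"]+)">([^<]+)</start-valid-time>#',
        $block[1],
        $m
    );
    // Pair them up: timestamps become keys, period names become values
    $arr = array_combine($m[2], $m[1]);
}

print_r($arr);
```

Because the second pattern contains nothing but the repeating element, preg_match_all() can match it once per element instead of once per document.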

JayDub

2:06 am on Mar 25, 2020 (gmt 0)

Can't we just jump to the explode?
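$matches = explode('<start-valid-time period-name="', $thisContents);

# for() or some other loop of your choosing on $matches
# (in PHP, foreach() is usually at least as fast as for(), so pick whichever reads better)

$finalContent = array();
$loops = count($matches);
# start at 1: $matches[0] is whatever came before the first opening tag
for ($count = 1; $count < $loops; $count++) {
    # the pattern needs delimiters; # is used here
    if (preg_match('#([^"]+)">([^<]+)<#', $matches[$count], $buildArray)) {
        $finalContent[$buildArray[2]] = $buildArray[1];
    }
}
print_r($finalContent);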

It's been a while since I've coded and it's untested, but it might give you another idea on how to do it.

NickMNS

2:29 am on Mar 25, 2020 (gmt 0)

My question pertains only to your first part.
Why?
You have the time stamp, so why do you need "Thursday" or "Thursday Night"? It is implicit in the time stamp. Granted, you may want to show your user that a specific time stamp is for a Thursday, but then you can calculate the day when you need it.
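
For what it's worth, deriving the label from the time stamp could be sketched like this. periodName() is a hypothetical helper, and the rule that a period starting at 18:00 or later counts as "Night" is an assumption based only on the two sample rows; if the feed ever uses labels that are not plain weekday names, this would not reproduce them.

```php
<?php
// periodName() is a hypothetical helper, not part of the feed or of PHP.
// Assumption: periods starting at 18:00 or later (local offset) are "Night".
function periodName(string $timestamp): string
{
    $dt = new DateTimeImmutable($timestamp);
    $name = $dt->format('l');            // full weekday name, e.g. "Thursday"
    if ((int) $dt->format('G') >= 18) {  // hour 0-23 in the timestamp's own offset
        $name .= ' Night';
    }
    return $name;
}

echo periodName('2020-01-23T06:00:00-05:00'), "\n";
echo periodName('2020-01-23T18:00:00-05:00'), "\n";
```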

lucy24

2:37 am on Mar 25, 2020 (gmt 0)

<tangent>
array('Thursday', '2020-01-23T06:00:00-05:00', 'Thursday Night', '2020-01-23T18:00:00-05:00')
Why isn't this kind of thing done as a two-dimensional array?
array(
array('Thursday','2020-01-23T06:00:00-05:00'),
array('Thursday Night','2020-01-23T18:00:00-05:00')
)
and so on. It isn't a list of 28 things, but of 14 sets of two things.
</tangent>
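
On the two-dimensional shape: preg_match_all() can return its results grouped per match when given the PREG_SET_ORDER flag, which is close to the layout described above. A minimal sketch, using only the two sample lines from the question (the variable names are made up for illustration):

```php
<?php
// Sample lines copied from the question
$contents = '<start-valid-time period-name="Thursday">2020-01-23T06:00:00-05:00</start-valid-time>' . "\n"
          . '<start-valid-time period-name="Thursday Night">2020-01-23T18:00:00-05:00</start-valid-time>';

// PREG_SET_ORDER groups the captures per match instead of per capture group
preg_match_all(
    '#<start-valid-time\s+period-name="([^"]+)">([^<]+)</start-valid-time>#',
    $contents,
    $sets,
    PREG_SET_ORDER
);

// Drop index 0 (the full match) so each row is just array(name, timestamp)
$pairs = array_map(function ($set) {
    return array($set[1], $set[2]);
}, $sets);

// If the associative form is wanted after all: column 1 as values, column 2 as keys
$arr = array_column($sets, 1, 2);

print_r($pairs);
print_r($arr);
```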

JayDub

8:59 pm on Mar 25, 2020 (gmt 0)

Why isn't this kind of thing done as a two-dimensional array?

Sometimes one decides to color outside the lines either out of necessity or just because they can ;)

csdude55

7:47 pm on Mar 26, 2020 (gmt 0)

For some reason this isn't showing up under "My Threads", so sorry for the late reply!

I stopped working on this in January and went on to something else, with the plan of coming back to it later. It's extremely complicated, though, because the XML feed has no documentation, and I end up with dozens of arrays that are pointing at one another. But one array might have 6 elements while another has 10, and somehow they're supposed to match up?

Let me get to a stopping point on my current project, and I'll come back to this with an example of the arrays I have to work with.