Forum Moderators: coopster & jatar k

regex help

   
3:52 am on Jul 23, 2013 (gmt 0)

5+ Year Member



Hello, can anybody help me?

Using PHP regex, I want to convert, for example:

FROM

<br><b> <i>LoA - </b>only applies to housing, work camps<br>



TO

<br><div class="xxx"><b> <i>LoA - </b>only applies to housing, work camps</div><br>



anybody?
4:56 am on Jul 23, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



How to do it depends on the structure of the entire string. A longer example showing the <p>s and other surrounding markup would help, but if you want every <br><b> to get a div with a class in the middle, then from what you've posted I'd do it in two parts.

First, I'd:
$the_string = str_replace('<br><b>','<br><div class="the_class"><b>',$the_string);

Second, I'd:
$the_string = preg_replace('#(<br><div class="the_class"><b>.*?)(<br>)#m',"$1</div>$2",$the_string);

Or something along those lines.

* Note: My regexes look a bit odd to most people because I use # as the delimiter. One day I was matching URLs, got tired of escaping every bleeping /, and switched. It has made things much easier, since #s show up far less often than /s in the expressions I normally write.
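For anyone following along, here is a minimal runnable sketch of the two-part approach, applied to the sample line from the first post. It assumes the whole fragment is in one string, assigns the return values (neither function changes its argument in place), and uses the s modifier so the match can also span line breaks.

```php
<?php
// Sample input taken from the first post in this thread.
$the_string = '<br><b> <i>LoA - </b>only applies to housing, work camps<br>';

// Step 1: open the div right after every <br> that is followed by <b>.
// str_replace() returns the new string, so the result has to be assigned.
$the_string = str_replace('<br><b>', '<br><div class="the_class"><b>', $the_string);

// Step 2: close the div just before the next <br>.
// The replacement is single-quoted so $1 and $2 reach preg_replace() untouched.
$the_string = preg_replace(
    '#(<br><div class="the_class"><b>.*?)(<br>)#s',
    '$1</div>$2',
    $the_string
);

echo $the_string, "\n";
```

Note that step 1 wraps every <br><b>, not only the LoA ones, so this only fits if all bold labels should get the div.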

[edited by: phranque at 5:10 am (utc) on Jul 23, 2013]
[edit reason] disabled graphic smileys [/edit]

5:23 am on Jul 23, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Now, about the sequence of tags...

<b> <i>text </b> more-text-here


;)
8:35 pm on Jul 23, 2013 (gmt 0)

5+ Year Member



It will already be broken down by \n, so you don't need to worry about that. What I need is for it to find every occurrence of "LoA" and put it inside a div that runs until the next <br>.

Basically /<b>LoA<\/b>.*<br>/ is an example. However, he's putting <b>, <u> and all sorts of other stuff in there, and the format isn't consistent.

Example:

<b> 5. G3.3.2 - </b> Do spaces used for food preparation and utensil washing have: <br> <b>a) </b>interior linings and work surfaces that are impervious and easily cleaned? <br> <b>(b) </b>all building elements constructed with materials which are free from hazardous substances which could cause contamination to the building contents? <br><b> LoA </b>- only applies to housing, work camps, old people's homes and early childhood centres and where appropriate Commercial and Industrial buildings whose intended use includes the manufacture, preparation, packaging or storage of food. <br> <b>(c) </b> exposed building elements located & shaped to avoid accumulation of dirt? <br><b> <i>LoA </b> only applies to housing</i>

Becomes:

<b> 5. G3.3.2 - </b> Do spaces used for food preparation and utensil washing have: <br> <b>a) </b>interior linings and work surfaces that are impervious and easily cleaned? <br> <b>(b) </b>all building elements constructed with materials which are free from hazardous substances which could cause contamination to the building contents? <br><div class="xxxxxx"><b> LoA </b>- only applies to housing, work camps, old people's homes and early childhood centres and where appropriate Commercial and Industrial buildings whose intended use includes the manufacture, preparation, packaging or storage of food.</div><br> <b>(c) </b> exposed building elements located & shaped to avoid accumulation of dirt? <br><div class="xxxxxx"><b> <i>LoA </b> only applies to housing</i></div>
9:13 pm on Jul 23, 2013 (gmt 0)

5+ Year Member



I think I got it...


function fixLoA($content) {
    $lines = explode("\n",str_replace('<br />',"\n", str_replace('<br>',"\n",$content)));
    for ($i = 0; $i < count($lines); $i++) {
        $lines[$i] = str_replace('? ',"? \n",$lines[$i]);
        $lines[$i] = str_replace('. ',". \n",$lines[$i]);
        $lines[$i] = preg_replace_callback(
            //"#\<b\>(s+|.*LoA.*|s+)(\n|<br>)#s",
            //"/^<b>(?=.*)(.*LoA.*)(?=.*)<br>$/",
            '/[<b>|<b>|<b><i>].*LoA.*<\/b>.*(<br>|<br \/>|<\/i>|\n)/',
            create_function(
                '$matches',
                'return "<div class=\"xxx\" style=\"color: RED\">".$matches[0]."</div>";'
            ),
            $lines[$i]
        );
    }
    return implode("\n",$lines);
}



Can anyone improve my method? Please help. Thanks.
9:20 pm on Jul 23, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



So it sounds like we need to find <br> followed by anything or nothing (space, <u>, etc.) followed by <b>, followed by anything or nothing (space, <u>, etc.), followed by exactly LoA, followed by anything, followed by <br>...

preg_replace('#(<br>.*)(<b>.*LoA.*)(<br>)#U',"$1<div class=\"the_class\">$2</div>$3",$the_string);

* Note: I added the U modifier, which inverts greediness: every .* becomes "ungreedy", and .*? becomes "greedy", the reverse of the default behavior.

** Added Note: This would probably be a good place to use a "look-ahead" to find the LoA, or to "break and move on" for efficiency, but I didn't write that in.
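One way the look-ahead idea could be written, as a sketch rather than a tested drop-in: each . is guarded by a negative look-ahead so a match can never run across a <br>, and the closing <br> is only looked at, not consumed, so two LoA items in a row both get wrapped. The function name wrap_loa_segments and the class name are placeholders, and the sketch assumes <br> is always written exactly as <br>.

```php
<?php
// Hypothetical helper: wraps every <br>-delimited segment that has "LoA"
// somewhere after a <b> in a div. Name and class are placeholders.
function wrap_loa_segments(string $html): string
{
    // <b>               start of the bold label
    // (?:(?!<br>).)*?   any characters, but never stepping onto a <br>
    // \bLoA\b           the literal marker as a whole word
    // (?:(?!<br>).)*    the rest of the segment, still stopping before <br>
    // (?=<br>|$)        the segment ends at a <br> (not consumed) or at the end
    $pattern = '#<b>(?:(?!<br>).)*?\bLoA\b(?:(?!<br>).)*(?=<br>|$)#s';

    // $0 is the whole match; single quotes keep PHP from touching it.
    return preg_replace($pattern, '<div class="the_class">$0</div>', $html);
}

$sample = 'Intro <br> <b>a) </b>first item <br><b> LoA </b>- only housing. '
        . '<br> <b>(c) </b> third item <br><b> <i>LoA </b> only applies to housing</i>';

echo wrap_loa_segments($sample), "\n";
```

Because the match starts at the first <b> in the segment, a segment with an ordinary bold label followed later by LoA would be wrapped from that first label onward.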
10:42 pm on Jul 23, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



[<b>|<b>|<b><i>]

Is that a typo for

[<b>|<i>|<b><i>]
?

Seems like what you'd want is something based on

LoA([^<]*(</?\w>[^<]*)+)<br>

Will the source text ever contain anything beyond simple inline markup like <b>? For instance, a multi-letter tag like <em> or <wbr>, or something still more complex like <span blahblah or <a blahblah containing non-word characters?

It is extremely inconvenient that <b> and <br> start with the same letter, so you can't simply say </?[^b][^>]*> and be done with it. It would have to be
</?(?:b|[^b][^>]*)>
where I had
</?\w>
above.
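A quick sketch to check what that tag sub-pattern accepts and rejects. The helper name is_non_br_tag is made up for the test, and the anchors ^ and $ are added only so each sample is matched as a whole tag.

```php
<?php
// Hypothetical helper: true if $tag is <b>, </b>, or any tag whose name
// does not start with "b". The anchors are added for this test only.
function is_non_br_tag(string $tag): bool
{
    return preg_match('#^</?(?:b|[^b][^>]*)>$#', $tag) === 1;
}

$samples = ['<b>', '</b>', '<i>', '</i>', '<em>', '<span class="x">', '<br>', '<br />', '<blockquote>'];

foreach ($samples as $tag) {
    echo str_pad($tag, 18), is_non_br_tag($tag) ? 'match' : 'no match', "\n";
}
```

The price of excluding <br> this way is that every other tag starting with b, such as <blockquote>, is excluded too.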
11:28 pm on Jul 23, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



<?php

$the_string="<b> 5. G3.3.2 - </b> Do spaces used for food preparation and utensil washing have: <br> <b>a) </b>interior linings and work surfaces that are impervious and easily cleaned? <br> <b>(b) </b>all building elements constructed with materials which are free from hazardous substances which could cause contamination to the building contents? <br><b> LoA </b>- only applies to housing, work camps, old people's homes and early childhood centres and where appropriate Commercial and Industrial buildings whose intended use includes the manufacture, preparation, packaging or storage of food. <br> <b>(c) </b> exposed building elements located & shaped to avoid accumulation of dirt? <br><b> <i>LoA </b> only applies to housing</i><br>";

$the_string=preg_replace('#(<br>.*?)(<b>\s*(<(i|u)>)?\s*\bLoA\b.*?)(<br>)#',"$1<div class=\"the_class\">$2</div>$5",$the_string);

echo $the_string;

?>

* Obviously a "non-capturing" grouping of the <i> or <u> would be a bit more efficient.
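For reference, the non-capturing variant might look like the sketch below. The only changes to the expression are (?:<[iu]>) in place of (<(i|u)>) and, because two groups disappear, $3 instead of $5 in the replacement. The shortened sample string is made up for the test.

```php
<?php
// Shortened, made-up sample in the same shape as the real data.
$in = '<br> <b>a) </b>first <br><b> LoA </b>- only housing. '
    . '<br> <b>(c) </b> third <br><b> <i>LoA </b> only housing</i><br>';

// Group 1: the <br> plus anything leading up to the LoA label.
// Group 2: the label (<b>, optional <i> or <u>, LoA) and the text after it.
// Group 3: the closing <br>. The optional tag is grouped without capturing.
$out = preg_replace(
    '#(<br>.*?)(<b>\s*(?:<[iu]>)?\s*\bLoA\b.*?)(<br>)#',
    '$1<div class="the_class">$2</div>$3',
    $in
);

echo $out, "\n";
```

Since the closing <br> is consumed by the match, two LoA items directly after one another would share it and the second would be skipped; a look-ahead for that <br> would avoid this.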

[edited by: phranque at 11:34 pm (utc) on Jul 23, 2013]
1:16 am on Jul 24, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



(i|u)

Is this more efficient than
[iu]
?
1:31 am on Jul 24, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



That's a good question and one I've had for years. I've asked experts, but no one has had an answer, so I really don't know.

Fortunately, since RAM has become so cheap and processing power so much better over the last few years, I'm guessing we're in "6 of one, half dozen of the other" territory as far as the "speed impact" of either is concerned. I actually switch back and forth, sometimes even in the middle of a single expression, because I just use whichever pops into my head first as "will work here".

If I had to guess, it would be yours by a "blip": [ui] is fewer characters than (?:u|i), and (u|i) "stores the match for back-reference", so "not storing + fewer characters" should have a slight advantage.
7:22 pm on Jul 28, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



Found the answer:
Certain items that may appear in patterns are more efficient than others. It is more efficient to use a character class like [aeiou] than a set of alternatives such as (a|e|i|o|u).

Haven't read that page in years, so I don't know when it got updated to include the preceding, but it's right there in #000 and #FFF.

[php.net...]
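A small sketch of what the three spellings have in common and where they differ. It only compares what gets matched and stored, not speed; the subject string is made up.

```php
<?php
// Made-up subject containing one <i> and one <u>.
$subject = '<b> <i>LoA </b> and <b> <u>LoA </b>';

// Three spellings of "an <i> or <u> tag".
$patterns = [
    'class'         => '#<[iu]>#',
    'non-capturing' => '#<(?:i|u)>#',
    'capturing'     => '#<(i|u)>#',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    // $matches[0] holds the full matches; $matches[1] exists only
    // when the pattern has a capturing group.
    preg_match_all($pattern, $subject, $matches);
    $results[$name] = $matches;
}

foreach ($results as $name => $matches) {
    echo $name, ': ', implode(' ', $matches[0]),
         ' (arrays returned: ', count($matches), ")\n";
}
```

All three find the same tags; only the capturing form carries the extra array of stored letters.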
 
