
PHP Server Side Scripting Forum

    
regex help
camilord

5+ Year Member



 
Msg#: 4595590 posted 3:52 am on Jul 23, 2013 (gmt 0)

Hello, can anybody help me?

Using PHP regex, I want to convert, for example:

FROM

<br><b> <i>LoA - </b>only applies to housing, work camps<br>


TO

<br><div class="xxx"><b> <i>LoA - </b>only applies to housing, work camps</div><br>


anybody?

 

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4595590 posted 4:56 am on Jul 23, 2013 (gmt 0)

How to do it depends on the structure of the entire string. It would help me more if there's a longer example that showed the <p>s and things along those lines, but if you want to change any <br><b> to have a div with a class in the middle, then from what you've posted I'd do it in 2 parts.

First, I'd:
$the_string = str_replace('<br><b>','<br><div class="the_class"><b>',$the_string);

Second, I'd:
$the_string = preg_replace('#(<br><div class="the_class"><b>.*?)(<br>)#m',"$1</div>$2",$the_string);

Or something along those lines.
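For reference, the two steps above can be run end to end as a minimal sketch. The input string is the short example from the question. The s modifier is an assumption on my part: the m in the post only changes how ^ and $ behave, and neither appears in the pattern, whereas s lets . cross newlines if the input contains any.

```php
<?php
// Input taken from the short example in the question.
$the_string = '<br><b> <i>LoA - </b>only applies to housing, work camps<br>';

// Step 1: open the div right after every <br> that is directly followed by <b>.
$the_string = str_replace('<br><b>', '<br><div class="the_class"><b>', $the_string);

// Step 2: close the div just before the next <br>.
// The "s" modifier is an assumption, see the note above.
$the_string = preg_replace(
    '#(<br><div class="the_class"><b>.*?)(<br>)#s',
    '$1</div>$2',
    $the_string
);

echo $the_string, "\n";
```

Both functions return the new string rather than changing the argument, so the result has to be assigned back. Note that step 1 wraps every <br><b>, not only the ones followed by LoA.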

* Note: My regexes look a bit odd to most people because I use a # delimiter, but one day when I was matching URLs and got tired of escaping every bleeping / I switched and it's really made things easier since there aren't #s used anywhere near as often as /s in the expressions I normally write.
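A small sketch of the delimiter point, assuming a made-up example.com URL: the same pattern written once with / and once with # as the delimiter. Both should capture the same thing; only the amount of escaping differs.

```php
<?php
// Hypothetical URL, used only for illustration.
$url = 'http://www.example.com/forum/php/index.html';

// With / as the delimiter, every literal slash has to be escaped.
$with_slash = preg_match('/^http:\/\/www\.example\.com\/forum\/(\w+)\//', $url, $m1);

// With # as the delimiter, the slashes can be written as they are.
$with_hash = preg_match('#^http://www\.example\.com/forum/(\w+)/#', $url, $m2);

// Both patterns capture the path segment after /forum/.
var_dump($m1[1] === $m2[1]);
```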


lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4595590 posted 5:23 am on Jul 23, 2013 (gmt 0)

Now, about the sequence of tags...

<b> <i>text </b> more-text-here


;)

camilord

5+ Year Member



 
Msg#: 4595590 posted 8:35 pm on Jul 23, 2013 (gmt 0)

It will already be broken down by \n, so you don't need to worry about that. What I need is for it to find every occurrence of "LoA" and then wrap it in a div that runs until it gets to a <br>.

Basically /<b>LoA<\/b>.*<br>/ is an example; however, he's putting <b>, <u> and all sorts of other stuff in there, and the format isn't consistent.

Example:

<b> 5. G3.3.2 - </b> Do spaces used for food preparation and utensil washing have: <br> <b>a) </b>interior linings and work surfaces that are impervious and easily cleaned? <br> <b>(b) </b>all building elements constructed with materials which are free from hazardous substances which could cause contamination to the building contents? <br><b> LoA </b>- only applies to housing, work camps, old people's homes and early childhood centres and where appropriate Commercial and Industrial buildings whose intended use includes the manufacture, preparation, packaging or storage of food. <br> <b>(c) </b> exposed building elements located & shaped to avoid accumulation of dirt? <br><b> <i>LoA </b> only applies to housing</i>

Becomes:

<b> 5. G3.3.2 - </b> Do spaces used for food preparation and utensil washing have: <br> <b>a) </b>interior linings and work surfaces that are impervious and easily cleaned? <br> <b>(b) </b>all building elements constructed with materials which are free from hazardous substances which could cause contamination to the building contents? <br><div class="xxxxxx"><b> LoA </b>- only applies to housing, work camps, old people's homes and early childhood centres and where appropriate Commercial and Industrial buildings whose intended use includes the manufacture, preparation, packaging or storage of food.</div><br> <b>(c) </b> exposed building elements located & shaped to avoid accumulation of dirt? <br><div class="xxxxxx"><b> <i>LoA </b> only applies to housing</i></div>

camilord

5+ Year Member



 
Msg#: 4595590 posted 9:13 pm on Jul 23, 2013 (gmt 0)

i think i got it...


function fixLoA($content) {
    // Turn both <br> forms into newlines, then work line by line.
    $lines = explode("\n", str_replace(array('<br />', '<br>'), "\n", $content));
    foreach ($lines as $i => $line) {
        // Also break after sentence ends so each sentence is matched on its own.
        $line = str_replace(array('? ', '. '), array("? \n", ". \n"), $line);
        $lines[$i] = preg_replace_callback(
            //"#\<b\>(s+|.*LoA.*|s+)(\n|<br>)#s",
            //"/^<b>(?=.*)(.*LoA.*)(?=.*)<br>$/",
            '/[<b>|<b>|<b><i>].*LoA.*<\/b>.*(<br>|<br \/>|<\/i>|\n)/',
            // Anonymous function instead of create_function(), which was removed in PHP 8.0.
            function ($matches) {
                return '<div class="xxx" style="color: RED">' . $matches[0] . '</div>';
            },
            $line
        );
    }
    return implode("\n", $lines);
}



Can anyone improve my method? Please help. Thanks.

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4595590 posted 9:20 pm on Jul 23, 2013 (gmt 0)

So it sounds like we need to find <br> followed by anything or nothing (space, <u>, etc.) followed by <b>, followed by anything or nothing (space, <u>, etc.), followed by exactly LoA, followed by anything, followed by <br>...

$the_string = preg_replace('#(<br>.*)(<b>.*LoA.*)(<br>)#U',"$1<div class=\"the_class\">$2</div>$3",$the_string);

* Note: I added the U modifier, which inverts greediness: every .* becomes "ungreedy" by default, and writing .*? makes it "greedy" again, the reverse of the default behavior.
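A quick way to see what the U modifier does is to run the same pattern four ways on a short string. This is only an illustrative sketch; the sample string and variable names are made up.

```php
<?php
// Hypothetical sample with two separate <b> elements.
$s = '<b>one</b> <b>two</b>';

preg_match('#<b>.*</b>#', $s, $greedy);        // default: .* is greedy
preg_match('#<b>.*?</b>#', $s, $lazy);         // ? makes it ungreedy
preg_match('#<b>.*</b>#U', $s, $u_flag);       // U: .* is ungreedy by default
preg_match('#<b>.*?</b>#U', $s, $u_inverted);  // U plus ?: greedy again

echo $greedy[0], "\n";      // <b>one</b> <b>two</b>
echo $lazy[0], "\n";        // <b>one</b>
echo $u_flag[0], "\n";      // <b>one</b>
echo $u_inverted[0], "\n";  // <b>one</b> <b>two</b>
```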

** Added Note: This would probably be a good place to use a "look-ahead" to find the LoA or "break and move on" for efficiency, but didn't write that in.
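One possible reading of the look-ahead idea, as a sketch only and not necessarily what was meant above: start right after a <br>, look ahead to check that LoA occurs before the next <br>, and only then capture the segment. Unlike the pattern in the post, this version does not require a <b> before LoA and does not consume the closing <br>, so it also works when the last segment has no trailing <br>. The sample string is a shortened, made-up version of the posted text.

```php
<?php
// Shortened, hypothetical sample in the same shape as the posted text.
$the_string = '<br> <b>a) </b>first <br><b> LoA </b>- housing <br> <b>(c) </b> dirt <br><b> <i>LoA </b> only</i>';

$pattern = '#(?<=<br>)'                  // start right after a <br>
         . '(?=(?:(?!<br>).)*?\bLoA\b)'  // look ahead: LoA must occur before the next <br>
         . '((?:(?!<br>).)+)'            // capture everything up to the next <br> (or the end)
         . '#s';

$result = preg_replace($pattern, '<div class="the_class">$1</div>', $the_string);
echo $result, "\n";
```

Segments without LoA fail at the look-ahead and are left untouched. Whether this is faster than the U-modifier version would need measuring on real data.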

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4595590 posted 10:42 pm on Jul 23, 2013 (gmt 0)

[<b>|<b>|<b><i>]

Is that a typo for

[<b>|<i>|<b><i>]
?
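A side note on the bracket expression quoted above, shown as a small sketch: square brackets form a character class, so the | inside them is a literal character and the whole expression matches exactly one character out of < b > | i. Alternation between whole tags needs a group instead. The test strings here are made up.

```php
<?php
// The bracket expression matches a single character from the set < b > | i.
$class_matches_pipe = preg_match('#^[<b>|<i>|<b><i>]$#', '|');
$class_matches_tag  = preg_match('#^[<b>|<i>|<b><i>]$#', '<b>');

// A non-capturing group gives alternation between whole tags.
$group_matches_tags = preg_match('#^(?:<b><i>|<b>|<i>)$#', '<b><i>');

var_dump($class_matches_pipe);  // int(1): a lone | is accepted
var_dump($class_matches_tag);   // int(0): three characters, the class matches only one
var_dump($group_matches_tags);  // int(1)
```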

Seems like what you'd want is something based on

LoA([^<]*(</?\w>[^<]*)+)<br>

Will the source text ever contain anything beyond simple inline markup like <b>? For example, a multi-letter tag like <em> or <wbr>, or something still more complex like <span blahblah or <a blahblah containing non-word characters?

It is extremely inconvenient that <b> and <br> start with the same letter, so you can't simply say </?[^b][^>]* and be done with it. It would have to be
</?(?:b|[^b][^>]*)>
where I had
</?\w>
above.
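To see which tags the longer tag pattern above accepts, here is a minimal sketch that runs it over a made-up string containing <b>, <i>, <em> and both <br> forms.

```php
<?php
// Hypothetical sample mixing inline tags with both <br> forms.
$html = '<b> <i>LoA </b> only<br>applies <em>here</em><br />';

// Matches <b> and any tag whose name does not start with b; skips <br> and <br />.
preg_match_all('#</?(?:b|[^b][^>]*)>#', $html, $m);

print_r($m[0]);
```

One side effect of this design: other tags starting with b, such as <body> or <blockquote>, are skipped as well.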

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4595590 posted 11:28 pm on Jul 23, 2013 (gmt 0)

<?php

$the_string="<b> 5. G3.3.2 - </b> Do spaces used for food preparation and utensil washing have: <br> <b>a) </b>interior linings and work surfaces that are impervious and easily cleaned? <br> <b>(b) </b>all building elements constructed with materials which are free from hazardous substances which could cause contamination to the building contents? <br><b> LoA </b>- only applies to housing, work camps, old people's homes and early childhood centres and where appropriate Commercial and Industrial buildings whose intended use includes the manufacture, preparation, packaging or storage of food. <br> <b>(c) </b> exposed building elements located & shaped to avoid accumulation of dirt? <br><b> <i>LoA </b> only applies to housing</i><br>";

$the_string=preg_replace('#(<br>.*?)(<b>\s*(<(i|u)>)?\s*\bLoA\b.*?)(<br>)#',"$1<div class=\"the_class\">$2</div>$5",$the_string);

echo $the_string;

?>

* Obviously a "non-capturing" grouping of the <i> or <u> would be a bit more efficient.
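As a sketch of the non-capturing variant mentioned in the note: the optional <i>/<u> becomes (?:<[iu]>)?, which leaves only three capturing groups, so the closing <br> moves from $5 to $3. The sample string is a shortened, made-up version of the one in the post.

```php
<?php
// Shortened, hypothetical sample in the same shape as the posted string.
$the_string = '<br> <b>a) </b>first <br><b> LoA </b>- housing <br> <b>(c) </b> dirt <br><b> <i>LoA </b> only</i><br>';

$the_string = preg_replace(
    '#(<br>.*?)(<b>\s*(?:<[iu]>)?\s*\bLoA\b.*?)(<br>)#',
    '$1<div class="the_class">$2</div>$3',
    $the_string
);

echo $the_string, "\n";
```

As in the original, the closing <br> is consumed by each match, so two LoA blocks that share a single <br> between them would not both be wrapped.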


lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4595590 posted 1:16 am on Jul 24, 2013 (gmt 0)

(i|u)

Is this more efficient than
[iu]
?

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4595590 posted 1:31 am on Jul 24, 2013 (gmt 0)

That's a good question, and one I've had for years. The answer is: I really don't know. I've asked experts, and no one I've asked has had an answer either.

Fortunately, I'm guessing since RAM has become so cheap and processing power so much better over the last few years we're likely in "6 of one, half dozen of the other" territory as far as "speed impact" of either is concerned. I actually switch back and forth, even in the middle of a single expression sometimes, because I just use whichever pops into my head first as "will work here".

If I had to guess, it would be yours by a "blip", because [ui] is fewer characters than (?:u|i), and (u|i) "stores the match for back-reference", so "not storing + fewer characters" should have a slight advantage.

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4595590 posted 7:22 pm on Jul 28, 2013 (gmt 0)

Found the answer:
Certain items that may appear in patterns are more efficient than others. It is more efficient to use a character class like [aeiou] than a set of alternatives such as (a|e|i|o|u).

Haven't read that page in years, so I don't know when it got updated to include the preceding, but it's right there in #000 and #FFF.

[php.net...]
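For anyone who wants to check this on their own setup, here is a rough sketch. It confirms that the two forms find the same matches and prints a crude timing for each. The helper time_pattern() and the sample text are made up for illustration, and the timings will vary by machine and PCRE version, so no particular result is implied.

```php
<?php
// Hypothetical sample text, repeated to give the engine some work.
$text = str_repeat('It is more efficient to use a character class. ', 1000);

// Both forms should find exactly the same matches.
$class_count = preg_match_all('#[aeiou]#', $text);
$alt_count   = preg_match_all('#(a|e|i|o|u)#', $text);

// Crude timing helper (hypothetical name); not a rigorous benchmark.
function time_pattern($pattern, $text, $runs = 50) {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        preg_match_all($pattern, $text);
    }
    return microtime(true) - $start;
}

printf("class:       %d matches, %.4fs\n", $class_count, time_pattern('#[aeiou]#', $text));
printf("alternation: %d matches, %.4fs\n", $alt_count, time_pattern('#(a|e|i|o|u)#', $text));
```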


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved