preg match all pattern help.

how to figure it out..

achshar

9:06 am on May 21, 2010 (gmt 0)

Hello,
I am making a replacement system.. I don't really know what it is called...
It's like when I enter [bold]foo[/bold] into the database,
the script changes [bold] to the matching HTML tag. I can change single words like strong, em, u, p, div etc., but I want it to handle variables like

[url=LINK title=TITLE]link to foo[/url]

OR something like

[div class=CLASSNAME]div content[/div]

I know preg_replace is what I need.. I can also do PHP, but I am having a hard time figuring out what the pattern is. Any tutorial or anything would be appreciated..
Like this one I found..


/\[url=(.*?)\](.*?)\[\/url\]/i

What I could work out is..
(.*?) is any variable
and \ is used to escape plain characters, just like in usual PHP
but what is the i at the end for?
or those extra /s..?

I have learned PHP all by myself, so even if it is something basic, I wouldn't know it :(

Thanks
achshar

Alcoholico

3:02 pm on May 21, 2010 (gmt 0)

Something like this should work, or at least help you understand how it works:

function parseBBCode($txt) {
    // One pattern per BBCode tag.
    $patterns = array(
        '#\[b\](.+)\[\/b\]#iUs',
        '#\[url\=(.+)\](.+)\[\/url\]#iUs',
        '#\[img\](.+)\[\/img\]#iUs',
        '#\[quote\=(.+)\](.+)\[\/quote\]#iUs'
    );
    // Replacements in the same order; $1 and $2 are the captured groups.
    $replacements = array(
        '<b>$1</b>',
        '<a href="$1">$2</a>',
        '<img src="$1" />',
        '<div class="quote_owner">$1 Said:</div><blockquote>$2</blockquote>'
    );
    return preg_replace($patterns, $replacements, $txt);
}

However, you're better off taking a look at the PCRE documentation on php.net. You could also search the net for "php bbcode parser"; that's what you're building.
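For the [url=LINK title=TITLE] form asked about in the first post, here is a rough sketch of one way it might be handled. The function name parseUrlTag is made up for this example, and the assumption that LINK contains no spaces is mine, not something stated in the thread. It uses preg_replace_callback so each captured piece can be escaped before it goes into the HTML.

```php
<?php
// Hypothetical helper (not from the thread): converts
// [url=LINK title=TITLE]text[/url] into an HTML link.
function parseUrlTag($txt) {
    // LINK  = run of characters that are neither whitespace nor ]
    // TITLE = anything up to the closing ]
    // text  = lazy match up to the first [/url]
    $pattern = '#\[url=([^\s\]]+)\s+title=([^\]]*)\](.+?)\[/url\]#is';

    return preg_replace_callback($pattern, function ($m) {
        // Escape every captured piece so it cannot break out of the markup.
        $href  = htmlspecialchars($m[1], ENT_QUOTES);
        $title = htmlspecialchars($m[2], ENT_QUOTES);
        $text  = htmlspecialchars($m[3], ENT_QUOTES);
        return '<a href="' . $href . '" title="' . $title . '">' . $text . '</a>';
    }, $txt);
}

echo parseUrlTag('[url=http://example.com title=Foo page]link to foo[/url]');
// prints: <a href="http://example.com" title="Foo page">link to foo</a>
```

Escaping alone does not stop links such as javascript: URLs, so a real parser would probably also want to check the scheme of LINK.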

achshar

7:53 am on May 22, 2010 (gmt 0)

Hello Alcoholico.. thanks a lot for the reply... yes, I've now been able to figure out most of it by myself

but yeah, php bbcode parser.. that is it! That's what I needed.. I simply didn't know what to search for.. :) thanks.. and that code,
I will try that too... thanks a lot again.. :) :D

achshar

7:38 pm on Jun 6, 2010 (gmt 0)

Hey, hello..
I now have one question.. in the code you gave me above, in the pattern for [b][/b], what is 'iUs' at the end for? It is there for every entry.. it works even if I remove it..

Also, this is what I've come up with.. this works.. I just want to confirm if this is a legitimate way of doing it :)



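// $string is the text to convert (defined elsewhere)
$string = html_entity_decode($string, ENT_QUOTES);
// pattern => replacement pairs
$markup = array(
    '#\[b\](.+)\[\/b\]#iUs' => '<b>$1</b>'
);

$from = array();
$to = array();

// split the pairs into the two arrays preg_replace() expects
foreach ($markup as $a => $b) {
    $from[] = $a;
    $to[] = $b;
}
echo preg_replace($from, $to, $string);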
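The loop above works. As a possible shortcut, array_keys() and array_values() return the patterns and replacements in matching order, so the same pattern => replacement array can be passed to preg_replace() without building $from and $to by hand. A small sketch; the second [i] entry and the sample text are added here only for illustration:

```php
<?php
// Pattern => replacement pairs, same shape as the $markup array above.
$markup = array(
    '#\[b\](.+)\[/b\]#iUs' => '<b>$1</b>',
    '#\[i\](.+)\[/i\]#iUs' => '<i>$1</i>', // extra entry, for illustration only
);

// Sample input, assumed for this example.
$string = 'Some [b]bold[/b] and [I]italic[/I] text.';

// Keys are the patterns, values are the replacements, in the same order.
$result = preg_replace(array_keys($markup), array_values($markup), $string);

echo $result;
// prints: Some <b>bold</b> and <i>italic</i> text.
```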

Matthew1980

8:18 pm on Jun 6, 2010 (gmt 0)

Hi there achshar,

Those chars (i, U and s) are pattern delimiters (I hope ;)); they set conditions of use for the preceding pattern. In this case i = case insensitive, and U and s I am not sure about. However, if you check this link out, it can explain it better than I ever could:-

[php.net ]

There are probably better explanations out there, but I'm sure a quick google will do the trick. From what I can see, 's' lets the engine read everything including the newline chars; without it, \n & \r would be ignored... Have a read, and hopefully all will become clear.

Hope that helps you,

Cheers,
MRb

rocknbil

12:13 am on Jun 7, 2010 (gmt 0)

# = delimiters, which define "the pattern." Those thar' be modifiers, as described. :-)

"it works even if i remove it.."

Try it with [B]...[/B], or with bold text that spans a line break:

[b]first line
second line[/b]

or multiple instances within your test block of text, and it won't. :-)

As mentioned, i is a case-insensitive modifier, which is why [B] would fail without the modifiers.
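A quick way to see the i modifier at work, as a minimal sketch with made-up sample text:

```php
<?php
$text = 'This is [B]loud[/B] text.';

// Without i the pattern only matches lowercase [b]...[/b],
// so the uppercase tags are left untouched.
$strict = preg_replace('#\[b\](.+?)\[/b\]#', '<b>$1</b>', $text);

// With i, [B] and [b] are treated the same.
$relaxed = preg_replace('#\[b\](.+?)\[/b\]#i', '<b>$1</b>', $text);

echo $strict, "\n";   // prints: This is [B]loud[/B] text.
echo $relaxed, "\n";  // prints: This is <b>loud</b> text.
```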

U is a modifier to stop the "greediness" of a match. One of the problems with regexps is that sometimes you have a pattern like

<strong>bold</strong> more text more text more <strong>bold</strong>

So if you, say, want to match on the strong tags and take out everything in between, some regular expressions might go all the way to the SECOND ending </strong> and nuke the entire line when you just want to take out the two instances. The U modifier stops this "greediness." This is why they say, in the documentation,

(e.g. .*?)

In PHP and in other languages, a question mark placed after a quantifier makes it lazy, which limits greedy pattern matching. The dot . means any character, * means zero or more, so without the question mark,

preg_replace('/<[^>]+>.*<\/[^>]+>/','',$string)

might indeed nuke the entire string, because there ARE zero or more characters between, and those include the first closing </strong> and the second opening <strong>. The question mark or the U modifier will limit this, so only the tags and what's between them are nuked. Use one or the other: with U set, adding ? makes the quantifier greedy again.
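The greedy behaviour described above is easy to reproduce with the [b] pattern from earlier in the thread. A minimal sketch with made-up sample text, comparing a greedy match, a lazy ? quantifier, and the U modifier:

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .+ runs on to the LAST [/b], swallowing the tags in the middle.
$greedy = preg_replace('#\[b\](.+)\[/b\]#', '<b>$1</b>', $text);

// Lazy quantifier: .+? stops at the FIRST [/b] it can.
$lazy = preg_replace('#\[b\](.+?)\[/b\]#', '<b>$1</b>', $text);

// U modifier: makes .+ lazy without writing the question mark.
$ungreedy = preg_replace('#\[b\](.+)\[/b\]#U', '<b>$1</b>', $text);

echo $greedy, "\n";    // prints: <b>one[/b] and [b]two</b>
echo $lazy, "\n";      // prints: <b>one</b> and <b>two</b>
echo $ungreedy, "\n";  // prints: <b>one</b> and <b>two</b>
```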

s - as mentioned, the dot character means "any character." By default it does not match newlines; when you include the s modifier, it does, which is why text spanning lines, such as

[b]first line
second line[/b]

would fail without it.
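To check the effect of s yourself, here is a small sketch with made-up sample text, where the bold text contains a line break:

```php
<?php
$text = "[b]first line\nsecond line[/b]";

// Without s the dot cannot cross the newline, so nothing matches
// and the text comes back unchanged.
$noDotAll = preg_replace('#\[b\](.+?)\[/b\]#i', '<b>$1</b>', $text);

// With s the dot also matches newlines, so the whole block is replaced.
$dotAll = preg_replace('#\[b\](.+?)\[/b\]#is', '<b>$1</b>', $text);

var_dump($noDotAll === $text);                               // bool(true)
var_dump($dotAll === "<b>first line\nsecond line</b>");      // bool(true)
```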

achshar

2:54 pm on Jun 7, 2010 (gmt 0)

Okay, I believe I got most of it now :)
And yeah, I have completed the bbcode parser.. and it works perfectly. Apparently I had already figured out that multiple instances or opposite-case code do not work without those pattern modifiers :)

thanks a lot :D

Matthew1980

3:06 pm on Jun 7, 2010 (gmt 0)

rocknbil: Gah, I always get those backwards (delimiters/modifiers), thanks for the pointer ;)

Achshar: Glad you are all sorted out now.

Cheers,
MRb