
PHP Server Side Scripting Forum

    
preg_replace excluding first character
how to parse only url that aren't tagged
DanSteph




msg:1283827
 7:39 pm on Mar 19, 2003 (gmt 0)

Hi all,

I convert URLs into links, but I don't want my function to convert URLs that are already tagged, such as [img] and a few others. I tried many things but wasn't able to find a solution.

The goal is to exclude matched strings that start with a "]" (such as "]http").

Here is my current preg_replace:

$body = preg_replace("/!\]((http(s?):\/\/)|(www\.))([a-z0-9;\/\?:@=\&\$\-_\.\+!*'\(\),~%]+)\b/i","<a href=\"http$3://$4$5\"target=\"_blank\">$2$4$5</a>",$body);

Thanks for any input

Dan

 

DanSteph




msg:1283828
 7:43 pm on Mar 19, 2003 (gmt 0)

Sorry, the previous code included one of my many attempts. Here is the correct function:

$body = preg_replace("/((http(s?):\/\/)|(www\.))([a-z0-9;\/\?:@=\&\$\-_\.\+!*'\(\),~%]+)\b/i","<a href=\"http$3://$4$5\"target=\"_blank\">$2$4$5</a>",$body);

Dan

nosanity




msg:1283829
 7:51 pm on Mar 19, 2003 (gmt 0)

I would test for the tags first (e.g. [img]), then run my RE only *IF* it wasn't a tag other than [url] or whatever tags you want linked.
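
A rough sketch of the "test for the tags first" idea, for anyone reading along. It assumes a reasonably current PHP (preg_replace_callback with closures and type hints, which is newer than this thread). The function name linkify_untagged and the example.com URLs are made up for illustration. The pattern lists the [img]-tagged form as its first alternative, so a tagged URL is consumed and handed back unchanged, and only bare URLs reach the link-building branch.

```php
<?php
// Hypothetical helper, not from the thread: link bare URLs but leave
// anything that directly follows an [img] tag alone.
function linkify_untagged(string $body): string
{
    // Alternative 1: an [img] tag plus everything up to the next whitespace.
    // Alternative 2: a bare URL that does not end in punctuation.
    $pattern = '{\[img\]\S+|(?:https?://|www\.)[a-z0-9;/?:@=&$_.+!*\'(),~%-]+(?<![.,!?:;])}i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        if (stripos($url, '[img]') === 0) {
            return $url; // tagged: hand it back untouched
        }
        // Bare www. URLs get a scheme for the href only.
        $href = (stripos($url, 'www.') === 0) ? "http://$url" : $url;
        return "<a href='$href' target='_blank'>$url</a>";
    }, $body);
}

echo linkify_untagged('see www.example.com/ and [img]http://www.example.com/a.png'), "\n";
```

The same alternation can be extended with other tags (for example [url]) by adding them to the first alternative.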

noSanity

DanSteph




msg:1283830
 1:49 am on Mar 20, 2003 (gmt 0)

Thanks. I suppose I would do that with preg_match?

Isn't there a simpler way? This one is used to match the URLs
that have an [img] tag in front:

$body = preg_replace("/[img]((http(s?):\/\/)[...SNIP...]

Perhaps there is a negation form for [img]?

I just started PHP and didn't find good documentation on regexes
(the official one doesn't tell too much), so I'm not good at this.

Dan

andreasfriedrich




msg:1283831
 2:20 am on Mar 20, 2003 (gmt 0)

You could use a zero-width negative look-behind assertion at the beginning of your pattern: (?<!\[img\]).


$body = '[img]http://www.aaroncarter.com/';
#
$body = preg_replace(
'{(?:(?<!\[img\])(http(s?)://)(www\.)|(?<!\[img\]http://)(www\.))([a-z0-9;/?:@=&$_.+!*\'(),~%-]+)(?=$|\s|\.)}i',
"<a href='http$2://$3$4$5' target='_blank'>$1$3$4$5</a>",
$body);
#
echo $body;

I cleaned up the pattern. Most special characters need no escaping within a character class. Not using the slash to delimit the expression saves you from having to escape it within your pattern. If the - is the last character in a character class, it is interpreted literally.
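
To illustrate the point about escaping and delimiters, here is a small check that the heavily escaped, slash-delimited character class and the cleaned-up, brace-delimited one accept the same input. The variable names and sample strings are invented for the example. As far as I know, the characters that still need care inside a class are ], \, ^ and a - that is not first or last.

```php
<?php
// The same character class written two ways.
$escaped   = "/^[a-z0-9;\/\?:@=\&\$\-_\.\+!*'\(\),~%]+$/i";  // slash-delimited, escaped
$unescaped = '{^[a-z0-9;/?:@=&$_.+!*\'(),~%-]+$}i';          // brace-delimited, hyphen last

// Made-up sample strings: two that fit the class, one that does not.
$samples = ['www-star.example.edu/a?b=1', 'a_b-c.d', 'path/with space'];

foreach ($samples as $s) {
    // Both patterns should agree on every sample.
    printf("%-28s %d %d\n", $s, preg_match($escaped, $s), preg_match($unescaped, $s));
}
```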

The PHP manual (php.net) explains the pattern syntax. The Perl regular expressions documentation (perldoc.com) is more in-depth and easier to understand.

Andreas


Note: Make sure to include a space preceding the "!" in mod_rewrite code, and replace "¦" with a solid vertical pipe.

[edited by: jatar_k at 3:27 am (utc) on Mar. 20, 2003]
[edit reason] sidescroll, sorry it is pretty small, didn't want to break it up [/edit]

nosanity




msg:1283832
 6:24 am on Mar 20, 2003 (gmt 0)

Wow... ok, finally a regular expression that is giving me a headache.

I want this headache to go away, and I am way too tired to figure this out...

Wanna explain the regex?

Thanks,

noSanity

andreasfriedrich




msg:1283833
 6:08 pm on Mar 20, 2003 (gmt 0)

Here is the pattern with some explanations. PHP's PCRE functions do accept Perl's x pattern modifier, which allows for annotated REs, but the breakdown below is written out by hand instead.


'{ ---------------------- opening delimiter {
__(?: ------------------- group, but do not store
____(?<! ---------------- zero-width negative look-behind assertion
______\[img\] ----------- matches when the 5 characters before
____) ------------------- the following pattern are not [img]
____( ------------------- create backreference $1
______http(s?):// ------- match http:// or https:// (the optional s is backreference $2)
____) ------------------- end of $1
____( ------------------- create backreference $3
______www\. ------------- match www.
____) ------------------- end of $3
___| -------------------- OR
____(?<! ---------------- zero-width negative look-behind assertion
______\[img\]http:// ---- matches when the characters before
____) ------------------- www. are not [img]http://
____( ------------------- create backreference $4
______www\. ------------- match www.
____) ------------------- end of backreference $4
__) --------------------- end of grouping
__( --------------------- create backreference $5
____[a-z0-9;/?:@=&$_.+!*\'(),~%-]+ --- one or more characters
__) --------------------- end of backreference $5
__(?= ------------------- zero-width positive look-ahead assertion
____$ ------------------- what follows needs to be either end of string
___| -------------------- OR
____\s ------------------ whitespace
___| -------------------- OR
____\. ------------------ dot
__) --------------------- end of assertion
}i' --------------------- end of pattern, closing delimiter, match case insensitive

Note that leading _ and trailing - are not part of the RE.
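
For comparison, a sketch of the same pattern written with the x (extended) modifier, which PHP's PCRE functions accept, so the annotations can live inside the pattern itself. Whitespace and # comments are ignored outside character classes. The example.com URLs are placeholders, and the link text uses $1$3$4$5 so that the optional s is not repeated.

```php
<?php
// Annotated version of the pattern, using the x modifier.
$pattern = '{
    (?:
        (?<!\[img\])            # not directly preceded by [img]
        (http(s?)://)           # $1 = http:// or https://, $2 = the optional s
        (www\.)                 # $3 = www.
      |
        (?<!\[img\]http://)     # not directly preceded by [img]http://
        (www\.)                 # $4 = www.
    )
    ([a-z0-9;/?:@=&$_.+!*\'(),~%-]+)   # $5 = rest of the URL
    (?=$|\s|\.)                 # followed by end of string, whitespace or a dot
}xi';

$replacement = "<a href='http$2://$3$4$5' target='_blank'>$1$3$4$5</a>";

echo preg_replace($pattern, $replacement, 'http://www.example.com/'), "\n";
echo preg_replace($pattern, $replacement, '[img]http://www.example.com/'), "\n";
```

One caveat when writing comments this way: with { } as delimiters the comments themselves should not contain braces, and in a single-quoted PHP string they should not contain unescaped apostrophes.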

Andreas

DanSteph




msg:1283834
 6:49 pm on Mar 20, 2003 (gmt 0)

Many thanks for the help and for the links; they will be very useful.
But the pattern you gave also doesn't select strings that don't have an [img] tag at the start (neither "http:" nor "www").

Looking at your example, I came up with this:

$body = preg_replace("/(?<!\[img\])((http(s?):\/\/)|(www\.))([a-z0-9;\/\?:@=\&\$\-_\.\+!*'\(\),~%]+)\b/i","<a href=\"http$3://$4$5\"target=\"_blank\">$2$4$5</a>",$body);

The only problem is that this one doesn't exclude the "[img]http" string, only the "[img]www" one.

So I tried this:

$body = preg_replace("/(?<!\[img\])((http(s?):\/\/)(?<!\[img\])(((www\.))([a-z0-9;\/\?:@=\&\$\-_\.\+!*'\(\),~%]+)\b/i","<a href=\"http$3://$4$5\"target=\"_blank\">$2$4$5</a>",$body);

I have tried this and many other things for several hours, but it still doesn't exclude "[img]http".

I'm very familiar with C++ programming and some other languages, but I must confess that these patterns bug me. (But it's true that I didn't start with an easy one :)
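
A small diagnostic sketch of why a single look-behind in front of the alternation is not enough. The pattern is the one-look-behind version with the http/www alternation; example.com is a placeholder. The look-behind does reject the match at "http", but the engine then retries further to the right and finds "www." with "tp://" in front of it, which is not "[img]", so the www alternative matches there.

```php
<?php
// One look-behind in front of the http/www alternation.
$pattern = "/(?<!\[img\])((http(s?):\/\/)|(www\.))([a-z0-9;\/\?:@=\&\$\-_\.\+!*'\(\),~%]+)\b/i";

// example.com is a placeholder URL.
$subject = '[img]http://www.example.com/';

if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE)) {
    // The match does not start at "http" (offset 5) but further right.
    printf("matched \"%s\" at offset %d\n", $m[0][0], $m[0][1]);
}
```

This is why the www alternative needs its own look-behind that also covers the "http://" sitting between the tag and the "www.".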

Dan

[edited by: jatar_k at 6:55 pm (utc) on Mar. 20, 2003]
[edit reason] no sigs please [/edit]

nosanity




msg:1283835
 6:58 pm on Mar 20, 2003 (gmt 0)

Andreas,

Wow (flagged in seconds)... my headache just went away. In fact, a few portions of that are new to me, even though I thought I had REs down pretty well. Thanks a lot!

I think I will actually print that breakdown.

noSanity

andreasfriedrich




msg:1283836
 7:26 pm on Mar 20, 2003 (gmt 0)

>>But the pattern you give doesn't select also string that
>>don't have [img] tag at the start.

I may have misunderstood you but I think you wanted a RE that does the following:

[aaroncarter.com...] -> turned into a link
www.aaroncarter.com/ -> turned into a link
[img]http://www.aaroncarter.com/ -> stays the same
[img]www.aaroncarter.com/ -> stays the same

The first and third cases were exactly what the code I posted handles. It failed on the fourth line, i.e. it turned it into a link despite the [img], since we only required that there be no [img]http:// in front of the www. That was true, since there was only an [img]. Thus the zero-width negative look-behind assertion matched and we got the link.

Here is a script that will work for that case as well.


$body = '[img]http://www.aaroncarter.com/<br/>';
$body .= 'http://www.aaroncarter.com/<br/>';
$body .= '[img]www.aaroncarter.com/<br/>';
$body .= 'www.aaroncarter.com/<br/>';
#
$body = preg_replace(
'{(?:(?<!\[img\])(http(s?)://)(www\.)([a-z0-9;/?:@=&$_.+!*\'(),~%-]+)(?=$|[.\s<>!:;,?])'
. '|(?<!http://)(?<!\[img\])(www\.[a-z0-9;/?:@=&$_.+!*\'(),~%-]+)(?=$|[.\s<>!:;,?]))}ie',
"'$5' != '' ? \"<a href='http://$5' target='_blank'>$5</a>\"
: \"<a href='http$2://$3$4' target='_blank'>$1$3$4</a>\"",
$body);
#
echo $body;

$body will now contain:


[img]http://www.aaroncarter.com/<br/>
<a href='http://www.aaroncarter.com/'
target='_blank'>http://www.aaroncarter.com/</a><br/>
[img]www.aaroncarter.com/<br/>
<a href='http://www.aaroncarter.com/'
target='_blank'>www.aaroncarter.com/</a><br/>

If that's what you want then it's all there (did you replace the broken pipe with a solid vertical one?). If not, I did not get what you are trying to do and you might want to explain it again.
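
For readers on a current PHP: as far as I know the e modifier was deprecated in PHP 5.5 and removed in PHP 7, so here is a sketch of the same idea with preg_replace_callback instead. The function name linkify is made up, and two things go beyond the script above and are my own additions: an extra look-behind for https:// and the htmlspecialchars calls.

```php
<?php
// Sketch only: linkify() is a hypothetical name, and the callback replaces
// the e modifier.
function linkify(string $body): string
{
    $chars = '[a-z0-9;/?:@=&$_.+!*\'(),~%-]+';   // URL characters, hyphen last
    $end   = '(?=$|[.\s<>!:;,?])';               // what may follow a URL

    $pattern = '{(?<!\[img\])(https?://www\.' . $chars . ')' . $end
             . '|(?<!http://)(?<!https://)(?<!\[img\])(www\.' . $chars . ')' . $end
             . '}i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is only present when the bare www. alternative matched.
        $bare = isset($m[2]);
        $text = $bare ? $m[2] : $m[1];
        $href = $bare ? 'http://' . $m[2] : $m[1];
        return "<a href='" . htmlspecialchars($href, ENT_QUOTES) . "' target='_blank'>"
             . htmlspecialchars($text, ENT_QUOTES) . '</a>';
    }, $body);
}

$body  = '[img]http://www.aaroncarter.com/<br/>';
$body .= 'http://www.aaroncarter.com/<br/>';
$body .= '[img]www.aaroncarter.com/<br/>';
$body .= 'www.aaroncarter.com/<br/>';

echo linkify($body), "\n";
```

Building the link in a callback also avoids evaluating a string as PHP code, which was the main risk of the e modifier.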

Andreas


DanSteph




msg:1283837
 3:54 pm on Mar 21, 2003 (gmt 0)

Andreas,

Many thanks for your help...
Yes, you understood correctly; that was what I wanted.

However, your function failed on some URLs such as:
[www-star.stanford.edu...]
[pauldunn.dynip.com...]
[www-star.stanford.edu...]
[atmos.nmsu.edu...]

So I slightly modified your example and now it works perfectly. Below is the modified function:

$body = preg_replace('{(?:(?<!\[img\])(http(s?)://)([a-z0-9;/?:@=&$_.+!*\'(),~%-]+)(?=$|[.\s<>!:;,?])|(?<!http://)(?<!\[img\])(www[a-z0-9;/?:@=&$_.+!*\'(),~%-]+)(?=$|[.\s<>!:;,?]))}ie', "'$4' != '' ? \"<a href='http://$4' target='_blank'>$4</a>\" : \"<a href='http$2://$3' target='_blank'>$1$3</a>\"", $body);
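
Two small checks related to hyphenated hosts, as a sketch with made-up names (www-star.example.edu stands in for the real URLs). The first pair shows why requiring "www\." rejects such a host while "www" followed by the character class accepts it. The second group shows a pitfall to watch for when adding a hyphen to a class: written as "$-_" it becomes a range from "$" to "_", which also lets "<" through, whereas a hyphen placed last is literal.

```php
<?php
$class = '[a-z0-9;/?:@=&$_.+!*\'(),~%-]+';   // hyphen last, so it is literal

// Hypothetical host name standing in for the hyphenated URLs.
$host = 'www-star.example.edu';

$with_dot    = preg_match('{^www\.' . $class . '$}i', $host); // requires "www."
$without_dot = preg_match('{^www' . $class . '$}i', $host);   // "www" plus anything in the class
var_dump($with_dot, $without_dot);

// "$-_" inside a class is a range (0x24 to 0x5F), not three literals.
$range_lt   = preg_match('{^[$-_]$}', '<');  // "<" falls inside the range
$literal_lt = preg_match('{^[$_-]$}', '<');  // only "$", "_" and "-" are allowed
$literal_hy = preg_match('{^[$_-]$}', '-');  // trailing hyphen matches itself
var_dump($range_lt, $literal_lt, $literal_hy);
```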


Many thanks again; without your help I would not have been able to resolve this one.

Dan

[edited by: jatar_k at 4:44 pm (utc) on Mar. 21, 2003]
[edit reason] small but no sidescroll [/edit]
