Forum Moderators: coopster

Problem with a regular expression

         

TheDarkPhoenix

11:13 pm on May 7, 2005 (gmt 0)

10+ Year Member



I'm having a regular expression problem. I need it to match all occurrences of "href="(not http)<whatever-else>"". My attempts have been:

href=\"[^h][^t][^t][^p][^:][^/][^/][^\"]+\", which works, but will fail on things like "href="attribute.html"", as there are t's where the regex doesn't expect them. Thus, this attempt can be ditched.

href=\"(^http:\/\/)[\"]+\", which doesn't work, and ends up matching only things like "href="httpandthensomemore.html""

Beyond that, I couldn't figure out anything more. I need a regular expression that tests for the WORD http as a whole, not each individual character, because this is not a check for links starting with http; it is for replacing relative links with absolute ones in the following piece of PHP code:

preg_replace("/href=\"<regex that works... hopefully>\"/", "href=\"http://address.com/\$1\"", $pieceOfHTML);

And I chose to post this here and not in the PHP section, as it's not concerning the PHP but the regular expression.

(Another less important issue, but still, if it can be addressed it would be nice, is that the expressions I came up with would require the length of the relative URL to be at least 8 characters, which is really a pain, so if you have a fix for that as well, I'd appreciate it.)
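For anyone who wants to see both symptoms of the first attempt side by side, here is a minimal sketch in Python (used only as a quick test bench; the pattern contains nothing but negated character classes, so it should behave the same way in PHP's preg functions). The sample attributes are made up for illustration.

```python
import re

# First attempt from the post: one negated class per character of "http://".
# Each class rejects one character at one fixed position, so the pattern
# needs 7 such characters plus at least one more before the closing quote.
per_char = re.compile(r'href="[^h][^t][^t][^p][^:][^/][^/][^"]+"')

# Hypothetical sample attributes, not taken from the original post.
samples = {
    "long_relative": 'href="contacts.html"',
    "t_in_second_position": 'href="attribute.html"',
    "short_relative": 'href="a.html"',
    "absolute": 'href="http://example.com/"',
}

results = {name: bool(per_char.search(text)) for name, text in samples.items()}
for name, matched in results.items():
    print(name, matched)
```

Only the first sample matches. The second is rejected because of the t in its second position, and the third because it is shorter than the 8 characters the pattern demands.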

buriedUnderGround

2:19 pm on May 9, 2005 (gmt 0)

10+ Year Member



You might want to take a look at this post:
[webmasterworld.com...]

buriedUnderGround

9:38 am on May 10, 2005 (gmt 0)

10+ Year Member



I took a shot at this and came up with the following:

preg_replace("/<a\shref=\"([^(http)+].*)\">/", "<a href=\"http://site.com/\$1\">", $pieceOfHTML);

The only thing is that it only works with relative URLs without a leading slash,
i.e. page.htm works but /page.htm doesn't. The resulting URL looks like this: [site.com...]
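A note on the pattern above, with a small Python sketch to back it up (Python is used only for testing; character classes work the same way in PCRE as far as I know). Inside square brackets, parentheses and + lose their special meaning, so [^(http)+] is a set of single characters rather than a test for the word http. The join_base helper is a hypothetical name, and it assumes the base address is the site root, which is one possible way to deal with the leading slash.

```python
import re

# Inside [...] the characters ( h t p ) + are literals, so this class reads
# "one character that is none of ( h t p ) +", not "anything except http".
first_char = re.compile(r'[^(http)+]')


def join_base(base, relative):
    """Join base and relative path with exactly one slash between them.

    Hypothetical helper; it treats the base as the site root.
    """
    return base.rstrip("/") + "/" + relative.lstrip("/")


# Which names does the class accept as a first character?
for name in ["index.htm", "page.htm", "thanks.htm", "http://site.com/"]:
    print(name, bool(first_char.match(name)))

print(join_base("http://site.com/", "/page.htm"))
```

In this sketch any name beginning with h, t or p is rejected at its first character, not just names beginning with http, so the class is stricter than intended.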

TheDarkPhoenix

1:38 pm on May 14, 2005 (gmt 0)

10+ Year Member



I actually managed to come up with a regular expression that would do what I wanted. Since someone else might have use for it, I'll post it here:

href=\"(((?<!http:\/\/)[^\"])+)\"
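A quick way to check this expression is the Python sketch below (Python only as a test bench; both it and PCRE support fixed-width negative lookbehind, so the pattern should carry over to preg_replace unchanged apart from delimiter escaping). The LOOKAHEAD variant is my own assumption, not something from the thread: it tests for http:// once, right after the opening quote, instead of before every character. The absolutize function and the sample HTML are hypothetical.

```python
import re

# Pattern from the post: the negative lookbehind is checked before every
# character of the URL.
LOOKBEHIND = re.compile(r'href="(((?<!http://)[^"])+)"')
# Alternative (an assumption, not from the thread): a single negative
# lookahead checked once, directly after the opening quote.
LOOKAHEAD = re.compile(r'href="(?!http://)([^"]+)"')

BASE = "http://address.com/"  # placeholder base URL from the original question


def absolutize(pattern, html, base=BASE):
    """Prefix every href matched by `pattern` with `base`, keeping the attribute."""
    return pattern.sub(lambda m: 'href="' + base + m.group(1) + '"', html)


# Hypothetical sample markup.
html = (
    '<a href="attribute.html">one</a> '
    '<a href="a.htm">two</a> '
    '<a href="http://example.com/x">three</a>'
)

print(absolutize(LOOKBEHIND, html))
print(absolutize(LOOKAHEAD, html))
```

Both patterns rewrite the two relative links, including the short one, and leave the absolute link alone. They differ at the edges: the lookbehind version also skips a relative link that merely contains http:// somewhere in the middle, and it accepts a bare href="http://", while the lookahead version does the opposite in both cases. Neither one excludes https:// or other schemes.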

coopster

8:38 pm on May 14, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



And there you have it. Welcome to WebmasterWorld, TheDarkPhoenix.