

Http(s)

preg_replace either HTTP or HTTPS


smallcompany

3:40 am on Mar 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi,

I have these two lines of code:

$url_w = preg_replace("|http://([^\/]+)/.*|", "\\1", $_SERVER['HTTP_REFERER']);
$nonurl_w = preg_replace("|http://([^\/]+)/(.*)|", "\\2", $_SERVER['HTTP_REFERER']);

Now I want to have HTTPS considered as well, when needed. I came across something like this:

http(s) or http[s] or https?, etc.

but I'm not sure which approach would best cover both HTTP and HTTPS.
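To make the problem concrete, here is a minimal sketch that wraps the two lines above in a function. The name split_referer_http_only and the sample referers are hypothetical, and the referer is passed in rather than read from $_SERVER['HTTP_REFERER']. Because "http://" never occurs inside "https://...", an HTTPS referer doesn't match at all, and preg_replace hands back the whole URL unchanged.

```php
<?php
// The two original lines, wrapped in a hypothetical helper so they can be
// tried on sample input. They only recognise "http://".
function split_referer_http_only($referer)
{
    $url_w    = preg_replace("|http://([^\/]+)/.*|", "\\1", $referer);
    $nonurl_w = preg_replace("|http://([^\/]+)/(.*)|", "\\2", $referer);
    return array($url_w, $nonurl_w);
}

$plain  = split_referer_http_only('http://www.google.com/search?q=widgets');
$secure = split_referer_http_only('https://www.google.com/search?q=widgets');

echo $plain[0], ' | ', $plain[1], "\n";   // www.google.com | search?q=widgets
echo $secure[0], ' | ', $secure[1], "\n"; // the whole URL twice: no match, nothing replaced
```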

Thanks

lucy24

4:55 am on Mar 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In a Regular Expression, the form is
https?

where ? means "the immediately preceding character (or bracketed group, or longer string in parentheses) is optional".

https? = h,t,t,p ... and I'll take an "s" if you've got one
http(s) = h,t,t,p ... and then an "s", which is required, and gets captured
http[s] = h,t,t,p ... and any one member of a group whose only member happens to be "s" (so, again, the "s" is required)
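A quick way to see the difference is to run each form against both schemes. This is a minimal sketch; the helper name scheme_matches and the example.com URLs are illustrative only. Only https? accepts both. The last lines show a side effect that matters for the original code: putting the "s" in parentheses adds a capture group, so what used to be $1 becomes $2.

```php
<?php
// Check which of the three forms accept "http://" and "https://".
function scheme_matches($pattern, $url)
{
    return preg_match($pattern, $url) === 1;
}

$forms = array(
    'https?'  => '|^https?://|',  // "s" optional
    'http(s)' => '|^http(s)://|', // "s" required, and captured as group 1
    'http[s]' => '|^http[s]://|', // one-member class: "s" still required
);

foreach ($forms as $label => $pattern) {
    printf("%-8s http: %-3s https: %s\n", $label,
        scheme_matches($pattern, 'http://example.com/') ? 'yes' : 'no',
        scheme_matches($pattern, 'https://example.com/') ? 'yes' : 'no');
}

// Parentheses also renumber later groups: with "(s)?" the host moves to $2.
$host = preg_replace('|^http(s)?://([^/]+)/.*|', '$2', 'https://www.example.com/page');
echo $host, "\n"; // www.example.com
```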

Do you need to escape a / slash in this situation? Normally it's only necessary when the / has some further meaning, such as being the delimiter that encloses the whole pattern. Here the delimiter is |, so a plain / should do.
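To check this for patterns that use | as the delimiter, the sketch below runs the same pattern with and without the backslash (the sample referer is made up). Both should return the same host.

```php
<?php
// With "|" as the delimiter, "/" is an ordinary character inside the
// pattern, so "[^\/]" and "[^/]" mean the same thing: "not a slash".
$referer = 'https://www.example.com/search?q=test';

$escaped   = preg_replace('|https?://([^\/]+)/.*|', '$1', $referer);
$unescaped = preg_replace('|https?://([^/]+)/.*|', '$1', $referer);

echo $escaped, "\n";   // www.example.com
echo $unescaped, "\n"; // www.example.com
```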

What php version are you on? \\1 will still work, but $1 is preferred and is more readable.
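Putting the two suggestions together, the original lines rewritten with https? and $n references might look like the sketch below. It assumes the rest of the script still only needs the host and the remainder of the URL, and a fixed sample string stands in for $_SERVER['HTTP_REFERER'].

```php
<?php
// The original two lines with "https?" added. "\\1" in a double-quoted PHP
// string and '$1' in a single-quoted one both refer to capture group 1.
$referer = 'https://www.example.com/search?q=widgets'; // stand-in for $_SERVER['HTTP_REFERER']

$url_w_old = preg_replace("|https?://([^/]+)/.*|", "\\1", $referer);
$url_w     = preg_replace('|https?://([^/]+)/.*|', '$1', $referer);
$nonurl_w  = preg_replace('|https?://([^/]+)/(.*)|', '$2', $referer);

echo $url_w_old, "\n"; // www.example.com
echo $url_w, "\n";     // www.example.com
echo $nonurl_w, "\n";  // search?q=widgets
```

Single quotes around the replacement keep PHP from looking at the $ at all, which avoids the question of escaping entirely.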

:: noting with interest that php has figured out how to deal with the Double-Digit Ambiguity ::
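For the record, PHP handles that ambiguity with the ${n} form described in the preg_replace documentation: a bare $ followed by two digits is read as a two-digit group number, and braces separate the group number from a following literal digit. A small sketch:

```php
<?php
// "$10" is read as a reference to group 10, which doesn't exist here, so it
// is replaced by an empty string. "${1}0" means group 1 followed by a literal 0.
$ambiguous = preg_replace('/(\d)/', '$10', '5');
$braced    = preg_replace('/(\d)/', '${1}0', '5');

var_dump($ambiguous); // string(0) ""
var_dump($braced);    // string(2) "50"
```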

smallcompany

7:36 pm on Mar 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks very much!

Do you need to escape a / slash in this situation?

Not sure, as somebody else made the script for me a long time ago.
The script grabs the search query, figures out whether the visitor came from a country-specific domain of Google or some other major search engine, and also passes along some of my own tracking variables that I use in paid search.
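For a script like that, PHP's built-in parse_url() is an alternative to regular expressions: it splits a URL into its parts regardless of whether the scheme is http or https. The sketch below is an illustration, not what the original script does; the function name referer_parts and the assumption that the search terms arrive in a "q" query parameter are both hypothetical.

```php
<?php
// Alternative technique (not the original script's regex approach):
// let parse_url() split the referer, whatever its scheme.
// The "q" parameter name is an assumption about the search engine's URLs.
function referer_parts($referer)
{
    $host  = (string) parse_url($referer, PHP_URL_HOST);
    $query = (string) parse_url($referer, PHP_URL_QUERY);
    parse_str($query, $params); // decodes "+" and %xx escapes
    $terms = isset($params['q']) ? $params['q'] : '';
    return array($host, $terms);
}

list($host, $terms) = referer_parts('https://www.google.co.uk/search?q=blue+widgets');
echo $host, "\n";  // www.google.co.uk
echo $terms, "\n"; // blue widgets
```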

What php version are you on?

I use this on a few servers, and PHP is either 5.4.32 or 5.3.29 at the moment.

\\1 will still work, but $1 is preferred and is more readable.

I wondered about that, but wasn't sure whether it did the same job as $1 and $2. I'll change that as well. I didn't even know it would work without the $ sign. I'm a bit familiar with $1 from using it in .htaccess (regex).


Thank you

smallcompany

11:52 pm on Mar 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, after adding the $ sign, the reference got passed through "as is" instead of being replaced by the captured value.

So, instead of getting a code (from an array) that marks a specific search engine, I now got a literal $1 showing up.

I'm putting it back as it was.
Also, I wonder about the programmatic side of that "\\1", since "\\$1" "broke" the script.

Thanks

lucy24

3:31 am on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, yikes, not \\$1! Just $1 without the \\. \\1 and $1 are alternative ways of saying the same thing. I'm inclined to think that escaping a $ would make it into a literal dollar sign. But don't quote me, because this varies according to RegEx engine. In any case the sequence \\$ would seem to be dangerously ambiguous.

Confession: I'd never seen the \\ locution in my life and had to consult php docs [php.net]. I don't actually speak php

:: looking irritably around for penders or someone like him ::

but I'm pretty solid on Regular Expressions. There's no doubt about the https? part which is what you originally asked.

Come to think of it, if it had been necessary to escape the / slashes, then the original pattern would have read "http:\/\/" etcetera. In general, a backslash in front of a punctuation character that has no special meaning (such as /) is simply ignored, so \/ matches the same thing as /. The dangerous exception is that some RegEx engines will treat such a backslash as a literal \ character. So don't go tossing backslashes around at random.
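Here is what that looks like in practice: a sketch of the same pattern with / as the delimiter, where the escapes are required, next to | and # delimiters, where they aren't. The sample referer is made up.

```php
<?php
// The same host-extraction pattern written three ways. With "/" as the
// delimiter every literal slash must be escaped; with "|" or "#" none need to be.
$referer = 'https://www.example.com/search?q=test';

$slash = preg_replace('/https?:\/\/([^\/]+)\/.*/', '$1', $referer);
$pipe  = preg_replace('|https?://([^/]+)/.*|', '$1', $referer);
$hash  = preg_replace('#https?://([^/]+)/.*#', '$1', $referer);

echo $slash, "\n"; // www.example.com
echo $pipe, "\n";  // www.example.com
echo $hash, "\n";  // www.example.com
```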

:: wandering off to pore over [regular-expressions.info...] ::

smallcompany

5:01 am on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, yikes, not \\$1 ! Just $1 without the \\

Yikes, too!

Fixed it, and thank you again.

Readie

1:47 pm on Apr 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In any case the sequence \\$ would seem to be dangerously ambiguous.

Confession: I'd never seen the \\ locution in my life and had to consult php docs [php.net]. I don't actually speak php

:: looking irritably around for penders or someone like him ::

I'm Penders or someone like him or something!

The second parameter to preg_replace is not a regex, just a PHP string. "\\$1" is an escaped backslash (so a literal single backslash), $, 1. However, this string is then post-processed by preg_replace, which treats \$ as a literal $, so the $1 is never replaced.

Interestingly enough, if you had written preg_replace("REGEX", "\$1", $subject), it would have worked: there the \$ escape is consumed by PHP's own string parsing, so preg_replace receives a plain $1 and treats it as a backreference.
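A small sketch that runs each spelling from this thread through preg_replace() side by side (the pattern and sample referer are illustrative). It reproduces the "$1 showing up" symptom for "\\$1", while the other three forms all substitute the captured host.

```php
<?php
// What preg_replace() receives for each replacement string, and the result.
$pattern = '|https?://([^/]+)/.*|';
$referer = 'https://www.example.com/search?q=test';

$a = preg_replace($pattern, "\\1", $referer);  // receives \1  -> backreference
$b = preg_replace($pattern, "\\$1", $referer); // receives \$1 -> literal "$1"
$c = preg_replace($pattern, "\$1", $referer);  // receives $1  -> backreference
$d = preg_replace($pattern, '$1', $referer);   // single quotes: $1 -> backreference

echo $a, "\n"; // www.example.com
echo $b, "\n"; // $1
echo $c, "\n"; // www.example.com
echo $d, "\n"; // www.example.com
```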