

Splitting HTTP HOST

     
9:01 pm on Aug 18, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


Let's say that I have a series of strings like these:

$urlA = 'www.example.com';
$urlB = 'ww2.example.com';
$urlC = 'example.com';
$urlD = 'example.co'; // .CO, not .COM
$urlE = 'example.net';

I would like to split them into $subdomain, $domain, and $extension (I know that a "domain" technically includes the extension, but I don't know what "example" would be called in this case).

In Perl, I do it like this:

($subdomain, $domain, $extension) = $urlA =~ /(ww[w2])?\.?(.*?)\.(com|net|co)/i;


But PHP doesn't seem to be quite so simple :-(

My first step was to simply explode() on the dot, like this:

list($subdomain, $domain, $extension) = explode('.', $urlA);


That works just fine as long as a subdomain exists, but if the user doesn't include the "www" then it sets everything to the wrong variable.

I actually force the "www" via .htaccess so this isn't a huge problem, but there's no guarantee that it will always be this way (especially since browsers are hiding the www now). So I wonder if there's a better way?
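A minimal sketch of one way to keep explode() while stopping the values from shifting when the "www" is missing: array_pad() with a negative length pads on the left, so a host without a subdomain gets an empty first element. It assumes at most three labels (with four or more, list() still takes the first three), and splitHostPadded() is a hypothetical name:

```php
<?php
// Hypothetical helper: splits a host name into subdomain, domain and extension.
// array_pad() with a NEGATIVE length adds '' at the START of the array,
// so 'example.com' becomes array('', 'example', 'com').
// Assumes at most three labels; a fourth label would shift the values again.
function splitHostPadded($host)
{
    list($subdomain, $domain, $extension) = array_pad(explode('.', $host), -3, '');
    return array($subdomain, $domain, $extension);
}

list($subdomain, $domain, $extension) = splitHostPadded('example.co');
echo "$subdomain|$domain|$extension\n"; // prints "|example|co"
```

This uses no PHP 7-only syntax, so it should also run on PHP 5.x.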

My next step was to use preg_split(), like so:

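// anchored so the optional subdomain group can't swallow "www.example";
// $url receives the empty piece before the (whole-string) match
list($url, $subdomain, $domain, $extension) =
preg_split('/^(?:(.+)\.)?([^.]+)\.(com?|net)$/i',
$urlA,
-1,
PREG_SPLIT_DELIM_CAPTURE);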


This seems to do what I want, but preg_split() is about 4 times slower than explode() :-( That isn't a huge deal, I guess; we're talking about 0.000036s vs 0.000008s... but microseconds add up.
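For comparison, the closest PHP counterpart to the Perl list-context match is probably preg_match(), which returns the capture groups in an array rather than splitting the string. This sketch assumes the same com/co/net extensions as the Perl pattern and uses ^ and $ anchors so the optional subdomain group cannot swallow the domain; splitHostRegex() is a hypothetical name:

```php
<?php
// Hypothetical helper: one regex with three capture groups, like the Perl version.
// Assumes the extension is com, co or net and at most one subdomain label.
function splitHostRegex($host)
{
    if (!preg_match('/^(?:([^.]+)\.)?([^.]+)\.(com?|net)$/i', $host, $m)) {
        return false; // not one of the expected extensions
    }
    // $m[1] is '' when there is no subdomain, because group 1 did not take part
    return array($m[1], $m[2], $m[3]);
}

list($subdomain, $domain, $extension) = splitHostRegex('ww2.example.com');
echo "$subdomain|$domain|$extension\n"; // prints "ww2|example|com"
```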

Before I implement preg_split(), can you guys suggest a better way to get the information I need?
9:06 pm on Aug 18, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1194
votes: 288


The first thing that comes to my mind:


$tmp=explode('.',$url);
$extension=array_pop($tmp);
$domain=array_pop($tmp);
$subdomain=$tmp[0]??'';


If you want to set the subdomain to "www", if no subdomain was used, then it becomes:


$subdomain=$tmp[0]??'www';


An alternative for the subdomain line:


$subdomain=implode('.',$tmp);


In the second case, it also works if you have several subdomains, e.g. "a.b.c.d.domain.ext".
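The array_pop()/implode() variant can be wrapped in a small function; this is a sketch with a hypothetical name (splitHostPop()), using no PHP 7-only syntax so it should also run on PHP 5.x:

```php
<?php
// Sketch of the array_pop()/implode() approach as a hypothetical function.
// array_pop() takes labels off the end; implode() joins whatever is left,
// so several subdomains ('a.b.example.com') are kept together as 'a.b'.
function splitHostPop($host)
{
    $tmp = explode('.', $host);
    $extension = array_pop($tmp);        // last label
    $domain = (string) array_pop($tmp);  // next label; array_pop() returns NULL on an empty array
    $subdomain = implode('.', $tmp);     // remaining labels, '' if none
    return array($subdomain, $domain, $extension);
}

list($subdomain, $domain, $extension) = splitHostPop('a.b.example.net');
echo "$subdomain|$domain|$extension\n"; // prints "a.b|example|net"
```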
9:33 pm on Aug 18, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1194
votes: 288


especially since browsers are hiding the www now

The fact that browsers are "simplifying" how they display a page's address has no impact on the request itself, which still carries the protocol, subdomain, parameters, etc. (of course).
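On the server side, the host name the browser requested arrives in the Host request header, which PHP exposes as $_SERVER['HTTP_HOST']. That value can include a port (e.g. "example.com:8080"), so a minimal sketch, assuming a plain host name rather than a bracketed IPv6 literal, strips it before splitting; requestHost() and its fallback parameter are hypothetical:

```php
<?php
// Hypothetical helper: returns the requested host name without any ":port".
// Falls back to $fallback when there is no Host header (e.g. from the CLI).
// Assumes a plain host name, not a bracketed IPv6 literal such as "[::1]:8080".
function requestHost($fallback = '')
{
    $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : $fallback;
    $colon = strpos($host, ':');
    if ($colon !== false) {
        $host = substr($host, 0, $colon); // drop ":port"
    }
    return strtolower($host); // host names are case-insensitive
}

echo requestHost('www.example.com') . "\n";
```

The result can then be passed to whichever splitting approach you settle on.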
9:41 pm on Aug 18, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1194
votes: 288


Edit: the null coalescing operator "??" works only with PHP 7+; the PHP 5 branch is now too old to be used "seriously" and "safely".

But if you are still running PHP 5.x, it should be:


$subdomain=isset($tmp[0])?$tmp[0]:'';


or


$subdomain=isset($tmp[0])?$tmp[0]:'www';
9:57 pm on Aug 18, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1194
votes: 288


Since I am bored tonight: if you are looking for faster* code, this could be:


$tmp=explode('.',$url);
$n=count($tmp);
$extension=$tmp[--$n]??'';
$domain=$tmp[--$n]??'';
$subdomain=$tmp[--$n]??'';


* This is a micro-optimization; the difference in speed might not even be noticeable unless you are doing billions of operations of this kind. However, this code avoids two function calls (array_pop); count() looks like a function, but it is optimized at the level of the PHP compiler. Also, this code does not modify the $tmp array, which saves some cycles too. But as I said, it won't make any real difference in the end.
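If you want to measure this on your own server, a rough timing sketch could look like the following. The function names are hypothetical, isset() replaces ?? so it also runs on PHP 5.x, and the absolute numbers depend on the machine and PHP version, so only compare figures from the same run:

```php
<?php
// Variant 1: array_pop(), which modifies $tmp.
function splitWithPop($host)
{
    $tmp = explode('.', $host);
    $extension = array_pop($tmp);
    $domain = array_pop($tmp);
    $subdomain = isset($tmp[0]) ? $tmp[0] : '';
    return array($subdomain, $domain, $extension);
}

// Variant 2: pre-decremented index, $tmp is only read.
function splitWithIndex($host)
{
    $tmp = explode('.', $host);
    $n = count($tmp);
    $extension = $tmp[--$n];                        // last label
    $domain = isset($tmp[--$n]) ? $tmp[$n] : '';    // second-to-last label
    $subdomain = isset($tmp[--$n]) ? $tmp[$n] : ''; // third-to-last label
    return array($subdomain, $domain, $extension);
}

// Calls $fn (a function name) $iterations times and returns elapsed seconds.
function timeIt($fn, $host, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($host);
    }
    return microtime(true) - $start;
}

$iterations = 100000;
printf("array_pop: %.4f s\n", timeIt('splitWithPop', 'www.example.com', $iterations));
printf("index:     %.4f s\n", timeIt('splitWithIndex', 'www.example.com', $iterations));
```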
1:37 am on Aug 19, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


I like that, Dimitri :-) I'm using PHP 5.x because I'm worried that updating to 7.x will make something stop working unexpectedly, like the last major update did, so I'm gun-shy...

I had to make a minor modification to get it to work, but working from your example I think this is probably going to be the fastest option:

$tmp = explode('.', $url);
$n = count($tmp) - 1;

$extension = $tmp[$n--] ?: '';
$domain = $tmp[$n--] ?: '';
$subdomain = $tmp[$n--] ?: '';
8:13 am on Aug 19, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1194
votes: 288


Since you are not using PHP 7+, the code should be:


$tmp = explode('.', $url);
$n = count($tmp) - 1;

$extension = $tmp[$n--];
$domain = $tmp[$n--];
$subdomain = ($n >= 0) ? $tmp[$n] : '';


Because the two question marks (the null coalescing operator) are not available in PHP 5.

I'm worried that updating to 7.x will make something stop working unexpectedly

Yes, it will.

If you install the latest stable PHP 7 on your development machine(s), you should be able to spot what needs to be updated in your code. If you configure error reporting on your dev machine to display ALL messages, you should see notices about functions that are deprecated.
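A minimal sketch of the "display ALL error messages" setting for a development machine, done at the top of a script. The php.ini equivalents are `error_reporting = E_ALL` and `display_errors = On`, which cover every script at once:

```php
<?php
// Development machine only: report everything (notices, warnings, deprecations)
// and show it on screen so deprecation notices are easy to spot.
// Do not display errors like this on a production server.
error_reporting(E_ALL);
ini_set('display_errors', '1');
```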

You have a migration guide here :

Migrating from PHP 5.6.x to PHP 7.0.x
[php.net...]

and you have all the instructions to migrate from 7.0 => 7.1 => 7.2 => 7.3
(7.4 is the upcoming new version, but next year, we might have 8.0)

The most important change to know about, in my opinion, was the removal of the "mysql_" functions (deprecated in PHP 5.5 and removed in PHP 7.0). Instead you have to use "mysqli_" (with the "i") or, better, the PDO class.

The problem with the 5.x branch is that it has not been maintained since December 2018, so there are no more security updates.
4:48 am on Aug 20, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


I honestly can't understand the logic here, Dimitri, so maybe you can explain this.

function getHost($url) {
// if I send www.example.com, the array looks like:
// $tmp[0] = www
// $tmp[1] = example
// $tmp[2] = com
$tmp = explode('.', $url);

// and count($tmp) is 3
$n = count($tmp) - 1;

// so this prints "2", as it should
print "Count: $n\n";

// here's where it gets weird, though. Shouldn't $n-- be looking for $tmp[1] at this point?
$extension = $tmp[$n--] ?: '';

// this prints "1", as it should
print "Count: $n\n";

// And if $n = 1 then $n-- should now be 0
$domain = $tmp[$n--] ?: '';

// this prints "0"
print "Count: $n\n";

$subdomain = ($n == 0) ? $tmp[$n] : '';

print "Subdomain: $subdomain\n";
print "Domain: $domain\n";
print "Extension: $extension";

print "\n\n";
}

getHost('www.example.com');
getHost('example.co');
getHost('example.com');


So am I to understand that when I do $extension = $tmp[$n--] ?: '';, it looks up $tmp[$n] and THEN decrements $n by 1?
4:55 am on Aug 20, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


Note, at this point it's just for my own education. I think a simpler and faster solution might have been staring us in the face the whole time:

$tmp = explode('.', $url);
list($extension, $domain, $subdomain) = array_reverse($tmp);
8:51 am on Aug 20, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts: 1194
votes: 288


Shouldn't $n-- be looking for $tmp[1] at this point?

No.

So am I to understand that when I do $extension = $tmp[$n--] ?: '';, it looks up $tmp[$n] and THEN decrements $n by 1?

Yes, exactly.

$something=$tmp[$n--];

is equivalent to
$something=$tmp[$n];
$n--;


When you put the -- operator "after" the variable, it first takes the current value of the variable, and only "after" that decreases it.

If you put the -- operator "before" the variable, then the variable is first decreased, "before" being evaluated. This is why in my code I put the two minus signs before the variable: it "saved" the "count()-1" operation.
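A small script that shows the difference between post-decrement and pre-decrement when used as an array index; the array mirrors the "www.example.com" example from the thread:

```php
<?php
// Post-decrement ($n--) uses the old value, then decrements.
// Pre-decrement (--$n) decrements first, then uses the new value.
$tmp = array('www', 'example', 'com');

$n = 2;
$post = $tmp[$n--]; // reads $tmp[2], then $n becomes 1
echo "$post $n\n";  // prints "com 1"

$n = 2;
$pre = $tmp[--$n];  // $n becomes 1 first, then reads $tmp[1]
echo "$pre $n\n";   // prints "example 1"
```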

Note, at this point it's just for my own education. I think a simpler and faster solution might have been staring us in the face the whole time:

Good thought! It's not faster from a CPU/mem point of view, but it's a lot simpler, and more convenient. Well done! :)
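One caveat with the array_reverse() one-liner: for a host like "example.com" the reversed array has only two elements, so list() asks for a missing third index and triggers an "Undefined offset" notice (a warning in PHP 8). A sketch that pads the reversed array first; splitHostReverse() is a hypothetical name:

```php
<?php
// Hypothetical wrapper around the array_reverse() idea.
// array_pad() with a positive length adds '' at the END of the reversed array,
// so list() always finds three elements and a missing subdomain becomes ''.
// With more than three labels, only the label just before the domain is kept.
function splitHostReverse($host)
{
    $parts = array_pad(array_reverse(explode('.', $host)), 3, '');
    list($extension, $domain, $subdomain) = $parts;
    return array($subdomain, $domain, $extension);
}

list($subdomain, $domain, $extension) = splitHostReverse('example.com');
echo "[$subdomain] [$domain] [$extension]\n"; // prints "[] [example] [com]"
```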
11:27 am on Aug 20, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts: 1194
votes: 288


Can't edit my post. Corrections:

And only "after" decreases it
two minus (not "two minute")

There are certainly other spelling mistakes, but when I see some, I try to fix them.
7:13 pm on Aug 20, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


When you put the -- operator "after" the variable, it first takes the current value of the variable, and only "after" that decreases it.
If you put the -- operator "before" the variable, then the variable is first decreased, "before" being evaluated.

I did not know that! I thought it was just a styling preference...

I learned something new today, so I'm going back to bed to end the day on a high note :-D