Forum Moderators: coopster


How to extract domain name only

         

ashish2005

5:05 am on Apr 18, 2010 (gmt 0)

10+ Year Member



Hi,

Suppose these are the urls
http:// www . site . com
http:// www . site . co . uk
http:// subdomain . site . com
http:// site . com/page
(added spaces to not make them clickable)

So out of URLs like those and more, I want to extract only the "site" part; I do not want the .com, the www, the subdomain, the http:, or anything else.

How do I achieve this in PHP? I have tried parse_url, but it extracts more than what I need.

Readie

1:39 pm on Apr 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A regular expression should do the trick. The problem is, there are so many possible top-level domains, and some are a single label (.com) while some are two (.co.uk), that it'd be quite difficult to write a regular expression to do the job.
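One workable compromise, instead of a single regular expression, is to keep a short hand-maintained list of two-part suffixes and slice the host accordingly. This is only a sketch: a robust solution would consult the full Public Suffix List, and the small suffix array and function name below are assumptions for illustration.

```php
<?php
// Sketch only: the $twoPartSuffixes list is a hand-maintained assumption;
// a complete solution would use the full Public Suffix List.
function extract_site_name($url) {
    $twoPartSuffixes = array('co.uk', 'org.uk', 'com.au', 'co.nz');
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === null || $host === false) {
        $host = $url; // a bare host with no scheme has no "host" component
    }
    $parts = explode('.', $host);
    $lastTwo = implode('.', array_slice($parts, -2));
    // If the host ends in a known two-part suffix, the label we want is
    // third from the end; otherwise it is second from the end.
    $offset = in_array($lastTwo, $twoPartSuffixes) ? -3 : -2;
    $slice = array_slice($parts, $offset, 1);
    return $slice ? $slice[0] : $host;
}

echo extract_site_name('http://www.site.com'), "\n";       // site
echo extract_site_name('http://www.site.co.uk'), "\n";     // site
echo extract_site_name('http://subdomain.site.com'), "\n"; // site
echo extract_site_name('http://site.com/page'), "\n";      // site
```

The trade-off is the same one noted above: any two-part suffix not in the list will be sliced one label too far.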

ashish2005

6:09 pm on Apr 18, 2010 (gmt 0)

10+ Year Member



If it is too complicated then that's OK; I can do it by another method.

In the other method, there are these URLs:

site . co . uk
site . com and so on

I need to search for the existence of "site" in URL strings like the above.

Is that easily achievable?

Readie

6:50 pm on Apr 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If it's only going to be .com or .co.uk and nothing else, you can quite easily do this:

if (preg_match_all('/(?:(http:\/\/|www\.)|http:\/\/www\.)(?:[^\.]+)?(?:\.[^\.]+)*?(\.?[^\.]+)\.com?(?:\.uk)?.*/is', $someVariable, $out)) {
    echo 'Domains found:';
    foreach ($out[1] as $domain) {
        echo '<br />' . $domain;
    }
}

[edited by: Readie at 7:25 pm (utc) on Apr 18, 2010]

Readie

6:56 pm on Apr 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Very annoying: in those regular expressions I'm trying to write two question marks (preceding item optional / matched as few times as possible), and these forums have an annoying filter stopping me.

(?:\.[^\.]+)*?

Add an extra question mark after that.

(?:[^\.]+)?

That too.

Tommybs

8:58 pm on Apr 18, 2010 (gmt 0)

10+ Year Member



In the other method, there are these URLs:

site . co . uk
site . com and so on

I need to search for the existence of "site" in URL strings like the above.

Is that easily achievable?


Hi,

For this, are you saying you want to find out whether "site" is in the URL at all? If so you can use this:

return stristr($url, "site") !== false;


With regard to the initial problem, I've knocked this up; hopefully it will be of some use:

<?php
$urls = array("http://www.site.com/page", "http://site.com", "http://sub.example.com", "http://sub.site.co.uk", "http://site.co.uk");
$exts = array("www.", ".co.uk", ".com", ".net", ".org");
foreach ($urls as $k => $v) {
    $v = parse_url($v, PHP_URL_HOST);
    $v = str_replace($exts, "", $v);
    $u = explode(".", $v);
    if (count($u) === 1) {
        echo $u[0];
    } else {
        echo $u[1];
    }
    echo "<br />";
}
?>

Readie

8:52 am on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To find if just "site" is in the variable somewhere:

if (preg_match('/site/im', $input)) {
    // Found
} else {
    // Not found
}
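The same containment check can be done without a regular expression. This is a sketch: stripos() is the case-insensitive counterpart of strpos(), and the sample $input value is an assumption.

```php
<?php
// Sketch: stripos() performs the same case-insensitive containment
// check without compiling a pattern; $input is a sample value.
$input = 'http://www.SITE.co.uk';

// Note the !== false comparison: a match at offset 0 is falsy.
if (stripos($input, 'site') !== false) {
    echo "Found\n";
} else {
    echo "Not found\n";
}
```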

Tommybs

12:13 pm on Apr 19, 2010 (gmt 0)

10+ Year Member



Readie, just as an aside, would you not consider a regular expression overkill for finding a single word when stristr will suffice? I know you probably won't notice much difference performance-wise, for a few rows at least, although I think preg_match is more efficient overall. It is just slightly easier to read using stristr.

Ashish, as an amendment to my code above as well, I think

if (count($u) === 1) {
    echo $u[0];
} else {
    echo $u[1];
}

could be replaced with

echo end($u);

That should return the final part of the domain and handle any weird cases that may arise, e.g. www1.-style domains.

Readie

12:59 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



S'pose it is a bit over the top. Thing is, preg_match just returns the first pattern match, whereas stristr returns everything from the first match to the end of the string, so I should imagine that, while preg_match has a higher hit on the CPU, it most likely has a lower hit on the memory.

TheMadScientist

1:08 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uh, actually, the PHP manual says that when you can, it's faster to use strstr than preg_match, and strpos is even faster...

strstr() manual [us3.php.net]
Note: If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.

There's some benchmarking information between strpos() substr() and preg_match() with a link to the manual page in this thread:
[webmasterworld.com...]

If you think preg_match is as fast, you might want to have a look, because it's actually a decent percentage slower than strpos() and I'm guessing it's slower than strstr() mainly because the PHP manual says it is...

preg_match() manual [us.php.net]
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

@ Readie: just noticed it was you who posted previously, so you've already seen the other thread, LOL. But I'll go ahead and leave the links and info up for other readers, just so they aren't getting any bad info or thinking preg_match is a fair speed comparison to either of the string functions, because it really isn't...

Of course, from the benchmarking I've seen, foreach() is no match for a for() or while() either, but it still gets recommended all the time. I finally gave up on trying to let people know, so this is a 'last ditch attempt' to put it out here: not all functions are created equal. My preference is speed over 'this is how I do it', so if I find a faster way, the functions I use change.
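The strpos-vs-preg_match claim is easy to check with a rough micro-benchmark. This is only a sketch: the haystack, needle, and iteration count are assumptions, and absolute timings vary by machine and PHP version, so compare the ratio rather than the numbers.

```php
<?php
// Rough micro-benchmark sketch; the haystack and iteration count are
// arbitrary assumptions. Absolute times vary -- compare ratios only.
$haystack   = str_repeat('abc', 1000) . 'site' . str_repeat('xyz', 1000);
$iterations = 100000;

// Plain string search.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    strpos($haystack, 'site');
}
printf("strpos:     %.4fs\n", microtime(true) - $start);

// Equivalent containment check via the regex engine.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    preg_match('/site/', $haystack);
}
printf("preg_match: %.4fs\n", microtime(true) - $start);
```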

[edited by: TheMadScientist at 1:16 pm (utc) on Apr 19, 2010]

Readie

1:15 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Preg_match is definitely not as fast, but as I said, depending on the size of the string, logic tells me the returned variable may take up less memory than strstr's. Bit of a trade-off if my suspicions are correct.

[edit (for your edit :P)]

Did think it a bit strange that you were posting those links as a reply to me; should have guessed that you started typing a reply before I posted mine.

I actually didn't know that for/while produced less overhead than foreach, but I suppose it is natural because of the unique functionality that foreach brings (the $key => $value).

TheMadScientist

1:25 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, I wouldn't have thought there would be much of a difference between the three when you're not using $key => $val in a foreach(), but from the benchmarking I've seen, for() is the fastest, with while() a close second and foreach() way behind... I really changed my style after looking into the speed of functions. The only time I use a foreach() any more is if I absolutely need a key-value pair for some reason, and I can usually code around the need if I think about it for a bit.
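The loop-overhead comparison can be sketched the same way. The array size is an assumption, and recent PHP versions have narrowed the gap between foreach() and for() considerably, so again the numbers only show relative cost on one setup.

```php
<?php
// Sketch comparing loop overhead on a simple summation; the array size
// is an arbitrary assumption and results vary by PHP version.
$data = range(1, 100000);

// for() with the count hoisted out of the condition.
$start = microtime(true);
$sum   = 0;
$size  = count($data);
for ($i = 0; $i < $size; $i++) {
    $sum += $data[$i];
}
printf("for:     %.4fs (sum %d)\n", microtime(true) - $start, $sum);

// foreach() over the same array.
$start = microtime(true);
$sum   = 0;
foreach ($data as $v) {
    $sum += $v;
}
printf("foreach: %.4fs (sum %d)\n", microtime(true) - $start, $sum);
```

Both loops should print the same sum; only the timings differ.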

Readie

1:32 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I find foreach only really useful when dealing with an associative array. Numeric arrays I find actually easier to deal with in a for loop.

I've sometimes made mistakes when foreach'ing through multidimensional arrays because of the "as $someOtherVariableName".

Tommybs

4:01 pm on Apr 19, 2010 (gmt 0)

10+ Year Member



Well then, based on the advice from those above, try this reworked code:


<?php
$urls = array("http://www.site.com/page", "http://site.com", "http://sub.example.com", "http://sub.site.co.uk", "http://site.co.uk");
$exts = array("www.", ".co.uk", ".com", ".net", ".org");
$size = count($urls); // store so we don't evaluate count() each time
for ($i = 0; $i < $size; $i++) {
    $v = parse_url($urls[$i], PHP_URL_HOST);
    $v = str_replace($exts, "", $v);
    $u = explode(".", $v);
    echo end($u);
    echo "<br />";
}
?>

I briefly tested the above and it works!

ashish2005

8:06 pm on Apr 19, 2010 (gmt 0)

10+ Year Member



Yup, thanks for all the codes and the awesome replies.