Can someone improve this function?

FiRe

12:19 am on Aug 16, 2006 (gmt 0)

I wrote a function to extract the domain from a site URL, based on an example in the PHP manual:

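function getdomain($site) {
    // Capture the host: everything after an optional http:// up to the first "/".
    preg_match('@^(?:http://)?([^/]+)@i', $site, $matches);
    $host = $matches[1];
    // Keep only the last two dot-separated labels of the host.
    preg_match('/[^.]+\.[^.]+$/', $host, $matches);
    return $matches[0];
}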

Here is the problem:

echo getdomain("google.com");
// returns google.com
echo getdomain("www.google.com");
// returns google.com
echo getdomain("http://www.google.com");
// returns google.com
echo getdomain("google.co.uk");
// returns co.uk
echo getdomain("www.google.co.uk");
// returns co.uk
echo getdomain("http://www.google.co.uk");
// returns co.uk

Can anyone help me fix this?

whoisgregg

7:02 pm on Aug 16, 2006 (gmt 0)

Perhaps a look at parse_url [php.net] will be helpful?
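
A rough sketch of that suggestion, not code from this thread: parse_url() only reports a host when the input carries a scheme, so the hypothetical gethost() helper below prepends http:// to bare hostnames first. Note that it returns the full host, subdomains included, so it does not by itself reduce "test.google.com" to "google.com".

```php
<?php
// Hypothetical helper (name is an assumption): returns the lowercased host
// of a URL or bare hostname, or null when no host can be found.
function gethost($site) {
    // parse_url() leaves 'host' empty for inputs like "www.google.com",
    // so add a scheme when none is present.
    if (!preg_match('@^[a-z][a-z0-9+.-]*://@i', $site)) {
        $site = 'http://' . $site;
    }
    $host = parse_url($site, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

echo gethost("http://www.google.co.uk/search?q=php"), "\n"; // www.google.co.uk
echo gethost("test.google.com"), "\n";                      // test.google.com
```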

FiRe

11:39 pm on Aug 16, 2006 (gmt 0)

That's OK, but I need it to avoid tricks and return only the domain! So if the input URL is "test.google.com", it needs to return google.com, not the test part. I now have this:

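function getdomain($site) {
    // Strip a leading http://, https://, ftp:// or ftps:// scheme.
    $site = preg_replace('/^(?:http|ftp)s?:\/\//i', "", $site);
    // Strip a leading "www." (dot escaped so it matches only a literal dot).
    $site = preg_replace('/^www\./i', "", $site);
    // Drop any path after the host.
    if (strpos($site, "/") !== false) {
        $x = explode("/", $site);
        $site = $x[0];
    }
    // Keep only the last two dot-separated labels.
    preg_match('/[^.]+\.[^.]+$/', $site, $matches);
    return $matches[0];
}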

Now this works for "test.google.com", but if I use "google.co.uk" it gives me just the co.uk part!

[edited by: FiRe at 11:40 pm (utc) on Aug. 16, 2006]

whoisgregg

2:48 pm on Aug 17, 2006 (gmt 0)

I'm pretty sure the only way for it to "know" what is a TLD and what isn't a TLD is for you to have an array of valid TLDs.
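
To make the array idea concrete, here is a minimal sketch under stated assumptions: the function name getdomain_by_suffix() and the short $suffixes list are made up for illustration, and a usable list would need far more entries. It checks the host against each known suffix, longest first, and keeps the single label to the left of the match.

```php
<?php
// Hypothetical helper: returns the domain (one label plus a known suffix)
// for a URL or hostname, or null when the suffix is not in the list.
// $suffixes is a tiny illustrative list, not a complete set of TLDs.
// Ports and user:pass@ prefixes are not handled in this sketch.
function getdomain_by_suffix($site, $suffixes = array('co.uk', 'org.uk', 'com.au', 'com', 'net', 'org', 'uk')) {
    // Strip an optional scheme, then keep everything before the first "/".
    $host = preg_replace('@^[a-z][a-z0-9+.-]*://@i', '', $site);
    $parts = explode('/', $host, 2);
    $host = strtolower($parts[0]);

    // Try longer suffixes first so "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) { return strlen($b) - strlen($a); });

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (substr($host, -strlen($tail)) === $tail) {
            // Keep the label immediately to the left of the suffix.
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels) . $tail;
        }
    }
    return null;
}

echo getdomain_by_suffix("http://www.google.co.uk"), "\n"; // google.co.uk
echo getdomain_by_suffix("test.google.com"), "\n";         // google.com
```

The last-two-labels regex fails on "google.co.uk" because it has no way to tell that "co.uk" is a suffix; the lookup above supplies exactly that knowledge, at the cost of maintaining the list.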

Here's an excellent thread on domain/subdomain parsing [webmasterworld.com] that is worth reading.

FiRe

3:52 pm on Aug 17, 2006 (gmt 0)

Yep, thanks a lot :-)