Need some help about a regular expression

I want to know if a url is under domain widget.tld

         

iProgram

7:39 am on Oct 3, 2005 (gmt 0)




I want to check that a URL provided by a visitor (for example, the referer URL) is under the domain widget.tld. For example, the following URLs are valid:
[widget.tld...]
[widget.tld...]
[widget.tld...]
[sub.widget.tld...]
[abc.widget.tld...]

And this one is invalid because it belongs to another domain:
[other-widget.tld...]

The problem is that sometimes I also want a stricter rule, for example one where only pages under a given subdomain are valid. If the given domain is sub.widget.tld, then:

[widget.tld...] invalid
[widget.tld...] invalid
[widget.tld...] invalid
[sub.widget.tld...] only this URL is valid
[abc.widget.tld...] invalid

Now I want to test URLs using a single regular expression:

if( ereg ($rule, $ref) )
{
...
}

$ref is the URL to be tested, and $rule is a regular expression that can be changed at any time (so that I can decide which domain or subdomain to allow). So, how should I write $rule?

ergophobe

4:51 pm on Oct 3, 2005 (gmt 0)




Well, using the PCRE syntax instead of ereg and using backticks ` instead of slashes / as delimiters so we don't have to escape slashes...


$subpats = array (
'none' => '',
'www' => 'www\.',
'sub1' => 'sub1\.',
'sub2' => 'sub2\.',
'all_strict' => '((www|sub1|sub2)\.)?',
// any chain of subdomain labels, each followed by a dot
'all_loose' => '([a-z0-9-]+\.)*');

// ^ anchors the match so the domain cannot appear later in the string
$pattern = '`^http://' . $subpats['all_strict'] . 'widget\.tld/.*`iU';

The \. in the array does not need to be double-escaped as \\. here: in a single-quoted PHP string only \\ and \' are escape sequences, so \. reaches the regex engine unchanged.
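On the escaping question, here is a small check you can run yourself. It is only a sketch; the host strings are made up for illustration and the variable names are mine.

```php
<?php
// In a single-quoted PHP string only \\ and \' are escape sequences,
// so '\.' is two characters: a backslash followed by a dot.
$single = 'www\.';
$double = 'www\\.';

// Both spellings produce the same five characters: w w w \ .
$same = ($single === $double);

// Used in a pattern, the escaped dot matches only a literal dot.
$pattern = '`^' . $single . 'widget\.tld$`';
$matchesDot   = preg_match($pattern, 'www.widget.tld');
$matchesOther = preg_match($pattern, 'wwwXwidget.tld');

var_dump($same, $matchesDot, $matchesOther);
```

So either spelling works inside single quotes, and the shorter \. is enough.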

Anyway, you should be able to use that or something like it (not tested) and select the proper pattern based on the array. You could use this, for example, to parse the URL and restrict within a subdomain.
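Since the suggestion is marked as not tested, a quick way to try it is to run a strict and a loose pattern over a handful of URLs. This is a rough sketch, not the exact patterns from the post: I have assumed a ^ anchor and a loose pattern that allows any chain of subdomain labels, and the sample URLs and the url_matches() helper are hypothetical.

```php
<?php
// Assumed variants of the patterns: anchored at the start of the URL.
$strict = '`^http://((www|sub1|sub2)\.)?widget\.tld/`i';
$loose  = '`^http://([a-z0-9-]+\.)*widget\.tld/`i';

// Hypothetical helper: true when $url matches $pattern.
function url_matches($pattern, $url)
{
    return preg_match($pattern, $url) === 1;
}

// Made-up sample URLs for illustration.
$results = array(
    'loose_root'      => url_matches($loose, 'http://widget.tld/'),
    'loose_sub'       => url_matches($loose, 'http://sub.widget.tld/page.html'),
    'loose_other'     => url_matches($loose, 'http://other-widget.tld/'),
    'loose_embedded'  => url_matches($loose, 'http://evil.tld/?r=http://widget.tld/'),
    'strict_sub1'     => url_matches($strict, 'http://sub1.widget.tld/'),
    'strict_abc'      => url_matches($strict, 'http://abc.widget.tld/'),
);

var_dump($results);
```

The two cases to watch are other-widget.tld and a URL that merely contains the domain in its query string. Both should be rejected.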

For example, if you build the requested URL from HTTP_HOST plus REQUEST_URI (REQUEST_URI on its own does not include the host name) and know it to be sub1.widget.tld/page1/ you could do something like

$path = explode('/', $url);
// strip the base domain, then the trailing dot, leaving just 'sub1'
$subdomain = rtrim(str_replace('widget.tld', '', $path[0]), '.');

if (empty($subdomain))
{
$index = 'none';
}
elseif (!empty($subpats[$subdomain]))
{
$index = $subdomain;
}
else
{
$index = 'all_strict';
}

$pattern = '`^http://' . $subpats[$index] . 'widget\.tld/.*`iU';
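If the goal is a $rule that can be swapped at any time, another option is to generate it from the host name instead of keeping a table of fragments. The sketch below is one way to do that, not the approach from the post: build_rule() is a hypothetical helper, and accepting https as well as http, matching case-insensitively, and allowing a URL with no path are all my assumptions. The result is meant for preg_match(), since ereg() was removed in PHP 7.

```php
<?php
// Hypothetical helper: builds a PCRE rule for one host name.
// $allowSubdomains = true also accepts any subdomain of $host.
function build_rule($host, $allowSubdomains)
{
    // preg_quote escapes the dots; the second argument also escapes the delimiter.
    $quoted = preg_quote($host, '`');
    $prefix = $allowSubdomains ? '([a-z0-9-]+\.)*' : '';
    // (/|$) requires the host to end there, so widget.tld.evil.tld is rejected.
    return '`^https?://' . $prefix . $quoted . '(/|$)`i';
}

$wide   = build_rule('widget.tld', true);
$narrow = build_rule('sub.widget.tld', false);

// Made-up URLs for illustration.
$wideSub     = preg_match($wide, 'http://abc.widget.tld/page');
$wideOther   = preg_match($wide, 'http://other-widget.tld/');
$narrowSub   = preg_match($narrow, 'http://sub.widget.tld/');
$narrowRoot  = preg_match($narrow, 'http://widget.tld/');
$narrowAbc   = preg_match($narrow, 'http://abc.widget.tld/');

var_dump($wideSub, $wideOther, $narrowSub, $narrowRoot, $narrowAbc);
```

Note that this sketch does not handle URLs with a port number or user info in front of the host, so extend it if you need those.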