Matching if domain name is before third slash

         

ntbgl

5:21 pm on Jun 13, 2009 (gmt 0)

10+ Year Member



I'm checking whether a link points to my domain name in order to decide what to output:

if(preg_match("/mydomain\.com/i",$url)){}

However, I'd like to tighten this up so that it no longer matches URLs like [notmine.com...]
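To illustrate why the original test is too loose, here is a minimal sketch using hypothetical URLs. Because the pattern has no anchor, `preg_match()` succeeds wherever `mydomain.com` appears, whether in someone else's hostname or in a path.

```php
<?php
// Hypothetical URLs; only the first one actually points at mydomain.com.
$urls = [
    'http://www.mydomain.com/page.html',
    'http://www.notmydomain.com/',
    'http://notmine.com/mydomain.com',
];

$results = [];
foreach ($urls as $url) {
    // The unanchored test from the question: succeeds if "mydomain.com"
    // appears anywhere in the string, host or path.
    $results[$url] = preg_match("/mydomain\.com/i", $url);
    echo $url, ' => ', $results[$url] ? 'match' : 'no match', "\n";
}
```

All three URLs report `match`, which is exactly the false positive described above.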

How can I change my regex so that it only matches if mydomain.com comes before a possible third forward slash?

Thanks

janharders

5:30 pm on Jun 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"/^https?:\/\/([^.]*\.|)mydomain\.com/"

should do the job, although it won't match test.www.mydomain.com. If that's alright with you, it should catch both mydomain.com and www.mydomain.com, with either http or https as the protocol.
The | in the parentheses is an OR operator, which I hope PHP supports in preg_match. I know this is the way to go in Perl, but PHP's "Perl-compatible regular expression" functions aren't 100% compatible, so if it doesn't work, that might be the issue.

edit: forgot to mention: the forum software displays the pipe as a broken bar (¦), so make sure you type a real pipe, the same character used for OR in an if clause.
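PHP's `preg_*` functions are built on the PCRE library, which supports `|` alternation inside a group, so this pattern should work once the broken bar is typed as a real pipe. The sketch below checks it against a few hypothetical URLs. The last case shows one limitation worth knowing: nothing after `mydomain\.com` is checked, so a host such as `mydomain.com.evil.net` also matches.

```php
<?php
// janharders' pattern with a real pipe: either one "label." prefix or nothing,
// followed by mydomain.com right after the scheme.
$pattern = '/^https?:\/\/([^.]*\.|)mydomain\.com/';

// URL => expected preg_match() result (1 = match, 0 = no match).
$cases = [
    'http://mydomain.com/'            => 1,
    'https://www.mydomain.com/page'   => 1,
    'http://test.www.mydomain.com/'   => 0, // two labels before the domain
    'http://notmine.com/mydomain.com' => 0, // domain only appears in the path
    'http://mydomain.com.evil.net/'   => 1, // nothing after ".com" is checked
];

foreach ($cases as $url => $expected) {
    $got = preg_match($pattern, $url);
    printf("%-34s %d (%s)\n", $url, $got, $got === $expected ? 'as expected' : 'unexpected');
}
```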

jdMorgan

9:15 pm on Jun 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



/^https?:\/\/([^.:\/]+\.)*example\.com(\.|\.?:[0-9]+)?(?=[\/?#]|$)/

matches
example.com
example.com.
example.com:80
example.com.:80
www.example.com
www.example.com.
www.example.com:80
www.example.com.:80
test.example.com
www.test.example.com.:80
etc. (these are all perfectly valid hostnames), but matching stops if a third slash is encountered.

So, it won't match
www.exampleother.com/example.com
www.example.commm/www.example.com, etc.
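A minimal PHP sketch that runs the hostnames listed above through the pattern, written with a real `|` and ending in a lookahead `(?=[\/?#]|$)`. That lookahead is an assumption added for this sketch. `preg_match()` does not need to consume the whole string, so without some check after `example\.com` the match would simply stop after `.com`, and `www.example.commm/...` would be accepted. The `/i` flag is also added here, since hostnames are case-insensitive.

```php
<?php
// Variant of the pattern (assumptions: real "|" instead of "¦", a lookahead so the
// hostname must end at "/", "?", "#", or end of string, and the /i flag).
$pattern = '/^https?:\/\/([^.:\/]+\.)*example\.com(\.|\.?:[0-9]+)?(?=[\/?#]|$)/i';

$shouldMatch = [
    'example.com', 'example.com.', 'example.com:80', 'example.com.:80',
    'www.example.com', 'www.example.com.', 'www.example.com:80', 'www.example.com.:80',
    'test.example.com', 'www.test.example.com.:80',
];
$shouldNotMatch = [
    'www.exampleother.com/example.com',
    'www.example.commm/www.example.com',
];

foreach ($shouldMatch as $host) {
    // Prefix a scheme and append a path, as in a real link.
    $url = 'http://' . $host . '/index.html';
    echo $url, ': ', preg_match($pattern, $url) ? 'match' : 'NO MATCH', "\n";
}
foreach ($shouldNotMatch as $rest) {
    $url = 'http://' . $rest;
    echo $url, ': ', preg_match($pattern, $url) ? 'MATCH' : 'no match', "\n";
}
```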

Jim

ntbgl

6:50 pm on Jun 14, 2009 (gmt 0)

10+ Year Member



Thank you both for your replies.

Jim, I understand most of the examples, but I don't understand "example.com." (with the trailing period). Is this something I would want it to match or not? Wouldn't that result in an error?

If I'm correct, this would also match something like www.notexample.com, right?

g1smd

7:04 pm on Jun 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The trailing period is allowed on a hostname, and that is one of many types of request that you should trap and redirect.

jdMorgan

7:50 pm on Jun 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We use only "example.com" in examples here, since that domain is reserved for examples, and cannot be owned.
The pattern would not match "notexample.com" because if any character appears before "example," it must be a period. The pattern will not match anything but your domain and subdomains in standard or FQDN (i.e. with a trailing period, as used in DNS records) form, and/or with a port number appended. It basically accepts any and all valid forms of your domain only.
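A different technique, not the one proposed in this thread, is to let PHP's `parse_url()` pull out the host and then compare strings, which sidesteps most of the regex edge cases. A minimal sketch; `isOwnHost()` is a hypothetical helper, and `example.com` stands in for your own domain. It applies the same rule described above: anything in front of the domain must end in a period.

```php
<?php
// Hypothetical helper: true if $url's host is $domain or a subdomain of it.
// The FQDN form (trailing period) is accepted; any port is ignored because
// parse_url() reports it separately from the host.
function isOwnHost(string $url, string $domain = 'example.com'): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // malformed URL, or no host part at all
    }
    $host = strtolower($host);
    if (substr($host, -1) === '.') {
        $host = substr($host, 0, -1); // "example.com." -> "example.com"
    }
    $domain = strtolower($domain);
    // Exact match, or ends with ".example.com" (so "notexample.com" fails).
    return $host === $domain
        || substr($host, -(strlen($domain) + 1)) === '.' . $domain;
}

var_dump(isOwnHost('http://www.test.example.com.:80/page')); // bool(true)
var_dump(isOwnHost('http://www.notexample.com/'));           // bool(false)
```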

As g1smd noted above, you should be detecting requests for all of the non-canonical variants and 301-redirecting them to the canonical hostnames. But that is not necessarily the question that you asked, and you didn't provide enough details on the purpose of this code to allow anything but a specific answer in the form of a regex pattern.
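A minimal sketch of that detection step in PHP, assuming `http://www.example.com` is the canonical form. `canonicalRedirectUrl()` is a hypothetical helper name. It treats any other Host value (bare domain, trailing period, explicit port) as non-canonical, while ignoring letter case.

```php
<?php
// Hypothetical helper: returns the URL to 301-redirect to, or null when the
// request already uses the canonical host. Assumes http://www.example.com is
// canonical; adjust the scheme and host for your own site.
function canonicalRedirectUrl(string $hostHeader, string $requestUri,
                              string $canonicalHost = 'www.example.com'): ?string
{
    $host = strtolower($hostHeader);
    if ($host === '' || $host === $canonicalHost) {
        return null; // no Host header, or already canonical: leave it alone
    }
    // Any other variant (example.com, www.example.com., www.example.com:80, ...)
    // is sent to the canonical host with the same path and query string.
    return 'http://' . $canonicalHost . $requestUri;
}
```

In a front-controller script, you would pass `$_SERVER['HTTP_HOST']` and `$_SERVER['REQUEST_URI']`. When the result is not null, send `header('Location: ' . $target, true, 301);` followed by `exit;`.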

Jim