
regex optimization and logic problem

pertaining to validating a domain name

   
5:56 pm on Apr 8, 2007 (gmt 0)

10+ Year Member



preg_match("/(www\.)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+\.[a-zA-Z0-9\.]{2,7}/", 'www.domain.com', $matches);

This regex works for validating a domain and checks it against several domain-name rules:

Starts/ends with a letter or number.
Not more than one dash in a row.
Optional www. in front.
Up to 7 characters for the extension, so it works with ccTLDs, second-level ccTLD extensions and longer extensions (.co.uk, .museum, .org.uk, etc.).

A couple of problems I'd like to solve entirely within a single regex, if possible:

In its current form the regex requires the name to be at least 4 characters long, which, as you probably know, is not the rule: for most extensions the minimum is two characters.
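A quick way to see the minimum-length problem is to run the question's pattern through Python's `re` module, which treats this particular pattern the same way PCRE does. This is a sketch: `re.search` stands in for PHP's `preg_match` with an unanchored pattern, and `matches_original` is a hypothetical helper name.

```python
import re

# The question's pattern, unchanged (raw string instead of a PHP string).
ORIGINAL = re.compile(
    r"(www\.)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+\.[a-zA-Z0-9\.]{2,7}"
)


def matches_original(name: str) -> bool:
    # search() looks for a match anywhere in the string, like preg_match
    # with a pattern that has no ^ or $ anchors.
    return ORIGINAL.search(name) is not None


# Four [a-zA-Z0-9]+ runs in a row mean the name needs at least 4 characters.
for name in ["www.domain.com", "abcd.com", "abc.com", "ab.com"]:
    print(name, matches_original(name))
```

The first two print True and the last two print False, even though two- and three-character names are allowed for most extensions.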

I'd also like to limit the name to 63 characters, since that is the upper limit for most extensions (63 characters is also the DNS limit for a single label).
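One way to get both length limits into a single pattern is a lookahead that measures the name before the rest of the pattern consumes it. The sketch below is not from the thread; it is written for Python's `re`, and PCRE accepts the same syntax, though it has not been tested with `preg_match` here. It assumes each extension label is 2 to 7 letters and that the 63-character limit applies to the name between the optional www. and the extension. `DOMAIN` and `is_valid_domain` are hypothetical names.

```python
import re

DOMAIN = re.compile(
    r"^(www\.)?"                  # optional www.
    r"(?=[a-z0-9-]{2,63}\.)"      # lookahead: 2-63 name characters, then a dot
    r"[a-z0-9]+(-[a-z0-9]+)*"     # letters/digits; each dash must be followed by one
    r"(\.[a-z]{2,7}){1,2}$",      # .com, .museum, .co.uk, .org.uk ...
    re.IGNORECASE,
)


def is_valid_domain(name: str) -> bool:
    # fullmatch() also rejects a trailing newline, which a bare $ would allow.
    return DOMAIN.fullmatch(name) is not None


for name in ["ab.com", "my-domain.co.uk", "a.com", "my--domain.com", "a" * 64 + ".com"]:
    print(name, is_valid_domain(name))
```

The lookahead works because `[a-z0-9-]` cannot match a dot, so `{2,63}` followed by `\.` can only succeed when the name itself is 2 to 63 characters long. Writing the name as `[a-z0-9]+(-[a-z0-9]+)*` also rules out leading, trailing and doubled dashes without a separate check.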

If someone could modify this regex, or provide one that keeps its current functionality and adds what I need, I'd happily send some PayPal change your way for the help.

I could do these checks with PHP functions (substr and such), but I would really like to keep it all in a single regex if possible.

Thanks!

2:13 pm on Apr 9, 2007 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



OK, try this, and note that the forum doesn't display every character properly!

^(www\.)?(([a-z0-9]+(-|\.)?)+[a-z0-9])((\.[a-z]+){1,2})$

Let me explain: the ^ at the front and $ at the end are optional, but they make the regex return either a complete match or nothing. Without them, e.g. 'my domain.com' would still return a partial match ('domain.com') even though it is invalid.
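The effect of the anchors can be checked in Python's `re`, which handles this pattern the same way PCRE does. The sketch assumes the answer's pattern with a `|` inside `(-|\.)` (as the "maybe 1 - or ." explanation describes) and a `+` after the repeated group so that names longer than two characters can match; both are a reading of the intent rather than something the post states. `BODY`, `UNANCHORED` and `ANCHORED` are hypothetical names.

```python
import re

# Assumed reading of the answer's pattern, without the anchors.
BODY = r"(www\.)?(([a-z0-9]+(-|\.)?)+[a-z0-9])((\.[a-z]+){1,2})"

UNANCHORED = re.compile(BODY)             # finds a match anywhere in the string
ANCHORED = re.compile("^" + BODY + "$")   # the whole string must match

text = "my domain.com"
partial = UNANCHORED.search(text)
print(partial.group(0) if partial else None)  # domain.com: invalid input still "matches"
print(ANCHORED.search(text))                  # None: the anchors reject it
```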

(www\.)? - Optional www

([a-z0-9]+(-|\.)?)+ - one or more letters or numbers, optionally followed by a single - or ., repeated as often as needed
[a-z0-9] - the domain portion must end with a letter or number, which prevents www.mydomain-.com

That bit is wrapped in () so you can capture the whole domain portion.
The '.' also allows for subdomains, e.g. www.my.domain.com
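Here is a sketch that runs the answer's pattern (again assuming the `|` in `(-|\.)` and a `+` after the repeated group) against the cases described above. The pattern only lists lowercase letters, so the hypothetical `split_domain` helper lowercases its input first; compiling with `re.IGNORECASE` (the `i` modifier in PHP) would work as well.

```python
import re

ANSWER = re.compile(r"^(www\.)?(([a-z0-9]+(-|\.)?)+[a-z0-9])((\.[a-z]+){1,2})$")


def split_domain(name: str):
    """Return (www, domain portion, TLD) or None if the name is rejected."""
    m = ANSWER.match(name.lower())  # the pattern only knows lowercase letters
    if m is None:
        return None
    # group 2 is the whole domain portion, group 5 the whole TLD part
    return m.group(1), m.group(2), m.group(5)


print(split_domain("www.mydomain.com"))    # ('www.', 'mydomain', '.com')
print(split_domain("www.my.domain.com"))   # ('www.', 'my.domain', '.com')
print(split_domain("www.mydomain-.com"))   # None: domain ends with a dash
print(split_domain("my--domain.com"))      # None: two dashes in a row
```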

((\.[a-z]+){1,2}) - a slightly different way of finding the TLD: I've gone with a . and some letters, at most twice, e.g. .tld or .uk.tld but not .my.tld.uk

Again, this is wrapped in () so it is captured. I couldn't get the length checking in, but since the www., mydomain and .com parts are captured separately, you could easily check the length afterwards; everything else is there.
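Checking the length afterwards only needs the captured domain portion. In this sketch (same assumed pattern as above; `is_valid_domain` is a hypothetical name), the regex already guarantees at least two characters, so the code only enforces the 63-character limit. It applies that limit to each dot-separated label, because the domain portion may include sub-domains.

```python
import re

ANSWER = re.compile(r"^(www\.)?(([a-z0-9]+(-|\.)?)+[a-z0-9])((\.[a-z]+){1,2})$")


def is_valid_domain(name: str) -> bool:
    m = ANSWER.match(name.lower())
    if m is None:
        return False
    domain = m.group(2)  # e.g. "mydomain" for www.mydomain.com
    # The regex already forces at least 2 characters; cap each label at 63.
    return all(len(label) <= 63 for label in domain.split("."))


print(is_valid_domain("www.mydomain.com"))   # True
print(is_valid_domain("a" * 63 + ".com"))    # True
print(is_valid_domain("a" * 64 + ".com"))    # False
```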

The only problem I found is that an invalid TLD, e.g. www.mydomain.my.tld.com, still matches: the extra labels are simply absorbed into the domain portion instead of being rejected.
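The effect is easy to see by printing the captured groups. In this sketch (same assumed pattern as above), the greedy domain group takes every label except the last one, so www.mydomain.my.tld.com is split as www., mydomain.my.tld and .com instead of being rejected. The same greediness means a valid name such as my-domain.co.uk is split as my-domain.co plus .uk.

```python
import re

ANSWER = re.compile(r"^(www\.)?(([a-z0-9]+(-|\.)?)+[a-z0-9])((\.[a-z]+){1,2})$")

for name in ["www.mydomain.my.tld.com", "my-domain.co.uk"]:
    m = ANSWER.match(name)
    # group 1: www. (or None), group 2: domain portion, group 5: TLD part
    print(name, "->", m.group(1), m.group(2), m.group(5))
```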

You could get around this by changing the (-|\.) part of the domain-name bit. Short of a huge OR'd statement checking every possible valid TLD, I think this is the best you can do. It's certainly worked for me. If anyone knows a better way, please post it!

You might also want to download this:
[weitz.de...]

Or Google 'Regex Coach' if it's been snipped.
