
Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
regex optimization and logic problem
pertaining to validating a domain name
harleyx




msg:3305702
 5:56 pm on Apr 8, 2007 (gmt 0)

preg_match("/(www\.)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+\.[a-zA-Z0-9\.]{2,7}/", 'www.domain.com', $matches);

This regex validates a domain name against several naming rules:

Starts/ends with a letter or number.
Not more than one dash in a row.
Optional www. in front.
Up to 7 characters for the extension, so it works with longer TLDs and second-level ccTLD suffixes (.museum, .co.uk, .org.uk, etc.)

A couple of problems I'd like to solve entirely within a single regex, if possible:

In its current form the domain must be at least 4 characters long, which, as you probably know, is not the rule. For most extensions the minimum is two characters.

Also, I'd like to be able to limit the domain to 63 characters, as the majority of extensions have that as the upper limit.

If someone could modify this regex, or provide one that keeps the functionality it already has and adds the functionality I need, I'd happily send you some PayPal change for the help.

I could do these things with some PHP functions (substr and such), but I would really like to contain it all in a single regex call if possible.

Thanks!

 

Dabrowski




msg:3306312
 2:13 pm on Apr 9, 2007 (gmt 0)

OK, try this. Note that the forum may not display every character properly: the (-\.) part below is meant to match a single - or .

^(www\.)?(([a-z0-9]+(-\.)?)[a-z0-9])((\.[a-z]+){1,2})$

Let me explain. The ^ on the front and the $ on the end are optional, but they make the regex return either a complete match or nothing. Without them, an input such as 'www.mydomain.com/page' would still return a partial match even though it is not a valid domain name.
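To make the point about anchors concrete, here is a minimal sketch using the pattern from the first post. Python's re module is only a convenient test harness here (my assumption; the thread itself uses PHP's preg_match, and the pattern should carry over unchanged), and the sample input is made up.

```python
import re

# Pattern from the first post, unchanged.
UNANCHORED = (
    r"(www\.)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?[a-zA-Z0-9]+(-)?"
    r"[a-zA-Z0-9]+\.[a-zA-Z0-9\.]{2,7}"
)
# Same pattern wrapped in ^ and $ so it has to cover the whole string.
ANCHORED = "^" + UNANCHORED + "$"

sample = "foo_bar@example.com!"  # hypothetical input, clearly not a bare domain

partial = re.search(UNANCHORED, sample)
whole = re.search(ANCHORED, sample)

print(partial.group(0) if partial else None)  # example.com
print(whole.group(0) if whole else None)      # None
```

Without the anchors the engine is free to find a valid-looking piece somewhere inside an invalid string, which is usually not what you want from a validator.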

(www\.)? - Optional www

([a-z0-9]+(-\.)?) - one or more letters or numbers, optionally followed by a single - or .
[a-z0-9] - domain portion must end with a proper character, prevents www.mydomain-.com

That bit is wrapped in () so you can capture the whole domain portion.
The '.' also allows for subdomains, e.g. www.my.domain.com

((\.[a-z]+){1,2}) - a slightly different way of finding the TLD: a . followed by some letters, at most twice, e.g. .tld or .uk.tld but not .my.tld.uk

Again, it's wrapped in () so you can capture it. I couldn't get the length checking in, but since the regex captures www., mydomain and .com separately, you could easily check the length afterwards. Everything else is there.
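Checking the length afterwards could look something like the sketch below. It is written in Python purely as a test harness (an assumption; the same idea works with the $matches array from preg_match). The pattern is my own reading of the regex above, with the garbled part written as a [-.] character class and the domain portion allowed to repeat, so treat it as an approximation rather than the exact original. The 2 and 63 limits are the ones quoted in the first post; is_valid is a made-up helper name.

```python
import re

# Assumed reconstruction: optional www., a domain portion made of
# alphanumeric runs joined by a single - or ., then one or two TLD parts.
DOMAIN_RE = re.compile(
    r"^(www\.)?([a-z0-9]+(?:[-.][a-z0-9]+)*?)((?:\.[a-z]+){1,2})$"
)

MIN_LEN, MAX_LEN = 2, 63  # limits mentioned in the first post


def is_valid(name):
    """Match the shape with the regex, then check the length separately."""
    m = DOMAIN_RE.match(name)
    if m is None:
        return False
    domain = m.group(2)  # the captured domain portion, without www. or TLD
    return MIN_LEN <= len(domain) <= MAX_LEN


print(is_valid("ab.com"))               # True
print(is_valid("a.com"))                # False, domain portion too short
print(is_valid("a" * 64 + ".com"))      # False, domain portion too long
```

The limit is applied to the whole captured domain portion, which is a simplification: if the captured part contains dots, you may prefer to check each dot-separated piece on its own.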

The only problem I found is that if someone enters an invalid TLD, e.g. .my.tld.com, it will still match, but as (www.)(mydomain.my).tld.com
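Putting the pieces described above together, here is a rough sketch of how the capture groups come out. Python is used only as a test harness (an assumption; the thread uses PHP). The pattern is my reconstruction, not the exact regex posted: the garbled part is written as [-.], the domain portion is allowed to repeat, and I made that repetition lazy so that the TLD group takes as much as it can. With a greedy domain portion the split between the two groups would differ. split_domain is a made-up helper name.

```python
import re

DOMAIN_RE = re.compile(
    r"^(www\.)?"                       # optional www.
    r"([a-z0-9]+(?:[-.][a-z0-9]+)*?)"  # domain: alphanumeric runs joined by one - or .
    r"((?:\.[a-z]+){1,2})$"            # TLD: a dot plus letters, once or twice
)


def split_domain(name):
    """Return (www, domain, tld) if the name matches, otherwise None."""
    m = DOMAIN_RE.match(name)
    return m.groups() if m else None


print(split_domain("www.my-domain.co.uk"))      # ('www.', 'my-domain', '.co.uk')
print(split_domain("mydomain.com"))             # (None, 'mydomain', '.com')
print(split_domain("www.mydomain-.com"))        # None
print(split_domain("www.mydomain.my.tld.com"))  # ('www.', 'mydomain.my', '.tld.com')
```

The last line shows the problem described above: the regex has no way of knowing that .tld.com is not a real extension, so the input is accepted.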

You could get around this by changing the (-\.) part of the domain name bit. Unfortunately, short of a huge OR'd statement checking every possible valid TLD, I think this is the best you can do. It has certainly worked for me. If anyone knows a better way, please post it!
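For what it's worth, the "huge OR'd statement" does not have to be typed by hand; it can be generated from a list. The sketch below shows the idea in Python (used only as a test harness, an assumption on my part). The suffix list is hypothetical and deliberately tiny, the domain part of the pattern is my reconstruction of the regex above rather than the exact original, and split_domain is a made-up helper name.

```python
import re

# Hypothetical, deliberately short list; a real one would be far longer.
ALLOWED_SUFFIXES = ["com", "org", "net", "museum", "co.uk", "org.uk"]

# Escape each suffix (so the dot in co.uk is literal) and join with |.
suffix_alt = "|".join(
    re.escape(s) for s in sorted(ALLOWED_SUFFIXES, key=len, reverse=True)
)

DOMAIN_RE = re.compile(
    r"^(www\.)?"                       # optional www.
    r"([a-z0-9]+(?:[-.][a-z0-9]+)*?)"  # domain portion, kept as short as possible
    r"\.(" + suffix_alt + r")$"        # one of the allowed suffixes
)


def split_domain(name):
    """Return (www, domain, suffix) if the name matches, otherwise None."""
    m = DOMAIN_RE.match(name)
    return m.groups() if m else None


print(split_domain("www.mydomain.co.uk"))  # ('www.', 'mydomain', 'co.uk')
print(split_domain("example.com"))         # (None, 'example', 'com')
print(split_domain("mydomain.tld"))        # None
```

Note that anything in front of a recognised suffix is still treated as part of the domain portion, so www.mydomain.my.tld.com would be accepted as mydomain.my.tld under .com. Dropping the . from the domain part would rule that out, at the cost of rejecting subdomains.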

You also might want to download this:
[weitz.de...]

Or Google 'Regex Coach' if it's been snipped.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved