Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
The qualifiers of a subdomain
What are the rules, limitations?
harleyx




msg:3666374
 5:05 am on Jun 4, 2008 (gmt 0)

The syntax rules for what's allowed in a subdomain must (I'm assuming) be defined by Apache (or another web server)? I've searched Google and apache.org, but cannot find a list of qualifiers for a subdomain.

How many characters long can a subdomain be?

What characters are legal in a subdomain? A-Za-z0-9 and?

What characters or character combinations are illegal in a subdomain?
(ex: domains accept dashes, but not consecutive dashes and not dashes at the beginning or end of the domain)

 

jdMorgan




msg:3666706
 2:31 pm on Jun 4, 2008 (gmt 0)

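The rules for hostnames in general (including subdomains) are set out by the DNS and URI specifications, not by the web server. In general a-z, 0-9, and hyphens are safe. RFC 1034 asks that each "piece" (label) of the sub-subdomain.subdomain.domain.tld path begin with a letter, and not with a number, although RFC 1123 relaxed this to allow a leading digit. A hyphen may not begin or end a label, as you have surmised; consecutive hyphens are not forbidden by the RFC grammar itself, though some registries restrict them. As for length, each label is limited to 63 characters, and the full name to 255 octets in DNS wire format (about 253 characters when written out as text).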

See 'Hostnames' in RFC2396, RFC1034, RFC1123 as a start.
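As a rough illustration of those rules, here is a minimal Python sketch of a hostname check. It assumes the relaxed RFC 1123 reading (a label may start with a digit) and the usual 63-character label and 253-character name limits; the names `LABEL_RE` and `is_valid_hostname` are made up for this example, and it only covers plain ASCII names.

```python
import re

# One DNS label: 1-63 characters drawn from letters, digits and hyphens,
# with no hyphen in the first or last position.
# Assumption: leading digits are allowed (RFC 1123 relaxation).
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if `name` looks like a valid ASCII hostname."""
    if name.endswith("."):
        # A single trailing dot (the DNS root) is tolerated.
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))
```

For example, `is_valid_hostname("sub-domain.example.com")` returns `True`, while `is_valid_hostname("-bad.example.com")` and `is_valid_hostname("hello+world.example.com")` return `False`.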

Jim

harleyx




msg:3667776
 5:34 pm on Jun 5, 2008 (gmt 0)

I didn't think to check the RFCs. Good idea. After reading 80 or so pages, this is the best I've come up with as far as an actionable definition of the rules, although RFCs are not an enforced standard among browser makers, server software designers, and so on, so your mileage may vary. I'll post this so hopefully nobody else has to slog through many pages of theoretical abstracts.

safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex

unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape
digits = 1*digit
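
To make the grammar above a little more concrete, here is a small Python sketch that encodes a few of those character classes as sets and tests whether a string consists only of uchar (unreserved or escape). The helper name `is_uchar_string` is made up for this example. Note that these classes describe characters in a URL generally; RFC 1738 defines the host part separately and more narrowly (letters, digits and hyphens), so this is not a hostname validator.

```python
import re
import string

# Character classes transcribed from the grammar quoted above.
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
RESERVED = set(";/?:@&=")
UNRESERVED = set(string.ascii_letters) | set(string.digits) | SAFE | EXTRA

# escape = "%" hex hex
ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def is_uchar_string(text: str) -> bool:
    """Return True if every position in `text` is an unreserved
    character or the start of a well-formed %XX escape."""
    i = 0
    while i < len(text):
        if text[i] in UNRESERVED:
            i += 1
        elif ESCAPE_RE.match(text, i):
            i += 3  # skip the whole "%XX" triplet
        else:
            return False
    return True
```

For example, `is_uchar_string("hello+world")` and `is_uchar_string("a%20b")` return `True`, while `is_uchar_string("a/b")` returns `False` because "/" is reserved.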

It would be nice to see some Apache documentation that defines Apache's actual ruleset for validating a subdomain though, as (I would think) their software would have the final say.

harleyx




msg:3667814
 6:13 pm on Jun 5, 2008 (gmt 0)

Here is another RFC definition of unreserved characters (RFC 3986, section 2.3), which fits as closely as I can tell with the characters actually allowed in the domain/subdomain/tld part of a URL:

2.3. Unreserved Characters

Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

Page 17 of RFC1738 is also useful, if you can stand a bunch of recursive definitions.
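
One way to see the unreserved set quoted above in action is Python's standard `urllib.parse.quote`, which leaves letters, digits and `-`, `.`, `_`, `~` alone and percent-encodes everything else when no extra safe characters are given. This is only a sketch; the wrapper name `percent_encode` is made up, and it assumes Python 3.7 or later, where `~` is treated as unreserved.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode everything except the unreserved characters.

    safe="" stops quote() from also leaving "/" unescaped, which is
    its default behaviour.
    """
    return quote(text, safe="")
```

For example, `percent_encode("a-b.c_d~e")` comes back unchanged, while `percent_encode("hello+world")` gives `"hello%2Bworld"`.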

jdMorgan




msg:3667972
 9:31 pm on Jun 5, 2008 (gmt 0)

Be careful, since the rules differ for *each part* of a URI. Domains and subdomains are made up of letters, numbers, hyphens and dots; the letters are case-insensitive and conventionally written in lowercase. URL-paths can have letters of any case, numbers and unreserved characters, plus the reserved characters when used for their defined purpose. Query strings can have even more characters. And finally, there are the international domains -- and frankly, I'm not fully knowledgeable about how they work.
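
To illustrate how the parts are treated differently, here is a short Python sketch using the standard `urllib.parse.urlsplit`. The function name `split_url` and the example URL are made up for illustration. The `hostname` attribute is returned in lowercase, since hostnames compare case-insensitively, while the path and query keep their case.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple:
    """Return (hostname, path, query) for a URL.

    The hostname is lowercased by urlsplit's `hostname` attribute;
    path and query are returned exactly as written.
    """
    parts = urlsplit(url)
    return parts.hostname, parts.path, parts.query
```

For example, `split_url("http://Sub.Example.COM/Some/Path?Q=Value")` returns `("sub.example.com", "/Some/Path", "Q=Value")`.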

Apache does not define any of this stuff, because Apache was written to *follow* the RFCs, not define them. This is the opposite of the approach taken by our favorite OS maker in Redmond WA. :)

Apache pretty much accepts any request that hits its port(s), so the validity of a hostname is determined by the DNS system, and not by servers.

Jim

harleyx




msg:3669910
 6:14 pm on Jun 8, 2008 (gmt 0)

I wasn't sure what Apache's situation was, thanks for the clarification.

I didn't find any documentation from cPanel stating their ruleset, but since that software is so widespread I ended up doing some testing to see how they handled subdomains.

The cPanel system takes any subdomain that starts with a valid character. If you start the subdomain with a valid character and throw in an invalid character somewhere along the way (hello+world), it accepts the subdomain up to the invalid character (hello), although it maps the subdomain to the entire string (home/user/public_html/hello+world).
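
The truncation behaviour described above can be mimicked with a prefix match. This is only a sketch of the observed behaviour, not cPanel's actual code; the function name `accepted_prefix` is made up, and the set of "valid" characters (letters, digits, hyphen) is an assumption.

```python
import re

# Assumption: "valid" means ASCII letters, digits and hyphens.
VALID_PREFIX_RE = re.compile(r"[A-Za-z0-9-]+")


def accepted_prefix(requested: str) -> str:
    """Return the leading run of valid characters, or "" if the
    string does not start with a valid character."""
    match = VALID_PREFIX_RE.match(requested)
    return match.group(0) if match else ""
```

For example, `accepted_prefix("hello+world")` returns `"hello"`. If you are validating user input yourself, rejecting the whole string outright is probably safer than silently truncating it.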

I haven't thought of a good solution to tackle the international character sets. Unfortunately, I don't think regex has any international character sets defined.
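
One possible approach to the international names, sketched in Python: internationalized domain names are carried in DNS as plain ASCII labels with an `xn--` prefix, so a name can be converted first and then checked with the ordinary letter/digit/hyphen rules. This sketch relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the function name `to_ascii_hostname` is made up for this example.

```python
def to_ascii_hostname(name: str) -> str:
    """Convert a possibly non-ASCII hostname to its ASCII (xn--) form.

    Uses the standard-library "idna" codec (IDNA 2003). Raises
    UnicodeError if a label cannot be converted.
    """
    return name.encode("idna").decode("ascii")
```

For example, `to_ascii_hostname("b\u00fccher.example")` (that is, "bücher.example") returns `"xn--bcher-kva.example"`, and a plain ASCII name such as `"example.com"` comes back unchanged.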


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved