How many characters long can a subdomain be?
What characters are legal in a subdomain? A-Za-z0-9 and?
What characters or character combinations are illegal in a subdomain?
(e.g., domains accept dashes, but not consecutive dashes and not dashes at the beginning or end of the domain)
See 'Hostnames' in RFC2396, RFC1034, RFC1123 as a start.
Jim
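For a rough idea of what those RFCs add up to, here is a minimal Python sketch, not an official validator. It assumes the usual reading of RFC 1034 as relaxed by RFC 1123: each dot-separated label is 1 to 63 characters of letters, digits, and hyphens, with no hyphen at the start or end, and the whole name is at most 253 characters in dotted text form. The names LABEL_RE and is_valid_hostname are made up for this example. Note that the RFC grammar itself does not forbid consecutive hyphens (the xn-- prefix relies on them); that restriction, where it exists, comes from registries.

```python
import re

# One label: letters, digits, hyphens; 1-63 chars; no leading/trailing hyphen.
# (Assumed reading of RFC 1034 + RFC 1123; a leading digit is allowed.)
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if every label in name fits the rules above."""
    if name.endswith("."):
        name = name[:-1]  # a single trailing dot (the DNS root) is tolerated
    if not name or len(name) > 253:
        return False
    # fullmatch makes sure the whole label matches, not just a prefix
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ("sub.example.com", "-bad.example.com", "hello+world.example.com"):
        print(candidate, is_valid_hostname(candidate))
```

An empty label (as in "a..b") fails because the pattern requires at least one character per label.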
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
      "a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex
unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape
digits = 1*digit
It would be nice to see some Apache documentation that defines Apache's actual ruleset for validating a subdomain, though, since (I would think) their software would have the final say.
2.3. Unreserved Characters
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
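To make that rule concrete, here is a small Python sketch that checks a string against the unreserved set quoted above. The names UNRESERVED and all_unreserved are made up for this example. Keep in mind this is the URI character class, which is broader than what a hostname allows: underscore and tilde are unreserved in a URI but are not legal hostname characters under RFC 1123.

```python
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def all_unreserved(text: str) -> bool:
    """Return True if every character of text is in the unreserved set."""
    return all(ch in UNRESERVED for ch in text)


if __name__ == "__main__":
    print(all_unreserved("hello_world~1"))  # only unreserved characters
    print(all_unreserved("hello+world"))    # "+" is not in the set
```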
Page 17 of RFC1738 is also useful, if you can stand a bunch of recursive definitions.
Apache does not define any of this stuff, because Apache was written to *follow* the RFCs, not to define them. This is the opposite of the approach taken by our favorite OS maker in Redmond, WA. :)
Apache accepts pretty much any request that hits its port(s), so the validity of a hostname is determined by the DNS, not by servers.
Jim
I didn't find any documentation from cPanel stating their ruleset, but since that software is so widespread I ended up doing some testing to see how they handled subdomains.
The cPanel system takes any subdomain that starts with a valid character. If you start the subdomain with a valid character and throw in an invalid character somewhere along the way (hello+world), it accepts the subdomain up to the invalid character (hello), although it maps the subdomain to the entire string (home/user/public_html/hello+world).
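The truncation behavior described above can be imitated with a short Python sketch. This only mimics what was observed in testing; it is not cPanel's code, and the set of "valid" characters (letters, digits, hyphen) is an assumption, as is the name accepted_prefix.

```python
import re

# Assumed set of valid subdomain characters; cPanel's real rule is not documented here.
LEADING_VALID = re.compile(r"[A-Za-z0-9-]*")


def accepted_prefix(requested: str) -> str:
    """Return the leading run of valid characters, stopping at the first invalid one."""
    return LEADING_VALID.match(requested).group(0)


if __name__ == "__main__":
    requested = "hello+world"
    # The subdomain name is cut at "+", while the directory keeps the full string.
    print("subdomain:", accepted_prefix(requested))
    print("directory:", "home/user/public_html/" + requested)
```

An empty result means the string did not start with a valid character, which matches the case where the subdomain is refused outright.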
I haven't thought of a good solution for handling international character sets. Unfortunately, I don't think regex has any international character sets defined.
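One possible way around the international character problem, offered as a sketch rather than a complete answer: convert the name to its ASCII (Punycode, "xn--") form first, then apply the ordinary ASCII rules to the result. Python's built-in "idna" codec does the conversion; it implements the older IDNA 2003 rules, so treat the output as an approximation of what a current registry would accept. The function name to_ascii_hostname is made up for this example.

```python
def to_ascii_hostname(name: str) -> str:
    """Convert a possibly non-ASCII hostname to its ASCII (xn--) form.

    Uses Python's built-in "idna" codec (IDNA 2003). Raises UnicodeError
    if a label cannot be converted (for example, if it is empty or too long).
    """
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    # Plain ASCII names pass through unchanged; other labels get an xn-- prefix.
    print(to_ascii_hostname("example.com"))
    print(to_ascii_hostname("bücher.example"))
```

After conversion, the name contains only letters, digits, hyphens, and dots, so a plain A-Za-z0-9 and hyphen check can be applied to it.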