
Forum Moderators: Ocean10000 & incrediBILL & phranque


The qualifiers of a subdomain

What are the rules, limitations?



5:05 am on Jun 4, 2008 (gmt 0)

10+ Year Member

I'm assuming the syntax rules for what's allowed in a subdomain are defined by Apache (or whichever web server is in use)? I've searched Google and apache.org, but cannot find a list of qualifiers for a subdomain.

How many characters long can a subdomain be?

What characters are legal in a subdomain? A-Za-z0-9 and?

What characters or character combinations are illegal in a subdomain?
(e.g. domains accept dashes, but not consecutive dashes and not dashes at the beginning or end of the domain)


2:31 pm on Jun 4, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

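The rules for hostnames in general (including subdomains) are set out by the DNS and URI specifications, not by the web server. In general a-z, 0-9, and hyphens are safe. The original spec (RFC 1034) asks that each "piece" (label) of the sub-subdomain.subdomain.domain.tld name begin with a letter; RFC 1123 relaxed that to allow a leading digit, but starting with a letter remains the conservative choice. Hyphens are only allowed in the interior of a label, never at the beginning or end, as you have surmised. Consecutive hyphens are not forbidden by the RFCs themselves, but they are best avoided because names with hyphens in the third and fourth positions (such as the "xn--" prefix) are reserved for internationalized domains.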

See 'Hostnames' in RFC2396, RFC1034, RFC1123 as a start.
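
For anyone who wants to check a name programmatically, here is a minimal Python sketch of the letter-digit-hyphen rules from RFC 1034 and RFC 1123 as I read them. The function name `is_valid_hostname` is made up for illustration, and the 63-character label limit and 253-character overall limit come from the DNS RFCs rather than from this thread. It allows a leading digit (the RFC 1123 relaxation) and does not handle international names.

```python
import re

# One DNS label: letters, digits, hyphens; no hyphen at either end; 1-63 chars.
# A leading digit is permitted here, following the RFC 1123 relaxation.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label passes the LDH check."""
    # Drop a single trailing dot (fully qualified form) before checking.
    if name.endswith("."):
        name = name[:-1]
    # 253 is the usual limit for the textual form of a full name.
    if not name or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


print(is_valid_hostname("sub-domain.example.com"))
print(is_valid_hostname("-bad.example.com"))
print(is_valid_hostname("hello+world.example.com"))
```

Using `fullmatch` rather than `match` matters here: the whole label has to fit the pattern, so a name is never accepted on the strength of its first few characters.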



5:34 pm on Jun 5, 2008 (gmt 0)

10+ Year Member

I didn't think to check the RFCs. Good idea. After reading 80 or so pages, this is the best I've come up with as an actionable definition of the rules. Keep in mind that RFCs are not an enforced standard among browser makers, server software designers, and so on, so your mileage may vary. I'll post this so hopefully nobody else has to slog through many pages of theoretical abstracts.

safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
      "a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex

unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape
digits = 1*digit

It would be nice to see some Apache documentation that defines Apache's actual ruleset for validating a subdomain though, as (I would think) their software would have the final say.
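
To make the grammar above a bit more concrete, here is a rough Python sketch that transcribes the `safe`, `extra`, `unreserved` and `escape` rules (they appear to be the RFC 1738 definitions) and checks a string against `uchar`. The names `is_uchar_string` and `not_hostname_safe` are my own, not from any RFC. Note that this grammar describes URL characters in general, so it is looser than what a hostname label allows.

```python
import re
import string

# Character classes transcribed from the grammar quoted above.
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
UNRESERVED = set(string.ascii_letters + string.digits) | SAFE | EXTRA

# escape = "%" hex hex
ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def is_uchar_string(text: str) -> bool:
    """Return True if text is made up only of unreserved chars or %XX escapes."""
    i = 0
    while i < len(text):
        if text[i] in UNRESERVED:
            i += 1
        elif ESCAPE_RE.match(text, i):
            i += 3  # skip the whole "%XX" escape
        else:
            return False
    return True


# Characters the grammar calls unreserved but that are not letters, digits or hyphen.
not_hostname_safe = sorted(UNRESERVED - set(string.ascii_letters + string.digits + "-"))

print(is_uchar_string("hello%20world"))
print(is_uchar_string("hello world"))
print("".join(not_hostname_safe))
```

The last line lists the characters that pass as `unreserved` in a URL yet would not belong inside a hostname label, which is why this grammar alone is not enough to validate a subdomain.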


6:13 pm on Jun 5, 2008 (gmt 0)

10+ Year Member

Here is another RFC definition of unreserved characters (RFC 3986, section 2.3), which is the closest fit I can find to the characters actually allowed in the domain/subdomain/TLD part of a URL:

2.3. Unreserved Characters

Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

Page 17 of RFC 1738 is also useful, if you can stand a bunch of recursive definitions.


9:31 pm on Jun 5, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Be careful, since the rules differ for *each part* of a URI. Domains and subdomains are limited to letters (case-insensitive, and conventionally written in lowercase), numbers, hyphens and dots. URL-paths can have any case letters, numbers and unreserved characters, plus the reserved characters when used for their defined purpose. Query strings can have even more characters. And finally, there are the international domains -- and frankly, I'm not fully knowledgeable about how they work.

Apache does not define any of this stuff, because Apache was written to *follow* the RFCs, not define them. This is the opposite of the approach taken by our favorite OS maker in Redmond WA. :)

Apache pretty much accepts any request that hits its port(s) -- so the validity of a hostname is determined by the DNS system, and not by servers.
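
A quick way to see the "each part has its own rules" point is to split a URL and look at the pieces separately. This is just a sketch using Python's standard `urllib.parse.urlsplit`; the URL is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to show how the parts are separated.
parts = urlsplit("http://Sub.Example.com/Some/Path.html?q=a+b&x=1")

# The hostname is case-insensitive; urlsplit exposes it lowercased.
print(parts.hostname)
# Path and query keep their case and may carry characters a hostname cannot.
print(parts.path)
print(parts.query)
```

Any hostname check should run only on the `hostname` piece, since characters like "+" and "&" are perfectly normal in the query string.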



6:14 pm on Jun 8, 2008 (gmt 0)

10+ Year Member

I wasn't sure what Apache's situation was; thanks for the clarification.

I didn't find any documentation from cPanel stating their ruleset, but since that software is so widespread I ended up doing some testing to see how they handled subdomains.

The cPanel system takes any subdomain that starts with a valid character. If you start the subdomain with a valid character and throw in an invalid character somewhere along the way (hello+world), it accepts the subdomain up to the invalid character (hello), although it maps the subdomain to the entire string (home/user/public_html/hello+world).
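
cPanel's code isn't visible here, so the following is only a guess at how that kind of truncation can happen in general: a pattern that is matched against the start of the string and never checked against the end. A Python sketch (the pattern `LDH` and the sample input are illustrative assumptions, not anything taken from cPanel):

```python
import re

# Illustrative pattern: one or more letters, digits or hyphens.
LDH = re.compile(r"[A-Za-z0-9-]+")

candidate = "hello+world"

# match() only anchors at the start, so it stops quietly at the "+".
prefix = LDH.match(candidate)
print(prefix.group())

# fullmatch() requires the whole string to fit, so the name is rejected outright.
print(LDH.fullmatch(candidate) is None)
```

If you validate the name yourself before handing it to a control panel, insisting on a full match avoids ending up with a subdomain and a directory that disagree.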

I haven't thought of a good solution to tackle the international character sets. Unfortunately, I don't think regex has any international character sets defined.
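
On international names: as far as I know they are not matched as raw characters at all. Each label is converted to an ASCII form (Punycode with an "xn--" prefix) before it reaches DNS, so the ordinary letter-digit-hyphen check can run on the converted name. A minimal sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the domain is a made-up example.

```python
# Made-up internationalized name for illustration.
name = "bücher.example"

# The built-in "idna" codec converts each label to its ASCII-compatible form.
ascii_name = name.encode("idna").decode("ascii")
print(ascii_name)

# Decoding reverses the conversion.
print(ascii_name.encode("ascii").decode("idna"))
```

For newer IDNA rules a third-party library would be needed, so treat the built-in codec as a starting point rather than the final word.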

