Reserved character list for filenames, and URL length

Is there an official reserved character list?

ganeshgrowth

1:27 pm on Oct 16, 2009 (gmt 0)

10+ Year Member



Is there an official list of reserved characters to avoid in URLs?

Should . , .. / % ( ) be avoided in filenames that will be used in URLs?

Also, what URL length should be considered from an accessibility point of view?

jdMorgan

1:54 pm on Oct 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Reserved character list: RFC 3986 - Uniform Resource Identifiers (URI): Generic Syntax [tools.ietf.org]
(Note that the reserved character-sets change depending on which part of a URL is being discussed.)

URL length for usability: Short. :)
(No shift-key or special characters. If you can't read it aloud on the radio or telephone and have the listener write it down correctly, then it likely needs improvement.)

Jim

ganeshgrowth

2:07 pm on Oct 16, 2009 (gmt 0)

10+ Year Member



Is this fine?

example.com/15K-(1.2-inches)-swiss-watch.html

tedster

2:47 pm on Oct 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The parentheses can cause secondary problems. For instance, some URL shorteners may misfire, and various highlighting scripts can have problems.

According to the resource that jdMorgan linked to above, the only unreserved (and always safe) characters are:

ALPHA / DIGIT / "-" / "." / "_" / "~"

I also suggest avoiding three of those unreserved characters as much as possible:

  1. Avoid the underscore character [_] because it is a challenge to search engines. (reference [mattcutts.com])

  2. Avoid the period or dot [.] except in the hostname and immediately preceding the file extension. It is not common in the rest of the URL and can cause user confusion.

  3. Avoid the tilde [~] because you cannot "read it aloud on the radio or telephone" and have the average person know what you mean.
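
As a rough illustration of the suggestions above, here is a minimal Python sketch that sorts a single path segment into three buckets. The function name `check_segment` and the bucket labels are made up for this example, and the "conservative" pattern (letters, digits, hyphens, plus one dot before the extension) is only one possible reading of the advice, not an official rule.

```python
import re

# Conservative reading of the advice above: letters, digits and hyphens,
# optionally followed by a single ".ext" file extension.
CONSERVATIVE_SEGMENT = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9]+)?")

# RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = re.compile(r"[A-Za-z0-9._~-]+")


def check_segment(segment: str) -> str:
    """Classify one URL path segment (hypothetical helper, illustration only)."""
    if CONSERVATIVE_SEGMENT.fullmatch(segment):
        return "conservative"   # follows all three suggestions
    if UNRESERVED.fullmatch(segment):
        return "unreserved"     # legal without escaping, but uses _ . or ~
    return "needs-review"       # contains reserved or other characters


if __name__ == "__main__":
    for name in ("15k-swiss-watch.html",
                 "swiss_watch.html",
                 "15K-(1.2-inches)-swiss-watch.html"):
        print(name, "->", check_segment(name))
```

A check like this only looks at the characters; whether a URL is short and easy to read aloud still has to be judged by a person.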

jdMorgan

4:10 pm on Oct 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One note, though: The document is written to describe the 'safe' characters as those that are not required to be escaped when present in a URL-path *and* that "have no meaning" to programs that parse URLs.

However, if you do URL rewriting, then you are applying an additional 'small program' that understands part of the meaning of reserved characters.

As a result, should you decide to use URL rewriting and a hierarchical URL format, it would be acceptable to use something like

example.com/watch/swiss/15k/1.2-inch

Note that the 'keywords' are ordered most-important/most-competitive first. This establishes a top-down product classification system and makes the URL appear to indicate a 'directory' structure. I also dropped the uppercase "K" in favor of the lowercase character, which is better for usability.

However, as noted, this approach will only work if you use URL rewriting; otherwise the server will have no idea how to handle a 'filetype' of ".2-inch".
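
To make the 'small program' idea concrete, here is a sketch in Python rather than in any particular server's rewrite syntax. The segment names (`origin`, `price`, `size`) and the internal target `/product.php` are assumptions invented for this example; a real setup would use whatever script and parameters the site actually has.

```python
import re
from typing import Optional

# Hypothetical pattern for /watch/<origin>/<price>/<size>.
# The dot inside the size segment is accepted deliberately, so "1.2-inch"
# is treated as data rather than as a file extension.
PRODUCT_PATH = re.compile(
    r"/watch/(?P<origin>[a-z]+)/(?P<price>[0-9]+k)/"
    r"(?P<size>[0-9]+(?:\.[0-9]+)?-inch)"
)


def rewrite(path: str) -> Optional[str]:
    """Map a hierarchical URL-path to an internal script URL, or None."""
    match = PRODUCT_PATH.fullmatch(path)
    if match is None:
        return None  # not a product URL; leave it to normal handling
    return "/product.php?origin={origin}&price={price}&size={size}".format(
        **match.groupdict()
    )


if __name__ == "__main__":
    print(rewrite("/watch/swiss/15k/1.2-inch"))
    print(rewrite("/watch/swiss/15k/1.2-inch.html"))
```

The point is only that the rewriting layer, not the file system, decides what the dot and the slashes mean.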

Another reason to avoid underscores is that they are hidden_by link_underlining [webmasterworld.com] and therefore may appear to simply be spaces (If this isn't clear, copy and paste the immediately-preceding linked text into a plain-text editor and examine it).

Just some ideas...

Jim

swa66

4:23 pm on Oct 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Official list: RFC1738
[rfc-editor.org...]

In case your BNF is rusty, these are the characters allowed unescaped anywhere in a URL (HTTP path segments additionally allow ; : @ & =):

A-Z
a-z
0-9
$ - _ . +
! * ' ( ) ,

The rest needs to be %-escaped.

This is updated in RFC 3986, which allows in a path segment:
a-z
A-Z
0-9
- . _ ~
: @ ! $ & ' ( ) * + , ; =

But I'd really stick to the smallest set.
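
For what it's worth, here is a small sketch of %-escaping with Python's standard `urllib.parse.quote`, using the filename asked about earlier in the thread. The constant name `RFC3986_PATH_EXTRAS` is my own label for the extra path characters listed above; `quote` always leaves letters, digits and `- . _ ~` alone, and escapes everything else that is not listed in `safe`.

```python
from urllib.parse import quote

# Extra characters RFC 3986 allows unescaped in a path segment,
# on top of the unreserved set (name chosen for this example).
RFC3986_PATH_EXTRAS = ":@!$&'()*+,;="

filename = "15K-(1.2-inches)-swiss-watch.html"

# Smallest set: escape everything except the unreserved characters.
strict = quote(filename, safe="")

# RFC 3986 path-segment set: parentheses and friends pass through.
permissive = quote(filename, safe=RFC3986_PATH_EXTRAS)

if __name__ == "__main__":
    print(strict)      # 15K-%281.2-inches%29-swiss-watch.html
    print(permissive)  # 15K-(1.2-inches)-swiss-watch.html
```

Sticking to the smallest set means the strict and permissive results are identical, so there is nothing for a shortener or highlighting script to trip over.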

ganeshgrowth

6:06 am on Oct 20, 2009 (gmt 0)

10+ Year Member



Does anyone have a resource or list covering the different kinds of accessibility problems that arise with URLs?

For example:


Avoid the underscore character [_] because it is a challenge to search engines. (reference)

Avoid the period or dot [.] except in the hostname and immediately preceding the file extension. It is not common in the rest of the url and can cause user confusion

Avoid the tilde [~] because you cannot "read it aloud on the radio or telephone" and have the average person know what you mean.


No shift-key or special characters, and if you can't read it aloud on the radio or telephone and have the listener write it down correctly, then it likely needs improvement