Valid HTTP 1.1 Filenames

What characters are valid

         

brotherhood of LAN

12:39 am on Nov 27, 2003 (gmt 0)



I've been hacking at a script that takes URLs as input and sends them to CURL, and I need to make sure each submitted URL at least "looks" like a valid one.

I can check the domain no problem (less than 26 characters, begins with an alphanumeric, etc.), but I'm not sure what constitutes a valid filename.

Anyone know of the magic answer, or the RFC mentioning this area?

victor

1:59 am on Nov 27, 2003 (gmt 0)



[faqs.org...]

Bear in mind a valid file name is not the same as a valid URL.

What is acceptable for a file name varies across systems (Mac, Windows, Unix, etc.), as does what counts as a different file name (Windows folds case; Unix is usually case-sensitive).

It's a minefield of cross-platform niggles.

If you can restrict the file names to lowercase letters, digits, hyphens (though not leading, just in case) and perhaps the odd period, you should work almost anywhere.
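A minimal sketch of that conservative rule in Python, for anyone who wants to test names before using them. The function name `is_portable_name` is made up for illustration, and rejecting a leading period as well as a leading hyphen is an extra assumption, not something stated above.

```python
import re

# Conservative set: lowercase letters, digits, hyphens and periods.
# The first character must be a letter or digit, so a leading hyphen
# (and, as an added assumption, a leading period) is rejected.
SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9.-]*")

def is_portable_name(name: str) -> bool:
    """Return True if name sticks to the conservative character set."""
    return SAFE_NAME.fullmatch(name) is not None
```

With this pattern, something like `index-2.html` passes, while `Index.html`, `-draft.html` and `my file.html` are rejected. It is deliberately stricter than any single platform requires.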

brotherhood of LAN

2:26 am on Nov 27, 2003 (gmt 0)



Thanks victor,

I was just trying to cut down on wasted HTTP requests to URLs that would inevitably return an error.

> lowercase letters

If I remember right, IIS treats URIs as case-insensitive while Apache is case-sensitive (or maybe it's Windows vs. Unix systems, yeah, that's it I think)... a minefield, as you say ;)

I'll make a check to remove non-printable characters and escape the string, I guess that will be enough for the script to run without an error.
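One way that check could look, sketched in Python with the standard library's `urllib.parse.quote`. The helper name `clean_path` is hypothetical, and it assumes the input is the path part of a URL that has not been percent-encoded yet (an existing `%20` would be escaped a second time to `%2520`).

```python
from urllib.parse import quote

def clean_path(raw: str) -> str:
    """Drop non-printable characters, then percent-encode the rest."""
    # str.isprintable() is False for control characters such as
    # NUL, BEL, tab and newline, so those are removed here.
    printable = "".join(ch for ch in raw if ch.isprintable())
    # Keep "/" literal so the path separators survive; spaces and
    # non-ASCII characters are percent-encoded (as UTF-8 bytes).
    return quote(printable, safe="/")
```

This only makes the string safe to send. It says nothing about whether the server will actually find a file at that path.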

Cheers!

victor

11:20 am on Nov 27, 2003 (gmt 0)



I develop under Windows and run live under UNIX.

I'm extremely careful about cross-platform file names... I haven't been caught out for months! :)

richmondsteve

3:11 pm on Nov 27, 2003 (gmt 0)



It may differ by TLD, but the max characters for .com/.net/.org domain names were extended to 63 characters a couple of years ago.
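A rough Python sketch of a "looks valid" domain check using the 63-character figure, applied to each dot-separated label. The name `looks_like_hostname` is hypothetical, and requiring every label to start and end with a letter or digit is an assumption layered on the "begins with an alphanumeric" rule from the first post.

```python
import re

# One label: 1 to 63 characters, letters/digits/hyphens only,
# starting and ending with a letter or digit.
LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)

def looks_like_hostname(host: str) -> bool:
    """Return True if every dot-separated label passes the check."""
    labels = host.split(".")
    # An empty label (from "", a leading dot, or "..") fails fullmatch.
    return all(LABEL.fullmatch(label) is not None for label in labels)
```

This is a syntax check only. It does not confirm that the domain is registered or resolves.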

The *nix filesystem is case-sensitive, but URIs on web servers on *nix systems are not necessarily case-sensitive. For example, Apache servers with mod_speling [httpd.apache.org] (yes, it only has one "l") enabled (most have it disabled by default) make URIs case-insensitive (and check for spelling errors).