It would be interesting to know why ASCII and HTML do not mix well.
"Binary. A file format that is used for non-text files such as executable programs, word processing documents, spreadsheets, databases, graphics files, and sound files."
"ASCII. Text file format. Text files have end-of-line characters; the end-of-line character is different on different types of computers."
When I upload in ASCII mode, the file's size on the server is smaller. If ASCII mode keeps all the CR/LF characters, one would expect the ASCII-uploaded file to be larger than the same file uploaded in binary.
Is there ever any practical reason to bother about binary/ASCII, if I'm uploading HTML files? Does it matter if the webmaster's computer uses Linux or Windows, and if the server is Windows or Unix? I thought this was very elementary.
It has to do with the CR/LF characters at the end of each line.
Do a little experiment: upload a file in binary, then re-download it and open both your original and the downloaded version in Notepad. You will see black boxes where the carriage returns should be. This is because of the way those characters are handled by various OSes.
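For HTML it doesn't matter. But for Perl scripts, this will generate an unexplained "premature end of script headers" error, because the EOL characters matter when a server-side script is executed: on a Unix server, a leftover CR at the end of the #! line becomes part of the interpreter path, so the script never runs.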
Just get in the habit of uploading images and executables in binary, and text files (which is all HTML really is) in ASCII.
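A minimal Python sketch of the Perl failure described above, assuming a Unix server and a first line of the form #!/usr/bin/perl with no switches: the kernel takes the interpreter path from everything before the LF, so a CR left over from a Windows editor becomes part of the path. shebang_interpreter is a hypothetical helper and simplifies what real kernels do.

```python
def shebang_interpreter(script: bytes) -> bytes:
    """Return the interpreter path named on a script's first "#!" line.

    Simplified model: the line ends at the first LF; leading/trailing spaces
    and tabs are trimmed and anything after the first space is treated as an
    argument. A CR is not trimmed, so it stays in the path.
    """
    first_line = script.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        raise ValueError("script has no #! line")
    return first_line[2:].strip(b" \t").split(b" ", 1)[0]


unix_script = b'#!/usr/bin/perl\nprint "Content-type: text/html\\n\\n";\n'
dos_script = unix_script.replace(b"\n", b"\r\n")  # same file saved with CR/LF

print(shebang_interpreter(unix_script))  # b'/usr/bin/perl'
print(shebang_interpreter(dos_script))   # b'/usr/bin/perl\r'
```

On a typical server the second path does not exist, so the exec fails and the web server reports a script error; an ASCII-mode upload (or saving the script with LF endings) removes the CR.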
If you upload (or download, I guess) in ASCII mode, linebreaks are converted to the format of the destination. If you upload from a unix machine to a Windows machine using ASCII mode, the file is likely to become larger since LF chars may be replaced by CRLF chars.
Kaled.
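The same conversion explains the smaller file in the original question, assuming the page was saved on Windows and the server runs Unix: every CR/LF pair becomes a single LF, so the file shrinks by one byte per line. A simplified Python sketch (to_unix and to_windows are hypothetical names; real FTP software converts via CR/LF on the wire rather than in one step):

```python
def to_unix(data: bytes) -> bytes:
    """CR/LF -> LF, roughly what an ASCII-mode upload to a Unix server does."""
    return data.replace(b"\r\n", b"\n")


def to_windows(data: bytes) -> bytes:
    """LF -> CR/LF, roughly what an ASCII-mode upload to a Windows server does."""
    # Normalise first so existing CR/LF pairs are not turned into CR/CR/LF.
    return to_unix(data).replace(b"\n", b"\r\n")


page = b"<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>\r\n"  # 5 lines, saved on Windows
print(len(page), len(to_unix(page)))  # 48 43
print(to_windows(to_unix(page)) == page)  # True
```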
Doesn't the preferred upload method depend in part on the character encoding of the HTML file? If your HTML is Unicode encoded, you can't upload as ASCII. It seems that when I first converted my pages to Unicode I had to change my upload protocol. (But I'm decidedly not an encoding expert.)
No. All HTML are ASCII files.
I don't believe that's correct. Most English-language webpages do happen to be implicitly or explicitly encoded as ISO 8859-1, which is a rough sort of "extended ASCII," but the W3C adopted Unicode some time ago as the basis for HTML, which lets us type Greek or Chinese or Cyrillic or Ogham directly in a Unicode-encoded HTML document. It's just that most English speakers don't have reason to make use of this ability.
Character encoding is an *exceedingly* abstruse area. Jukka Korpela's tutorial [cs.tut.fi] is widely regarded as the best technical introduction to the subject for the seriously curious.
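Whether a "Unicode encoded" page survives an ASCII-mode upload depends on which Unicode encoding form it uses, because the transfer rewrites line-ending bytes without knowing the encoding. In UTF-8 every byte of a multi-byte character is 0x80 or above, so 0x0A and 0x0D only ever mean LF and CR; in UTF-16 those bytes can occur inside ordinary characters. A minimal Python sketch, assuming a naive byte-level LF to CR/LF conversion (both helper names are hypothetical):

```python
def ascii_mode_to_windows(data: bytes) -> bytes:
    """Naive byte-level LF -> CR/LF conversion, as in an ASCII-mode upload."""
    return data.replace(b"\n", b"\r\n")


def survives_ascii_upload(text: str, encoding: str) -> bool:
    """True if the converted bytes still decode to the text with CR/LF endings."""
    moved = ascii_mode_to_windows(text.encode(encoding))
    try:
        return moved.decode(encoding) == text.replace("\n", "\r\n")
    except UnicodeDecodeError:
        return False


sample = "Ċ Ελληνικά\n"  # U+010A is stored as the bytes 0A 01 in UTF-16LE
print(survives_ascii_upload(sample, "utf-8"))      # True
print(survives_ascii_upload(sample, "utf-16-le"))  # False
```

In the UTF-16 case the inserted CR shifts every following byte out of its two-byte pair, so the rest of the page decodes as unrelated characters.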
Same goes for uploading binary files as ASCII.
Use the right type: ASCII for all text, binary for everything else.
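In Python's standard ftplib the two modes map onto two methods: storlines() switches the connection to TYPE A (ASCII) and sends each line ending in CR/LF, while storbinary() switches to TYPE I (image, i.e. binary) and sends the bytes unchanged. A minimal sketch of the "right type for each file" rule, assuming a hypothetical extension list and a hypothetical upload() helper:

```python
import ftplib
from pathlib import Path

# Hypothetical list of text formats; adjust for your own site.
TEXT_EXTENSIONS = {".html", ".htm", ".css", ".js", ".txt", ".pl", ".cgi"}


def upload(ftp: ftplib.FTP, path: str) -> str:
    """Upload one file, choosing ASCII (TYPE A) or binary (TYPE I) mode."""
    p = Path(path)
    # ftplib expects a file opened in binary mode for both methods.
    with open(p, "rb") as fp:
        if p.suffix.lower() in TEXT_EXTENSIONS:
            ftp.storlines(f"STOR {p.name}", fp)  # TYPE A: lines sent as CR/LF
            return "ascii"
        ftp.storbinary(f"STOR {p.name}", fp)  # TYPE I: bytes sent unchanged
        return "binary"


# Example use (hypothetical host and credentials):
# with ftplib.FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     upload(ftp, "index.html")
#     upload(ftp, "logo.gif")
```

Choosing by extension is only a heuristic; a file the list does not know about falls back to binary, which is the safer default for unknown data.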
"No. All HTML are ASCII files." This is what I was taught, and I have always uploaded my Japanese, Chinese and UTF-8 HTML files as ASCII. This is the first I've heard of an argument for uploading HTML as binary. I read through Jukka Korpela's tutorial, but that seems to deal with the encoding and display of characters rather than the transfer of the HTML itself. I'm not an expert on this either, and would be interested to hear more about this.
Personally, I think it is pathetic that script interpreters can't handle different linebreak formats, but they can't, so scripts have to be uploaded in ASCII mode.
Kaled.
We've certainly got two issues mixed together here. One is the nature of HTML files, which *may* be *encoded* as ASCII, but also may be *encoded* as any of a number of other things; Bill's Japanese files, for example, aren't encoded as ASCII, but probably as Shift-JIS or Unicode. The second issue (which was the original poster's question, and from which respondents like me have strayed) was the *upload method*, which we tend to call either "ASCII" or "binary". I know even less about upload protocols than I do about encoding, so along with Bill I'd like someone to clarify this for us. My guess is that the upload method called "ASCII" no longer has much to do with the actual ASCII encoding of the simple Latin alphabet, and that the name "ASCII upload" is just a vestige of past practices, but that's what we need someone to clarify.
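On the question of what "ASCII upload" means: in FTP (RFC 959) "ASCII" is the name of a transfer type, TYPE A. The sender converts its own line endings to CR/LF for the trip across the network and the receiver converts CR/LF to its own convention; in practice most clients and servers pass every other byte through untouched, which is why Shift-JIS or UTF-8 pages usually come through intact. A simplified Python sketch of the two steps (the function names and the LOCAL_EOL table are hypothetical):

```python
# Hypothetical table of each system's native line ending.
LOCAL_EOL = {"unix": b"\n", "windows": b"\r\n", "classic-mac": b"\r"}


def to_wire(data: bytes, sender_os: str) -> bytes:
    """Sender side of TYPE A: local line endings -> CR/LF on the network."""
    return b"\r\n".join(data.split(LOCAL_EOL[sender_os]))


def from_wire(data: bytes, receiver_os: str) -> bytes:
    """Receiver side of TYPE A: CR/LF from the network -> local line endings."""
    return data.replace(b"\r\n", LOCAL_EOL[receiver_os])


page = "<p>Καλημέρα</p>\n".encode("utf-8")  # UTF-8 page saved on a Unix box
result = from_wire(to_wire(page, "unix"), "windows")
print(result.endswith(b"\r\n"), "Καλημέρα".encode("utf-8") in result)  # True True
```

Only the line-end bytes change; the multi-byte Greek characters pass through as-is, so the "ASCII" label describes line handling rather than a restriction to the ASCII character set.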
Key excerpts [with new annotations]:
Tedster:
Every byte is 8 bits. In ASCII, a byte is made up of 7 "significant" bits and an eighth, "insignificant" bit, used for error control [and in recent years, for other server information, such as internationalization data]. Binary uses all 8 bits as significant bits. Text-only is ASCII, whereas word processing documents, image and data files, etc., are binary. If a binary file is transferred in ASCII mode, every eighth bit becomes corrupted and the file unusable; you inevitably corrupt a binary file by sending it as ASCII.
But sending ASCII files as binary works much of the time, if the operating systems on both ends are using the same convention for that eighth bit. However, you certainly CAN get into trouble with Unix/Mac/Windows mismatches [or any server that is configured to use different conventions from the client machine doing the upload.]
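Tedster's account describes an ASCII transfer that treats the eighth bit as disposable. Taking that as an assumption (a model of a 7-bit channel, not a description of every FTP server), a short Python sketch shows why binary data could not survive it while plain 7-bit text would; strip_eighth_bit is a hypothetical name:

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Model of a 7-bit channel: clear the high bit of every byte."""
    return bytes(b & 0x7F for b in data)


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

print(strip_eighth_bit(PNG_SIGNATURE))  # b'\tPNG\r\n\x1a\n'
print(strip_eighth_bit(b"plain <html>") == b"plain <html>")  # True
print(strip_eighth_bit("é".encode("utf-8")))  # b'C)'
```

Under this model UTF-8 text would be damaged too, which is worth weighing against Bill's experience of uploading UTF-8 pages in ASCII mode without trouble.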
bobriggs:
The main use of ASCII transfer in FTP is for the FTP server to convert line-ending characters (carriage returns and line feeds) into the correct form for the server: on *NIX boxes, a CR/LF pair will be converted into a single LF. (MS-DOS/Windows computers use carriage return/line feed pairs to denote the end of a line.) All files are binary; ASCII is just a subset. It would be less confusing if, instead of ASCII mode, FTP clients called it 'text mode'. So if you sent an 8-bit ASCII file via FTP, any eighth bit that is set would be retained, and any CR/LF pairs would be converted to a single LF.
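bobriggs's description (eighth bit kept, CR/LF pairs rewritten) still explains why binary files break in ASCII mode: any CR/LF byte pair that happens to occur inside them is rewritten too. The PNG format's 8-byte signature deliberately contains a CR/LF pair so that this kind of damage can be detected. A minimal Python sketch of a Windows-to-Unix ASCII-mode transfer under that description (ascii_mode_to_unix is a hypothetical name):

```python
def ascii_mode_to_unix(data: bytes) -> bytes:
    """CR/LF pairs become LF; every other byte, high bit included, is untouched."""
    return data.replace(b"\r\n", b"\n")


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
print(ascii_mode_to_unix(PNG_SIGNATURE))  # b'\x89PNG\n\x1a\n' (one byte lost)

page = "<p>café</p>\r\n".encode("latin-1")  # 8-bit text: é is the byte 0xE9
print(ascii_mode_to_unix(page))  # b'<p>caf\xe9</p>\n'
```

The text file comes through with its 0xE9 byte intact and only its line ending changed, while the PNG loses a byte from its header and no longer validates.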