Binary and ascii formats for ftp

Why upload html to server in binary?

geekay

3:57 am on Apr 20, 2005 (gmt 0)

I know I should upload HTML files to the server in binary format. But I once tried uploading in ASCII mode with my FTP client, and those pages then loaded in the browser as normal (as far as I could see). The file size was a little smaller than with binary. Apache server.

It would be interesting to know why ASCII and HTML do not mix well.

bill

4:44 am on Apr 20, 2005 (gmt 0)

I think you've got it backwards. HTML and text-type files should always be transferred as ASCII. It's usually photos and images that need to be transferred as binary.

geekay

5:25 am on Apr 20, 2005 (gmt 0)

That's interesting. I've been experimenting with both and can't notice any difference in how an HTML page displays in a browser, but the file size on the server is different. (But I've also used the application's auto option.)
WS_FTP help only says (maybe I have a pre-HTML version...):

"Binary. A file format that is used for non-text files such as executable programs, word processing documents, spreadsheets, databases, graphics files, and sound files."

"ASCII. Text file format. Text files have end-of-line characters; the end-of-line character is different on different types of computers."
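The WS_FTP description hinges on which end-of-line bytes a file actually contains. As a minimal sketch (the function names are hypothetical, not part of WS_FTP or any other client), this reports whether a file's raw bytes use Windows CRLF, Unix LF, classic Mac CR, or a mix:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF bytes and lone CR bytes in raw file contents."""
    crlf = data.count(b"\r\n")
    # Every LF that is not part of a CRLF pair is a Unix-style ending.
    lf = data.count(b"\n") - crlf
    # Every CR that is not part of a CRLF pair is a classic Mac-style ending.
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


def describe(data: bytes) -> str:
    """Name the single line-ending convention in use, or "mixed"/"none"."""
    counts = count_line_endings(data)
    styles = [name for name, n in counts.items() if n]
    if not styles:
        return "none"
    if len(styles) > 1:
        return "mixed"
    return {"crlf": "Windows (CRLF)", "lf": "Unix (LF)", "cr": "classic Mac (CR)"}[styles[0]]


print(describe(b"<p>one</p>\r\n<p>two</p>\r\n"))  # Windows (CRLF)
```

Reading the file in binary mode (`open(path, "rb")`) matters here: opening it as text in Python would translate the line endings before you could count them.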

kaled

9:29 am on Apr 20, 2005 (gmt 0)

Since whitespace beyond the first character is, broadly, ignored in HTML, it makes no difference whether you upload in ASCII or BINARY. I generally upload in BINARY, but ASCII is probably advisable.

Kaled.

Farix

11:18 am on Apr 20, 2005 (gmt 0)

It has to do with the CR/LF characters at the end of each line. Under Microsoft OSes, you need both a Carriage Return and a Line Feed in a text file, but Unix servers just require the Line Feed.

geekay

11:52 am on Apr 20, 2005 (gmt 0)

Pardon me, but I need a little bit more help to understand this. When I upload HTML in BINARY format the file size on the Linux server is exactly the same as on my Windows PC's local hard drive, making it easy to check.

When I upload in ASCII, the file's size on the server is smaller. If ASCII includes all the CR/LF characters, one would imagine that the ASCII-uploaded file is larger than the same binary one.

Is there ever any practical reason to bother about binary/ASCII, if I'm uploading HTML files? Does it matter if the webmaster's computer uses Linux or Windows, and if the server is Windows or Unix? I thought this was very elementary.
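Farix's explanation accounts for the smaller file. A Windows text file spends two bytes (CR LF) on each line ending; if the server is Unix and the upload is in ASCII mode, it stores only the LF, so the file shrinks by one byte per line, while binary mode copies the bytes unchanged and the sizes match. A minimal sketch of that rewrite, assuming the net effect is a plain CRLF to LF substitution (the helper name is hypothetical):

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Approximate the net effect of an ASCII-mode upload to a Unix server: CRLF -> LF."""
    return data.replace(b"\r\n", b"\n")


# A five-line page saved on Windows (every line ends in CR LF).
page = b"<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>\r\n"
converted = to_unix_line_endings(page)
print(len(page), len(converted))  # 48 43: five lines, five bytes smaller
```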

rocknbil

3:09 pm on Apr 20, 2005 (gmt 0)

It does not matter for HTML, EXCEPT . . . this is the big difference:

"It has to do with the CR/LF characters at the end of each line."

Do a little experiment: upload a file in binary, then re-download it and open both your original and the downloaded version in Notepad. You will see black boxes where the carriage returns should be. This is because of the way those characters are handled by various OSes.

For HTML it doesn't matter. But for Perl scripts, this will generate an unexplained "premature end of script headers" error, because the EOL characters are important when executing a server-side script.

Just get in the habit of uploading images/executables in binary, text files (which is all HTML really is) in ASCII.
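One well-known way this bites Perl CGI scripts (the thread doesn't spell out the cause, so treat this as a likely explanation rather than a diagnosis) is the `#!` line. On Unix, the interpreter path is read from the first line up to the LF and only trailing spaces and tabs are trimmed, so a Windows CRLF leaves a stray CR attached and the server looks for a program literally named `/usr/bin/perl` followed by a carriage return. The hypothetical helper below mimics that parsing; the `/usr/bin/perl` path is an assumption.

```python
import re


def shebang_interpreter(script: bytes) -> bytes:
    """Return the interpreter path named on a script's #! line.

    Like the Unix loader, split the first line only at LF and separate the
    interpreter from any argument only at spaces or tabs, so a trailing CR
    from a Windows line ending stays attached to the path.
    """
    first_line = script.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        raise ValueError("script has no #! line")
    rest = first_line[2:].lstrip(b" \t")
    return re.split(rb"[ \t]", rest, maxsplit=1)[0]


windows_script = b"#!/usr/bin/perl\r\nprint \"Content-type: text/html\\n\\n\";\r\n"
unix_script = windows_script.replace(b"\r\n", b"\n")
print(shebang_interpreter(windows_script))  # b'/usr/bin/perl\r'
print(shebang_interpreter(unix_script))     # b'/usr/bin/perl'
```

Uploading the script in ASCII mode to a Unix server removes the CR, which is why the habit above fixes the error.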

kaled

8:11 pm on Apr 20, 2005 (gmt 0)

Browsers don't care what sort of linebreak is used.

If you upload (or download, I guess) in ASCII mode, linebreaks are converted to the format of the destination. If you upload from a Unix machine to a Windows machine using ASCII mode, the file is likely to become larger, since LF chars may be replaced by CRLF chars.

Kaled.
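Going the other way, from Unix to Windows, each line gains a byte. A naive `replace(b"\n", b"\r\n")` would turn any CRLF that is already present into CR CR LF, so this minimal sketch (a hypothetical helper, assuming the destination wants CRLF) only rewrites LFs that are not already preceded by a CR:

```python
import re


def to_windows_line_endings(data: bytes) -> bytes:
    """Convert LF to CRLF, leaving existing CRLF pairs untouched."""
    # (?<!\r) is a negative lookbehind: match \n only when no \r comes right before it.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


unix_page = b"<p>one</p>\n<p>two</p>\n"
windows_page = to_windows_line_endings(unix_page)
print(len(unix_page), len(windows_page))  # 22 24: two lines, two bytes larger
```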

rjohara

8:19 pm on Apr 20, 2005 (gmt 0)

Doesn't the preferred upload method depend in part on the character encoding of the HTML file? If your HTML is Unicode encoded, you can't upload as ASCII. It seems that when I first converted my pages to Unicode I had to change my upload protocol. (But I'm decidedly not an encoding expert.)

geekay

8:40 pm on Apr 20, 2005 (gmt 0)

I'm still using iso-8859-1 encoding.

I thought this was a simple and elementary question, but it seems to be quite complex. But obviously everybody knows how to upload, because I haven't heard of others' problems.

Farix

12:58 am on Apr 21, 2005 (gmt 0)

"Doesn't the preferred upload method depend in part on the character encoding of the HTML file? If your HTML is Unicode encoded, you can't upload as ASCII. It seems that when I first converted my pages to Unicode I had to change my upload protocol. (But I'm decidedly not an encoding expert.)"

No. All HTML are ASCII files. What the character encoding does is instruct the browser on how to render certain ASCII character sequences to produce additional characters.

rjohara

3:43 am on Apr 21, 2005 (gmt 0)

"No. All HTML are ASCII files."

I don't believe that's correct. Most English-language webpages do happen to be implicitly or explicitly encoded as ISO 8859-1, which is a rough sort of "extended ASCII," but the W3C adopted Unicode some time ago as the basis for HTML, which lets us type Greek or Chinese or Cyrillic or Ogham directly in a Unicode-encoded HTML document. It's just that most English speakers don't have reason to make use of this ability.

Character encoding is an *exceedingly* abstruse area. Jukka Korpela's tutorial [cs.tut.fi] is widely regarded as the best technical introduction to the subject for the seriously curious.

2by4

3:44 am on Apr 21, 2005 (gmt 0)

It matters. I had a problem recently that took me a while to figure out: I'd been uploading PHP .inc files and had forgotten to add .inc to my list of ASCII file types in my FTP client, so the scripts would get strange and inconsistent failures.

The same goes for uploading binary files as ASCII.

Use the right type: ASCII for all text, binary for everything else.
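An FTP client's "auto" mode usually comes down to an extension list like the one 2by4 had to extend. As a minimal sketch, Python's standard `ftplib` offers both modes: `storlines()` switches the connection to ASCII (`TYPE A`) and `storbinary()` to binary (`TYPE I`). The extension list and the `transfer_mode`/`upload` helpers are hypothetical.

```python
import ftplib
import os

# Hypothetical list of text extensions. As with 2by4's .inc files, anything
# missing from it will silently go up in binary mode.
TEXT_EXTENSIONS = {".html", ".htm", ".css", ".js", ".txt", ".pl", ".cgi", ".php", ".inc"}


def transfer_mode(filename: str) -> str:
    """Return "ascii" for known text extensions, "binary" for everything else."""
    ext = os.path.splitext(filename)[1].lower()
    return "ascii" if ext in TEXT_EXTENSIONS else "binary"


def upload(ftp: ftplib.FTP, local_path: str, remote_name: str) -> None:
    """Upload one file over an already logged-in FTP connection."""
    with open(local_path, "rb") as fp:
        if transfer_mode(local_path) == "ascii":
            # storlines() sends TYPE A and transmits every line ending as CRLF;
            # the server normally stores the lines in its own convention.
            ftp.storlines(f"STOR {remote_name}", fp)
        else:
            # storbinary() sends TYPE I and transmits the bytes unchanged.
            ftp.storbinary(f"STOR {remote_name}", fp)
```

For example, inside `with ftplib.FTP("ftp.example.com") as ftp:` (a placeholder host) you would call `ftp.login(...)` with your own credentials and then `upload(ftp, "index.html", "index.html")`.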

bill

5:35 am on Apr 21, 2005 (gmt 0)

"No. All HTML are ASCII files."

This is what I was taught, and I have always uploaded my Japanese, Chinese and UTF-8 HTML files as ASCII. This is the first I've heard of an argument for uploading HTML as Binary. I read through Jukka Korpela's tutorial, but that seems to deal with the encoding display of characters rather than the transfer of the HTML itself. I'm not an expert on this either, and would be interested to hear more about this.

kaled

9:33 am on Apr 21, 2005 (gmt 0)

All program scripts must be uploaded in ASCII mode. That is an entirely different issue to HTML files. One is handled by the server and one is handled by the client.

Personally, I think it is pathetic that script interpreters can't handle different linebreak formats, but they can't, so scripts have to be uploaded in ASCII mode.

Kaled.

rjohara

6:53 pm on Apr 21, 2005 (gmt 0)

"I have always uploaded my Japanese, Chinese and UTF-8 HTML files as ASCII. This is the first I've heard of an argument for uploading HTML as Binary. I read through Jukka Korpela's tutorial, but that seems to deal with the encoding display of characters rather than the transfer of the HTML itself. I'm not an expert on this either, and would be interested to hear more about this."

We've certainly got two issues mixed together here. One is the nature of HTML files, which *may* be *encoded* as ASCII, but also may be *encoded* as any of a number of other things; Bill's Japanese files, for example, aren't encoded as ASCII, but probably Shift-JIS or Unicode. The second issue (which was the original poster's question, and which respondents like me have diverted) was the *upload method*, which we tend to call either "ASCII" or "binary". I know even less about upload protocols than I do about encoding, so along with Bill I'd like to have someone clarify this for us. My guess is that the upload method called "ASCII" doesn't really have anything to do anymore with the actual ASCII encoding system of the simple Latin alphabet, and that the name "ASCII upload" is just a vestige of past practices, but that's what we need someone to clarify for us.
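One way to separate the two issues: if ASCII mode only rewrites line endings (as bobriggs describes in the excerpts quoted later in this thread), what matters is whether CR and LF are the same single bytes in the file's encoding and never appear inside other characters. That holds for ISO-8859-1, UTF-8 (every byte of a multi-byte UTF-8 character is 0x80 or above) and Shift-JIS, which would fit Bill's experience. It does not hold for UTF-16, where each character is two bytes. The thread doesn't say which Unicode encoding rjohara used, so whether that explains the earlier trouble is only a guess. A minimal check, with a hypothetical helper name:

```python
def line_ending_conversion_is_safe(text: str, encoding: str) -> bool:
    """True if a byte-level CRLF -> LF rewrite yields the same text with LF endings."""
    original = text.encode(encoding)
    converted = original.replace(b"\r\n", b"\n")
    expected = text.replace("\r\n", "\n").encode(encoding)
    return converted == expected


japanese = "<p>日本語</p>\r\n<p>テスト</p>\r\n"
western = "<p>café</p>\r\n"
print(line_ending_conversion_is_safe(japanese, "utf-8"))        # True
print(line_ending_conversion_is_safe(japanese, "shift_jis"))    # True
print(line_ending_conversion_is_safe(western, "iso-8859-1"))    # True
print(line_ending_conversion_is_safe(japanese, "utf-16"))       # False
```

In UTF-16 a CR is stored as two bytes and an LF as two more, so the bytes 0x0D and 0x0A are never adjacent and a byte-oriented converter misses the real line endings entirely.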

tedster

7:32 pm on Apr 21, 2005 (gmt 0)

Here's a good thread on the topic from our own Wayback Machine:
[webmasterworld.com...]

Key excerpts [with new annotations]:

Tedster:
Every byte is 8 bits. In ASCII, a byte is made up of 7 "significant" bits and an eighth, "insignificant" bit, used for error control [and, in recent years, for other server information, such as internationalization data]. Binary uses all 8 bits as significant bits. Text-only is ASCII, whereas word processing documents, image and data files, etc., are binary.
If a binary file is transferred in ASCII mode, every eighth bit becomes corrupted and the file unusable: you inevitably corrupt a binary file by sending it as ASCII.
But sending ASCII files as binary works much of the time, if the operating systems on both ends are using the same convention for that eighth bit. However, you certainly CAN get into trouble with Unix/Mac/Windows mismatches [or any server that is configured to use different conventions from the client machine doing the upload.]
bobriggs:
The main use of ASCII transfer in FTP is for the FTP server to convert line-ending characters (carriage returns and line feeds) into the correct form for the server: on *NIX boxes, a CR/LF pair will be converted into a single LF. (MS-DOS/Windows computers use Carriage Return/Line Feed pairs to denote the end of a line.)
All files are binary; ASCII is just a subset. It would be less confusing if, instead of ASCII mode, the FTP clients called it 'text mode'. So if you sent an 8-bit ASCII file via FTP, any 8th bit that is set would be retained, and any CR/LF pairs would be converted to a single LF.
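The line-ending rewrite bobriggs describes is enough on its own to break a binary file: inside an image, a 0x0D 0x0A byte pair is just data, but an ASCII-mode transfer rewrites it anyway. The PNG format deliberately puts CR LF into its 8-byte file signature so this kind of damage is detected. A minimal sketch, assuming a CRLF to LF conversion on the way to a Unix server (the helper names are hypothetical):

```python
# The fixed 8-byte signature every PNG file starts with (it includes CR LF and a lone LF).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ascii_mode_to_unix(data: bytes) -> bytes:
    """The line-ending rewrite an ASCII-mode upload to a Unix server performs."""
    return data.replace(b"\r\n", b"\n")


def looks_like_png(data: bytes) -> bool:
    return data.startswith(PNG_SIGNATURE)


# Signature followed by the IHDR chunk header (its length field is 13, i.e. 0x0D).
image_start = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"
damaged = ascii_mode_to_unix(image_start)
print(looks_like_png(image_start), looks_like_png(damaged))  # True False
```

Sending the same bytes with a binary-mode transfer leaves them untouched, which is why images should always go up as binary.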