Why digital camera files are bloated, and how to fix them

Fact-packed continuation of thread "new digital camera - new jpg problems"

         

SimonG

9:57 pm on Mar 24, 2005 (gmt 0)



A thread called "new digital camera - new jpg problems" discussed, without coming to a conclusion, the way that JPEG images from digital cameras such as Kodak's DC series (and many others) seem to include redundant data. I was not allowed to extend that thread (apparently old ones get filled with spam unless locked) but was invited to start a new one. Here goes...

This redundant data, common in files saved from digital cameras, can be removed without any change to the picture in the file, often making the file tens of per cent smaller. This posting explains why, lists lots of ways to do it on all sorts of computers, and covers other stuff about digital cameras which SimonG reckons deserves to be better known. ;-)

Having written a free digicam driver for this sort of camera (for the now-obscure Qdos operating system, which was multi-tasking on 8 bit computers in 1984 better than Windows or MacOS two decades later) and owned several of these cameras, I know what's going on in the files, and the effects of removing those ill-defined extras. Keep reading, and you will too. :-)

The hidden bloat

The main reason that digicam snaps can be losslessly scrunched is that the camera saves other information besides what you see when you uncompress the image. Particularly if the full image resolution is fairly low (e.g. 640x480), this may be as much as a quarter of the whole file!

The bulk of this is the uncompressed 'thumbnail' image which the camera shows as you scroll through the contents of its memory, and which provides the instant but grainy image that appears when you select one of the pictures for display on the rear screen or TV output. Even though this image is relatively low in resolution, in order to show it instantly it's hardly compressed - except by the CFA scheme which routinely allows digicam manufacturers to exaggerate the resolution of their products by a factor of four!

CFA explained

A so-called 2 megapixel camera almost invariably only measures the exact colour of half a million pixels - the CFA (colour filter array) or Bayer pattern means that each colour is represented by the brightness in four monochrome cells in a 2x2 grid (like Battenburg cake). The two diagonal corners contribute green information (the eye is most sensitive to green) while the opposite ones measure red and blue. Thus the camera measures the (filtered) brightness of 2 million pixels but the colour of only half a million, in groups of four.
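To put numbers on that, here is a minimal Python sketch of the arithmetic. The function name bayer_counts and the 1600x1200 sensor size are assumptions chosen for illustration, not figures for any particular camera.

```python
def bayer_counts(width, height):
    """Count the cells of each filter colour on a Bayer-pattern sensor.

    Assumes width and height are both even, so the sensor tiles exactly
    into 2x2 groups of one red, two green and one blue cell.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    groups = (width // 2) * (height // 2)  # number of 2x2 groups
    return {
        "cells": width * height,  # the advertised "pixel" count
        "groups": groups,         # points where all three colours are measured
        "red": groups,
        "green": 2 * groups,
        "blue": groups,
    }


# A nominal "2 megapixel" sensor, assumed here to be 1600x1200 cells.
print(bayer_counts(1600, 1200))
```

For that assumed sensor the result is 1,920,000 monochrome cells but only 480,000 full-colour groups, which is the factor of four described above.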

The LCD back-panel uses the same trick to quote four times as many pixels as it has points of arbitrary colour. JPEG, as cameras usually configure it, likewise stores the colour of groups of four pixels but the brightness of each, exploiting the fact that the human eye has more acute sensitivity to luminance than to colour (as it has far more mono rod sensors than colour cone ones).
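A rough Python sketch of what that saves, assuming the common layout where each 2x2 block of pixels shares one pair of colour values (JPEG permits other layouts too). The name sample_counts is made up for this example.

```python
def sample_counts(width, height):
    """Return (full, subsampled) sample counts for one image.

    full:       one brightness and two colour values for every pixel
    subsampled: one brightness value per pixel, plus two colour values
                shared by each 2x2 block (edges rounded up)
    """
    pixels = width * height
    blocks = ((width + 1) // 2) * ((height + 1) // 2)
    return 3 * pixels, pixels + 2 * blocks


full, shared = sample_counts(640, 480)
print(full, shared, shared / full)
```

At 640x480 that is 921,600 values cut to 460,800, so half the data has gone before the real compression even starts.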

Anyway, the JPEG scheme does a good job of compressing an image by a factor of 20, while CFA only manages a grainy factor of four, so even though the thumbnail is relatively low-resolution it still takes up a fair chunk of the space in the file. If you prefer to view the image outside the camera you can discard this and save yourself, and visitors to your web site, valuable time and space by removing the thumbnail and similar camera-specific data, using the techniques listed in the old thread and later in this post.

More padding

The camera uses an extension to the JPEG (Joint Photographic Experts Group) standard called EXIF to add the thumbnail, a verbose and useless plug for the manufacturer (which few people ever see), the date (worth extracting if your transfer software otherwise sets a new one) and settings in force when the picture was taken, such as Flash Mode, Compression, Exposure adjustment, Timer and Zoom selections. You might find those interesting but they're not needed to decompress the main image.

EXIF is a standard for packing camera settings, copyright notices and index information into JPEG files. The result is still a JPEG, which can be rendered by any program designed to decode that standard, but it carries a lot more besides. A quick way to detect the extras is to look at the start of a file for the characters 'Exif' or 'JFIF'.
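That quick check could be sketched in Python like this. The names metadata_tags and sniff_file, and the choice of reading only the first 64 bytes, are assumptions for illustration; a file with a large block ahead of the Exif data would need a longer read.

```python
def metadata_tags(header: bytes):
    """Return which of 'JFIF' and 'Exif' appear in the given leading bytes."""
    if not header.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG: missing start-of-image marker")
    return [tag.decode("ascii") for tag in (b"JFIF", b"Exif") if tag in header]


def sniff_file(path, nbytes=64):
    """Read the first nbytes of a file and report the tags found there."""
    with open(path, "rb") as f:
        return metadata_tags(f.read(nbytes))


# A hand-built header standing in for the start of a camera file.
sample = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00" + bytes(8)
print(metadata_tags(sample))
```

On a real file you would call sniff_file("photo.jpg"), where photo.jpg is a placeholder name.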

The free jhead command for many operating systems can read and manipulate all the extra information that JFIF/EXIF adds to a JPEG file. For BSD, Linux, MacOS X, Solaris and Windows versions, and interesting technical links, see:

[sentex.net...]

There's an Amiga version (of course) on Aminet (ditto):

[wuarchive.wustl.edu...]

Programmers who want to look deeper into EXIF or JFIF files might try the exdump script, which extracts and reports dates and dimensions from Exif files. exif.pm is a Perl package that knows about some 80 sections that may occur in an Exif file. If you prefer Python, google for exifdump.py.
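For a feel of what such tools walk through, here is a minimal Python sketch (not taken from any of the scripts named above) that lists the segments at the front of a JPEG and their sizes. It relies on the standard JPEG layout: a 0xFFD8 start marker, then segments each made of a two-byte marker and a two-byte big-endian length that counts itself. The helper names list_segments and segment are invented for this example, and the sample data is hand-built rather than from a real camera.

```python
import struct


def list_segments(data: bytes):
    """Return (marker, payload_size) pairs for the segments before the scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # optional fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan, or end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        found.append((marker, length - 2))
        pos += 2 + length
    return found


def segment(marker: int, payload: bytes) -> bytes:
    """Build one segment (demo helper only)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


sample = (b"\xff\xd8"
          + segment(0xE0, b"JFIF\x00" + bytes(9))       # APP0
          + segment(0xE1, b"Exif\x00\x00" + bytes(100))  # APP1, where Exif lives
          + b"\xff\xda" + bytes(8) + b"\xff\xd9")        # stand-in image data

for marker, size in list_segments(sample):
    print(f"marker 0xFF{marker:02X}: {size} bytes")
```

Run on a real camera file (read with open(path, "rb").read()), the 0xFFE1 entry shows how much of the file the Exif block, thumbnail included, is taking up.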

Some free cures

The original posting suggested various ways to shrink JPEGs without losing information from the main image. Personally I prefer the free open-source jpegoptim, which runs on many systems including...

Amiga: [wuarchive.wustl.edu...]

Linux and Unix: [cc.jyu.fi...]

Windows: [pornel.ldreams.net...]

JPEGoptim supports lossless compression by removing the extra data (such as thumbnail images and EXIF format camera settings) and more cleverly by optimising the encoding of the image to store the same data in less space by better use of Huffman code than the camera can afford to perform.
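The first of those two steps, dropping the extras, can be sketched in a few lines of Python. This is not jpegoptim's code, just an illustration under stated assumptions: it removes only APP1 segments (where Exif data lives) and comment segments, and copies every other byte unchanged, so the picture itself cannot be affected. The names strip_metadata and segment are invented here; try it on copies, not originals.

```python
import struct

STRIP = {0xE1, 0xFE}  # APP1 (Exif) and COM (comment) markers


def strip_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with APP1 and COM segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte, safe to drop
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # image data starts (or file ends) here
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in STRIP:
            out += data[pos:end]    # keep tables, frame header, etc.
        pos = end
    out += data[pos:]               # compressed image data, byte for byte
    return bytes(out)


def segment(marker: int, payload: bytes) -> bytes:
    """Build one segment (demo helper only)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


scan = b"\xff\xda" + bytes(8) + b"\xff\xd9"  # stand-in for real image data
sample = (b"\xff\xd8"
          + segment(0xE0, b"JFIF\x00" + bytes(9))
          + segment(0xE1, b"Exif\x00\x00" + bytes(100))
          + scan)

stripped = strip_metadata(sample)
print(len(sample), "->", len(stripped), "bytes")
```

The Huffman re-optimisation is the cleverer half and is not attempted here; that needs a real JPEG library or jpegoptim itself.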

There's a trade-off between compression factor on one hand and compression speed, camera price, battery consumption, and the rate at which the camera can be ready to take another snap on the other, which means that the same data can often be re-encoded more efficiently if you've got the power to try again.
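A toy Python illustration of why a second pass can win. It assumes, for the sake of argument, that the camera encodes with one fixed general-purpose code table while the optimiser builds a table from the symbols actually present in the picture. The symbol names and counts are invented, and real JPEG codes are also limited to 16 bits, which this sketch ignores.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:  # every merge adds one bit to these codes
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


def total_bits(counts, lengths):
    """Bits needed to encode the counted symbols with the given code."""
    return sum(n * lengths[sym] for sym, n in counts.items())


# Hypothetical figures: what a fixed table expects vs. what one picture holds.
expected = {"a": 40, "b": 30, "c": 20, "d": 10}
actual = {"a": 5, "b": 5, "c": 10, "d": 80}

fixed_table = huffman_code_lengths(expected)
tuned_table = huffman_code_lengths(actual)
print("fixed table:", total_bits(actual, fixed_table), "bits")
print("tuned table:", total_bits(actual, tuned_table), "bits")
```

Same symbols, same picture, fewer bits: the tuned table costs an extra pass over the data, which is exactly the work a camera in a hurry skips.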

jpegoptim can also recompress images more heavily, with some loss of detail, through its optional ability to perform lossy compression, controlled by a 'quality factor' similar to the good/better/best options on the Kodak and similar cameras. Those settings correspond to increasing quality factors in jpegoptim's lossy compression. Of course once information has been lost you can't restore it by using a higher factor - lossy compression is a one-way street.

I find it's rarely worth devoting more than about 50K, and often much less (depending on the subject) to an image on the web at a resolution of 640x480 (ask for more and many web users, such as those with TV-standard browsers, will only see a slice at a time!).
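As a sanity check on that 50K figure, a small Python sketch. It assumes 1K means 1024 bytes and that the uncompressed picture holds 3 bytes per pixel; the function name compression_budget is made up for this example.

```python
def compression_budget(width, height, file_bytes, bytes_per_pixel=3):
    """Return (compression ratio, bits per pixel) for a given file size."""
    pixels = width * height
    raw_bytes = pixels * bytes_per_pixel
    return raw_bytes / file_bytes, file_bytes * 8 / pixels


ratio, bpp = compression_budget(640, 480, 50 * 1024)
print(f"{ratio:.1f}:1, {bpp:.2f} bits per pixel")
```

That comes out at 18:1, or about one and a third bits per pixel, which sits close to the factor of 20 quoted earlier for JPEG.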

If you don't like command-lines and are trapped with Windows (in)compatibility, and don't mind recompressing your files one by one, PegIt does similar things with a GUI; on fast computers you can see the image degrade alongside the original as you move a slider to adjust the lossy compression factor, and save it when the size and quality match your personal standards:

[ravenblack.net...]

If you change your mind about the command line you can also do simple batch operations with PegIt.

If you use Qdos or the follow-up SMS/Q operating system, or one of the Qdos emulators for more recent hardware, or just want to see a driver written in a few K of block-structured BASIC that does more than some of the CD-based ones, send me (SimonG) a StickyMessage via this forum. I've written articles about this sort of thing for QL, Linux and Amiga Format magazines, and welcome questions about those, but I don't answer questions specifically about Windows; I know better. ;-)

Over to you!

SimonG (Simon N Goodwin, Warwick, UK 2005/3/24)

Leosghost

12:06 pm on Apr 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I find it hard to believe that no-one here said thank you for the level of info you gave ..very clearly put too :)..So I will do it ..Thank you Simon..

And welcome to WebmasterWorld

Birdman

3:04 pm on Apr 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



WOW! Very enlightening SimonG. Thank you kindly for that info...great first post! I'm off to install JPEGoptim on my Gentoo box now.

Birdman

ken_b

3:34 pm on Apr 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Very informative! Thanks a lot for posting this.

Birdman

4:12 pm on Apr 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just wanted to report my results of using jpegoptim!

On my full-size images (1,900 of them!), I reduced them by 20% on average.

On the thumbs (1,900) I saved 10% on average.

It was as simple as:

root@tux images jpegoptim --totals --strip-all -m 75 -d new_dir *.jpg

Considering that 75% of my site's bandwidth is from jpegs, that is a sweet deal. I should save a couple gigs per month!
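For anyone who would rather drive that from a script, here is a hedged Python sketch that builds the same command line. The flags are the ones in the command above; the function names and the run wrapper are assumptions, and jpegoptim must already be installed and on the PATH. Note that -m sets a maximum quality, so it switches on lossy recompression; leave max_quality as None to stay lossless.

```python
import glob
import subprocess


def jpegoptim_command(files, dest_dir, max_quality=None):
    """Build the argument list for one jpegoptim run."""
    cmd = ["jpegoptim", "--totals", "--strip-all", "-d", dest_dir]
    if max_quality is not None:
        cmd += ["-m", str(max_quality)]  # lossy: caps the quality factor
    return cmd + list(files)


def run(pattern="*.jpg", dest_dir="new_dir", max_quality=None):
    """Optimise every file matching pattern into dest_dir."""
    files = sorted(glob.glob(pattern))
    if files:
        subprocess.run(jpegoptim_command(files, dest_dir, max_quality),
                       check=True)


print(" ".join(jpegoptim_command(["a.jpg", "b.jpg"], "new_dir", 75)))
```

Calling run(max_quality=75) from the images directory would repeat the run above, assuming new_dir already exists.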

Thanks again SimonG

willybfriendly

4:42 pm on Apr 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GIMP also has an option to drop EXIF data when saving an image. SimonG, do you have any comments on how well it performs compared to the other software you have mentioned?

WBF

Birdman

5:21 pm on Apr 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



willybfriendly,

I just tested it on GIMP and the results were exactly the same! I haven't used GIMP on the command line though. I just tested it on one large jpeg.

Birdman

willybfriendly

10:16 pm on Apr 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



With a little better user interface GIMP could put the squeeze on Photoshop. I find it indispensable.

I discovered that option to drop the EXIF data quite by accident one day.

WBF

reddevil

10:47 am on Apr 27, 2005 (gmt 0)

10+ Year Member



SimonG

Exactly what they said! Your information is like receiving some insider trading information (haha).

I have tried the Windows version and dragged and dropped my jpegs onto the application but nothing happened and my photo size actually increased very slightly (duh!).

Is the drag and drop all I have to do or am I missing something? Is it possible that my photos didn't have all that useless information in the first place? It is only a fairly cheapo Sony 3.2 megapixel?

Is there any way to tell if the EXIF data is still there?