A Simple DCT Explanation

Lena

The discrete cosine transform (DCT) is the fundamental piece of the JPEG image compression algorithm.  Please keep in mind that this page does not describe JPEG.  Instead, it demonstrates how compression can be achieved with the DCT.

Let's start with the ubiquitous Lena image.  This is a color RGB image.  The first step is to convert the image from RGB to a color space called YCbCr.  The Y component contains the brightness information and is a grayscale image.  The Cb and Cr components contain color information.  The conversion is an invertible linear transform, so the YCbCr image contains the same information as the RGB image, just in a different representation.
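As a rough sketch of this first step, here is a per-pixel conversion in Python.  The constants are the full-range ones from the JFIF file format, which is an assumption on my part; this page does not say which YCbCr variant was used for the images.  The function names are only illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF constants, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr, up to the rounding of the published constants."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


# A gray pixel has no color information, so Cb and Cr sit at the midpoint.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)

# Converting there and back recovers the original pixel (before any
# rounding to integers), which is why no information is lost at this step.
r, g, b = ycbcr_to_rgb(*rgb_to_ycbcr(200, 30, 90))
```

In a real program the values would be rounded back to 8 bits at some point, and that rounding is where a small loss can creep in.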

The reason for this transformation is that most of the information ends up in the Y image, with much less information in the chroma (Cb and Cr) images.  (I should point out that the Y, Cb, and Cr images are each the same size as the RGB image; they have been shrunk here to fit the screen.)  Further, the human eye is more sensitive to errors in brightness (the Y image) than to errors in chrominance.  The upshot is that we don't really need to store the entire Cb and Cr images.  Instead, they are downsampled by a factor of 2.
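A minimal sketch of the downsampling, assuming the factor of 2 applies in both directions and that each 2×2 group of samples is replaced by its average.  Both choices are assumptions; other filters and other subsampling patterns are possible.  The function name and the sample values are made up for illustration.

```python
def downsample_by_2(plane):
    """Halve width and height by averaging each 2x2 group of samples.

    `plane` is a list of rows; even dimensions are assumed.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


# A made-up 4x4 chroma plane.
cb = [
    [100, 102, 50, 50],
    [104, 102, 50, 50],
    [10, 20, 200, 200],
    [30, 40, 200, 200],
]
small = downsample_by_2(cb)  # 2x2 result: one quarter of the samples
```

Under these assumptions each chroma image shrinks to a quarter of its samples, so Y plus the two chroma images take half the storage of the original three full-size planes before the DCT is even applied.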

For the remainder of the page, we will only consider the Y image.  A similar process would be applied to the Cb and Cr images.

YCbCr

The Y image is divided into 8×8 blocks and the DCT is applied to each block.  Unlike the Fourier transform, the DCT of real-valued data is itself real-valued, so there is no need to store complex numbers.  Note that in the DCT image each block tends to be brighter in the upper left corner.  This is because the coefficients there, which correspond to the lowest spatial frequencies, typically have larger values, which means they contain more of the information in the block.

The basic idea of DCT compression is that we can throw away the remaining values since there is not much information there anyway.  For our simple demonstration, this is exactly what we will do.  We will keep the DCT coefficients that are within a certain distance from the upper left corner of a block, and set the remaining values to zero.  If we were actually writing a compression program, we would not store the zeroed values at all, since we know they are all zero.  This would result in considerable compression.
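Here is a small sketch of that procedure for a single 8×8 block.  It uses only the Python standard library so it runs anywhere; a real program would more likely use a numerical library.  Two things are assumptions: the DCT is the orthonormal DCT-II, and "distance from the upper left corner" is taken to be Euclidean distance in coefficient index space.  All function names are illustrative.

```python
import math

N = 8  # block size used in the text


def dct_matrix(n=N):
    """Return the orthonormal DCT-II matrix C (n x n)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """2-D DCT of one N x N block, computed as C * block * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT; C is orthogonal, so its inverse is its transpose."""
    return matmul(matmul(CT, coeffs), C)


def keep_near_corner(coeffs, radius):
    """Zero every coefficient farther than `radius` from index (0, 0).

    Euclidean distance is an assumption; the text only says
    "within a certain distance".
    """
    return [[coeffs[u][v] if math.hypot(u, v) <= radius else 0.0
             for v in range(N)]
            for u in range(N)]


def kept_fraction(radius):
    """Fraction of the N*N coefficients that survive the mask."""
    kept = sum(1 for u in range(N) for v in range(N)
               if math.hypot(u, v) <= radius)
    return kept / (N * N)


# A made-up smooth block: brightness ramps gently in both directions.
block = [[100 + 10 * i + 5 * j for j in range(N)] for i in range(N)]
coeffs = dct2(block)
approx = idct2(keep_near_corner(coeffs, radius=2.0))
worst = max(abs(approx[i][j] - block[i][j])
            for i in range(N) for j in range(N))
```

With this particular mask, a radius of 0 keeps only the upper left coefficient (1/64), and a radius of 4.5 keeps 22 of the 64 coefficients, or about 34%.  Whether the images on this page were made with exactly this mask is not stated, so treat the radius values as illustrative.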

This image was reconstructed with 1 coefficient per block (that is, 1/64 of the coefficients):

This image was reconstructed with 3% of the coefficients:

This image was reconstructed with 34% of the coefficients:

It is difficult to see any difference between the 34% image and the original, because we have thrown away information that is less important to the eye.  Artifacts are visible in the 3% image, but at first glance it is a very reasonable representation of the original image.
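The one-coefficient reconstruction has a simple interpretation: if only the upper left coefficient of each block is kept, every pixel in the block is replaced by the block's average brightness.  The sketch below checks this, assuming the orthonormal scaling of the DCT (with a different scaling the constants change, but the conclusion does not).  The function names and the test block are made up for illustration.

```python
N = 8  # block size used in the text


def dc_coefficient(block):
    """Upper left coefficient of the orthonormal 2-D DCT-II of an N x N block.

    At frequency 0 both cosine factors are the constant sqrt(1/N), so the
    coefficient is simply the sum of all samples divided by N.
    """
    return sum(sum(row) for row in block) / N


def reconstruct_from_dc(dc):
    """Inverse DCT when every coefficient except the upper left one is zero."""
    value = dc / N  # the same sqrt(1/N) factors apply again on the way back
    return [[value] * N for _ in range(N)]


# An arbitrary, deliberately non-smooth block.
block = [[(3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]
mean = sum(sum(row) for row in block) / (N * N)
flat = reconstruct_from_dc(dc_coefficient(block))
# Every entry of `flat` equals `mean`.
```

This is why the one-coefficient image looks like a mosaic of flat 8×8 tiles.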

This demonstrates the basic idea behind JPEG.  JPEG improves on this mostly in how it handles the DCT coefficients.  In our example, we either kept a coefficient or threw it away.  JPEG is more sophisticated.  For example, JPEG allocates a varying number of bits to different coefficients.  JPEG also employs other strategies to squeeze out the last bit of performance.  The basic algorithm, however, uses the ideas presented here.
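To give a feel for what "a varying number of bits" can mean, here is a toy sketch in which each coefficient is divided by a step size and rounded, with larger steps for coefficients farther from the upper left corner.  The step sizes are invented for this illustration and are not taken from any JPEG table; the function names are hypothetical as well.

```python
N = 8  # block size used in the text


def step_size(u, v):
    """Hypothetical step size that grows with frequency (not a JPEG table)."""
    return 8 + 6 * (u + v)


def quantize(coeffs):
    """Map each coefficient to a small integer level."""
    return [[round(coeffs[u][v] / step_size(u, v)) for v in range(N)]
            for u in range(N)]


def dequantize(levels):
    """Recover approximate coefficients from the integer levels."""
    return [[levels[u][v] * step_size(u, v) for v in range(N)]
            for u in range(N)]


# The same coefficient value survives near the corner but rounds to zero
# far from it, because the step there is much larger.
coeffs = [[10.0] * N for _ in range(N)]
levels = quantize(coeffs)
restored = dequantize(levels)
```

Compared with the keep-or-discard rule used on this page, a coarse step keeps a rough version of a coefficient instead of dropping it outright, and small integer levels take fewer bits to store.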