Exploring the Discrete Cosine Transform

The DCT is used extensively in lossy audio and image compression. It can be found in MP3 audio compression, JPEG image compression, and MPEG-1/2 video compression. Let's take a closer look the fundamentals of how a DCT is practically used.

TL;DR

A DCT expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

For JPEG and MPEG video, here is the formal definition of an 8×8 DCT:

A 8×8 DCT-II
Gu,v=14α(u)α(v)x=07y=07gx,ycos[(2x+1)uπ16]cos[(2y+1)vπ16]G_{u,v}= \frac{1}{4} \alpha(u) \alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} g_{x,y} \cos \left [ \frac{(2x+1)u\pi}{16} \right ] \cos \left [ \frac{(2y+1)v\pi}{16} \right ]

Where:

α(u)={12, if u=01, otherwise \alpha(u) = \begin{cases} \frac{1}{\sqrt{2}}, & \text{ if } u=0\\ 1, & \text{ otherwise } \end{cases}

Implemented in JavaScript as:

A naïve DCT implementation in JavaScript
const dct = function(input) {
  let output = [], v, u, x, y, sum, val, au, av;
  for (v=0; v<8; v++) {
    for(u=0; u<8; u++) {
      sum = 0;
      for (y=0; y<8; y++) {
        for(x=0; x <8; x++) {
          val = input[y*8+x];
          val *= Math.cos(((2*x+1) * u * Math.PI)/16);
          val *= Math.cos(((2*y+1) * v * Math.PI)/16);
          sum += val;
        }
      }
      au = u === 0 ? 1/Math.SQRT2 : 1;
      av = v === 0 ? 1/Math.SQRT2 : 1;
      output[v*8+u] = 1/4 * au * av * sum;
    }
  }
  return output;
}

And the corresponding inverse DCT:

The DCT-III, also known as the inverse DCT
fx,y=14u=07u=07α(u)α(v)Fu,vcos[(2x+1)uπ16]cos[(2y+1)vπ16]f_{x,y} = \frac{1}{4} \sum_{u=0}^{7} \sum_{u=0}^{7} \alpha(u) \alpha(v) F_{u,v} \cos \left [ \frac{(2x+1)u\pi}{16} \right ] \cos \left [ \frac{(2y+1)v\pi}{16} \right ]

Where:

α(u)={12, if u=01, otherwise \alpha(u) = \begin{cases} \frac{1}{\sqrt{2}}, & \text{ if } u=0\\ 1, & \text{ otherwise } \end{cases}

Also implemented in JavaScript as:

A naïve iDCT implementation in JavaScript
const idct = function(input) {
  let output = [], v, u, x, y, sum, val, au, av;
  for (y=0; y<8; y++) {
    for(x=0; x<8; x++) {
      sum = 0;
      for (v=0; v<8; v++) {
        for(u=0; u<8; u++) {
          au = u === 0 ? 1/Math.SQRT2 : 1;
          av = v === 0 ? 1/Math.SQRT2 : 1;
          val = block[v*8+u];
          val *= au;
          val *= av;
          val *= Math.cos(((2*x+1) * u * Math.PI)/16);
          val *= Math.cos(((2*y+1) * v * Math.PI)/16);
          sum += val;
        }
      }
      output[y*8+x] = 1/4 * sum;
    }
  }
  return output;
}

These JavaScript implementations are naïve implementations, i.e. that are quite computationally expensive and unoptimized. They are, however, relatively easy to reason about.

For reference, there are also several high performance variations and approximations, that reduce the number of mathematical operations required, at the cost of readability. For example:

So, what does it do?

I'm going to start by looking at a simpler one-dimensional example.

We will begin with a signal, a linear ramp, like this:

The input signal
102030405060

The input signal has been sampled 8 times, each sample is independent from it's neighboring samples; i.e. it represents the value of the signal at a specific point in space or time. A DCT transforms these discrete data points into a sum of cosine functions, each oscillating at a different frequencies and at different magnitudes.

Specifically, the DCT will transform this of 8 samples into 8 coefficients. Each coefficient will multiply a specific cosine frequency -- altering that magnitude of that function and its corresponding impact on the reconstructed signal.


Let's look at the formal definition of the forward and inverse one-dimensional DCT that we will be using.

A 1-D DCT and iDCT
Gk=α(k)2Nn=0N1gncos[πN(n+12)k]G_{k}= \alpha(k) \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} g_{n} \cos \left [ \frac{\pi}{N} \left (n + \frac{1}{2} \right ) k \right ]gn=2Nk=0N1α(k)Gkcos[πN(n+12)k]g_{n} = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \alpha(k) G_{k} \cos \left [ \frac{\pi}{N} \left (n + \frac{1}{2} \right ) k \right ]

Where:

α(x)={12, if x=01, otherwise g is the input G is the DCT output N is the number of samples being transformed\alpha(x) = \begin{cases} \frac{1}{\sqrt {2} }, & \text{ if } x=0\\ 1, & \text{ otherwise } \end{cases} \\ g \text{ is the input } \\ G \text{ is the DCT output } \\ N \text{ is the number of samples being transformed}

First, we can focus just on the forward DCT transformation. We can translate the forward equation into the following JavaScript:

A naïve implementation of a forward 1D DCT.
const dct1d = function(signal) {
  let N = signal.length,
      output = [],
      sum, k, n, s;

  for(k=0; k<N; k++) {
    sum = 0;
    for(n=0; n<N; n++) {
      sum += signal[n] * Math.cos(Math.PI * (n + 0.5) * k / N);
    }
    s = k===0 ? 1/Math.sqrt(2) : 1;
    output[k] = s * Math.sqrt(2/N) * sum;
  }
  return output;
};

This function will take an array of samples and return an array of equal length DCT coefficients. Now, we can use this function to transform our input signal. Something like:

var coefficients = dct1d([8,16,24,32,40,48,56,64]);

The resulting array will be 8 elements long and will look like this:

The computed DCT coefficients
-50050100

Reconstructing the signal

Now that we have our set of coefficients, how do we transform it back into the original signal? For that, we use the inverse DCT.

Again, we can express this in JavaScript as:

A naïve implementation of a inverse 1D DCT.
  const idct1d = function(dct) {
    let N = dct.length,
        signal = [],
        sum, k, n, s;

    for(n=0; n<N; n++) {
      sum = 0;
      for(k=0; k<N; k++) {
        s = k===0 ? Math.sqrt(0.5) : 1;
        sum += s * dct[k] * Math.cos(Math.PI * (n+0.5) * k / N);
      }
      signal[n] = Math.sqrt(2/N) * sum;
    }
    return signal;
  };

Let use it to reconstruct our signal:

const reconstructedSignal = idct1d(coefficients);

Again, this function returns the same number of samples as our coefficients. And aside from some small floating-point rounding errors, the reconstructed signal is identical to the original signal.

The reconstructed signal
102030405060

Okay, but why?

Up to now, you may have noticed that each transformation has been of equivalent length; e.g., n samples become n coefficients and vice versa. So, how is this actually useful for compression?

Look again at the coefficients of our compressed signal:

The computed DCT coefficients
-50050100

Notice, that the first two coefficients have a relatively large magnitude, while the rest are fairly close to zero. This is because our source signal was a simple ramp: it's value increased by 8 units at each sample.

As such, most of the energy of the signal can be expressed in the lower frequencies, while the higher frequencies have less of an overall impact on the desired signal. The DCT exploits this property; this is referred to as energy compaction.

If our initial signal was comprised of white noise, i.e. static, there would be less -- if any -- energy compaction. However many real-world samples, whether aural or visual, the signals tend to be somewhat ordered, and better suited for this type of energy compaction.


We use can quantization to squash our coefficients, which are currently real numbers, into a smaller range of integers. As a simplistic implementation, we can divide each coefficient by 50—the choice of 50 is a completely arbitrary selection on my part, it can be any number really for our purposes—and truncate the result.

const quantizer = (v) => v/50|0;
const quantized = coefficients.map(quantizer);
The quantized coefficients.
-1.0-0.50.00.51.01.52.0

After quantization, we now only have two coefficients that have values, and a long run of values of zero. This set can be run-length encoded much smaller than the original set of samples.

This is fundamentally how the DCT is used for audio and visual compression.

Lossy Compression

If you have a keen eye then you may have noticed something interesting during the quantization step in the last section. We truncated our real values into integers, i.e. we threw away some data.

While that made the data more compressible, what effect does that have on our reconstructed signal? Let's find out!

First, we need to de-quantize our coefficients:

const dequantizer = v => v*50;
const dequantized = quantized.map(dequantizer);
The dequantized coefficients.
-50050100

Notice they are not the same as the coefficients we calculated before, due to the truncation. Now, let's run the inverse DCT and see what signal we get back:

const decompressedSignal = idct1d(dequantized);
The reconstructed decompressed signal.
102030405060

At first glance, the reconstructed signal appears similar. However, on closer inspection, you can see they are actually different. That is because we threw away some of the smaller, high-frequency coefficients that were subtly adjusting the reconstructed signal. Without those frequencies, the new signal drifts away from the original.

However, compare them together on the same chart:

Both the original signal and reconstructed decompressed signal.
102030405060

Not quite the same, but close—and that's the idea. We can use the DCT, to transform our set into a more compressible set data that can be reconstructed into a similar signal, but not identical. We have, however, lost some of the original data in the process; that is what is meant by "lossy" compression.

By adjusting the quantization value (or using a quantization matrix), one can control the balance between compressibility and signal fidelity of the transformation.

Next up...

Let's explore deeper into the DCT:

  • Visualization: Behold the waveforms of the DCT and how the signal is regenerated.
  • Take it to the Next Dimension: Look at some 2D versions of the DCT.
  • Indiana Jones: Explore some visual implications/artifacts of compression.

More Reading