Exploring the Discrete Cosine Transform

in

The DCT is used extensively in lossy audio and image compression. It can be found in MP3 audio compression, JPEG image compression, and MPEG-1/2 video compression. Let’s take a closer look the fundamentals of how a DCT is practically used.

Tl;Dr

A DCT expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

For JPEG and MPEG video, here is the formal definition of an 8×8 DCT:

Gu,v=14α(u)α(v)x=07y=07gx,ycos[(2x+1)uπ16]cos[(2y+1)vπ16]G_{u,v}= \frac{1}{4} \alpha(u) \alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} g_{x,y} \cos \left [ \frac{(2x+1)u\pi}{16} \right ] \cos \left [ \frac{(2y+1)v\pi}{16} \right ]

A 8×8 DCT-II

Where:

Implemented in JavaScript as:

const dct = function(input) {
  let output = [], v, u, x, y, sum, val, au, av;
  for (v=0; v<8; v++) {
    for(u=0; u<8; u++) {
      sum = 0;
      for (y=0; y<8; y++) {
        for(x=0; x <8; x++) {
          val = input[y*8+x];
          val *= Math.cos(((2*x+1) * u * Math.PI)/16);
          val *= Math.cos(((2*y+1) * v * Math.PI)/16);
          sum += val;
        }
      }
      au = u === 0 ? 1/Math.SQRT2 : 1;
      av = v === 0 ? 1/Math.SQRT2 : 1;
      output[v*8+u] = 1/4 * au * av * sum;
    }
  }
  return output;
}
A naïve DCT implementation in JavaScript

And the corresponding inverse DCT:

fx,y=14u=07u=07α(u)α(v)Fu,vcos[(2x+1)uπ16]cos[(2y+1)vπ16]f_{x,y} = \frac{1}{4} \sum_{u=0}^{7} \sum_{u=0}^{7} \alpha(u) \alpha(v) F_{u,v} \cos \left [ \frac{(2x+1)u\pi}{16} \right ] \cos \left [ \frac{(2y+1)v\pi}{16} \right ]

α(u)={12, if u=01, otherwise \alpha(u) = \begin{cases} \frac{1}{\sqrt{2}}, & \text{ if } u=0\\ 1, & \text{ otherwise } \end{cases}

The DCT-III, also known as the inverse DCT

Also implemented in JavaScript as:

const idct = function(input) {
  let output = [], v, u, x, y, sum, val, au, av;
  for (y=0; y<8; y++) {
    for(x=0; x<8; x++) {
      sum = 0;
      for (v=0; v<8; v++) {
        for(u=0; u<8; u++) {
          au = u === 0 ? 1/Math.SQRT2 : 1;
          av = v === 0 ? 1/Math.SQRT2 : 1;
          val = block[v*8+u];
          val *= au;
          val *= av;
          val *= Math.cos(((2*x+1) * u * Math.PI)/16);
          val *= Math.cos(((2*y+1) * v * Math.PI)/16);
          sum += val;
        }
      }
      output[y*8+x] = 1/4 * sum;
    }
  }
  return output;
}
A naïve iDCT implementation in JavaScript

These JavaScript implementations are naïve implementations, i.e. that are quite computationally expensive and unoptimized. They are, however, relatively easy to reason about.

For reference, there are also several high performance variations and approximations, that reduce the number of mathematical operations required, at the cost of readability. For example:

So, What Does It Do?

I’m going to start by looking at a simpler one-dimensional example.

We will begin with a signal, a linear ramp, like this:

102030405060
The input signal

The input signal has been sampled 8 times, each sample is independent from it’s neighboring samples; i.e. it represents the value of the signal at a specific point in space or time. A DCT transforms these discrete data points into a sum of cosine functions, each oscillating at a different frequencies and at different magnitudes.

Specifically, the DCT will transform this of 8 samples into 8 coefficients. Each coefficient will multiply a specific cosine frequency — altering that magnitude of that function and its corresponding impact on the reconstructed signal.


Let’s look at the formal definition of the forward and inverse one-dimensional DCT that we will be using.

Gk=α(k)2Nn=0N1gncos[πN(n+12)k]G_{k}= \alpha(k) \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} g_{n} \cos \left [ \frac{\pi}{N} \left (n + \frac{1}{2} \right ) k \right ]

A 1-dimensional DCT

gn=2Nk=0N1α(k)Gkcos[πN(n+12)k]g_{n} = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \alpha(k) G_{k} \cos \left [ \frac{\pi}{N} \left (n + \frac{1}{2} \right ) k \right ]

A 1-dimensional iDCT

Where:

First, we can focus just on the forward DCT transformation. We can translate the forward equation into the following JavaScript:

const dct1d = function(signal) {
  let N = signal.length,
      output = [],
      sum, k, n, s;

  for(k=0; k<N; k++) {
    sum = 0;
    for(n=0; n<N; n++) {
      sum += signal[n] * Math.cos(Math.PI * (n + 0.5) * k / N);
    }
    s = k===0 ? 1/Math.sqrt(2) : 1;
    output[k] = s * Math.sqrt(2/N) * sum;
  }
  return output;
};
A naïve implementation of a forward 1D DCT.

This function will take an array of samples and return an array of equal length DCT coefficients. Now, we can use this function to transform our input signal. Something like:

var coefficients = dct1d([8,16,24,32,40,48,56,64]);

The resulting array will be 8 elements long and will look like this:

-50050100
The computed DCT coefficients

Reconstructing the Signal

Now that we have our set of coefficients, how do we transform it back into the original signal? For that, we use the inverse DCT.

Again, we can express this in JavaScript as:

const idct1d = function(dct) {
  let N = dct.length,
      signal = [],
      sum, k, n, s;

  for(n=0; n<N; n++) {
    sum = 0;
    for(k=0; k<N; k++) {
      s = k===0 ? Math.sqrt(0.5) : 1;
      sum += s * dct[k] * Math.cos(Math.PI * (n+0.5) * k / N);
    }
    signal[n] = Math.sqrt(2/N) * sum;
  }
  return signal;
};
A naïve implementation of a inverse 1D DCT.

Let use it to reconstruct our signal:

const reconstructedSignal = idct1d(coefficients);

Again, this function returns the same number of samples as our coefficients. And aside from some small floating-point rounding errors, the reconstructed signal is identical to the original signal.

102030405060
The reconstructed signal

Okay, but Why?

Up to now, you may have noticed that each transformation has been of equivalent length; e.g., n samples become n coefficients and vice versa. So, how is this actually useful for compression?

Look again at the coefficients of our compressed signal:

-50050100
The computed DCT coefficients

Notice, that the first two coefficients have a relatively large magnitude, while the rest are fairly close to zero. This is because our source signal was a simple ramp: it’s value increased by 8 units at each sample.

As such, most of the energy of the signal can be expressed in the lower frequencies, while the higher frequencies have less of an overall impact on the desired signal. The DCT exploits this property; this is referred to as energy compaction.

If our initial signal was comprised of white noise, i.e. static, there would be less — if any — energy compaction. However many real-world samples, whether aural or visual, the signals tend to be somewhat ordered, and better suited for this type of energy compaction.


We use can quantization to squash our coefficients, which are currently real numbers, into a smaller range of integers. As a simplistic implementation, we can divide each coefficient by 50—the choice of 50 is a completely arbitrary selection on my part, it can be any number really for our purposes—and truncate the result.

const quantizer = (v) => v/50|0;
const quantized = coefficients.map(quantizer);
-1.0-0.50.00.51.01.52.0
The quantized coefficients.

After quantization, we now only have two coefficients that have values, and a long run of values of zero. This set can be run-length encoded much smaller than the original set of samples.

This is fundamentally how the DCT is used for audio and visual compression.

Lossy Compression

If you have a keen eye then you may have noticed something interesting during the quantization step in the last section. We truncated our real values into integers, i.e. we threw away some data.

While that made the data more compressible, what effect does that have on our reconstructed signal? Let’s find out!

First, we need to de-quantize our coefficients:

const dequantizer = v => v*50;
const dequantized = quantized.map(dequantizer);
-50050100
The dequantized coefficients.

Notice they are not the same as the coefficients we calculated before, due to the truncation. Now, let’s run the inverse DCT and see what signal we get back:

const decompressedSignal = idct1d(dequantized);
102030405060
The reconstructed decompressed signal.

At first glance, the reconstructed signal appears similar. However, on closer inspection, you can see they are actually different. That is because we threw away some of the smaller, high-frequency coefficients that were subtly adjusting the reconstructed signal. Without those frequencies, the new signal drifts away from the original.

However, compare them together on the same chart:

102030405060
Both the original signal and reconstructed decompressed signal.

Not quite the same, but close—and that’s the idea. We can use the DCT, to transform our set into a more compressible set data that can be reconstructed into a similar signal, but not identical. We have, however, lost some of the original data in the process; that is what is meant by “lossy” compression.

By adjusting the quantization value (or using a quantization matrix), one can control the balance between compressibility and signal fidelity of the transformation.

Next Up…

Let’s explore deeper into the DCT:

More Reading