# Exploring the Discrete Cosine Transform

The DCT is used extensively in lossy audio and image compression. It can be found in MP3 audio compression, JPEG image compression, and MPEG-1/2 video compression. Let’s take a closer look the fundamentals of how a DCT is practically used.

## Tl;Dr

A DCT expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

For JPEG and MPEG video, here is the formal definition of an 8×8 DCT:

Where:

- $\alpha(u) = \begin{cases} \frac{1}{\sqrt{2}}, & \text{ if } u=0\\ 1, & \text{ otherwise } \end{cases}$

Implemented in JavaScript as:

And the corresponding *inverse* DCT:

Also implemented in JavaScript as:

These JavaScript implementations are naïve implementations, i.e. that are quite computationally expensive and unoptimized. They *are*, however, relatively easy to reason about.

For reference, there are also several high performance variations and approximations, that reduce the number of mathematical operations required, at the cost of readability. For example:

- The LLM DCT: Practical fast 1-D DCT algorithms with 11 multiplications
- The AAN DCT: A Fast DCT-SQ Scheme for Images

## So, What Does It Do?

I’m going to start by looking at a simpler one-dimensional example.

We will begin with a signal, a linear ramp, like this:

The input signal has been sampled 8 times, each sample is *independent* from it’s neighboring samples; i.e. it represents the value of the signal at a specific point in space or time. A DCT transforms these discrete data points into a sum of cosine functions, each oscillating at a *different frequencies* and at *different magnitudes*.

Specifically, the DCT will transform this of *8 samples* into *8 coefficients*. Each coefficient will multiply a specific cosine frequency — altering that magnitude of that function and its corresponding impact on the reconstructed signal.

Let’s look at the formal definition of the forward and inverse one-dimensional DCT that we will be using.

Where:

- $\alpha(x) = \begin{cases} \frac{1}{\sqrt {2} }, & \text{ if } x=0\\ 1, & \text{ otherwise } \end{cases}$
- $g$ is the input
- $G$ is the DCT output
- $N$ is the number of samples being transformed

First, we can focus just on the forward DCT transformation. We can translate the forward equation into the following JavaScript:

This function will take an array of samples and return an array of *equal length* DCT coefficients. Now, we can use this function to transform our input signal. Something like:

The resulting array will be 8 elements long and will look like this:

## Reconstructing the Signal

Now that we have our set of coefficients, how do we transform it back into the original signal? For that, we use the *inverse* DCT.

Again, we can express this in JavaScript as:

Let use it to reconstruct our signal:

Again, this function returns the same number of samples as our coefficients. And aside from some small floating-point rounding errors, the reconstructed signal is identical to the original signal.

## Okay, but *Why?*

Up to now, you may have noticed that each transformation has been of equivalent length; e.g., *n* samples become *n* coefficients and vice versa. So, how is this actually useful for compression?

Look again at the coefficients of our compressed signal:

Notice, that the *first two* coefficients have a relatively large magnitude, while the rest are fairly close to zero. This is because our source signal was a simple ramp: it’s value increased by `8`

units at each sample.

As such, most of the *energy* of the signal can be expressed in the lower frequencies, while the higher frequencies have *less* of an overall impact on the desired signal. The DCT exploits this property; this is referred to as *energy compaction*.

If our initial signal was comprised of white noise, i.e. static, there would be less — if any — energy compaction. However many real-world samples, whether aural or visual, the signals tend to be somewhat ordered, and better suited for this type of energy compaction.

We use can quantization to squash our coefficients, which are currently real numbers, into a smaller range of integers. As a simplistic implementation, we can divide each coefficient by `50`

—the choice of `50`

is a completely arbitrary selection on my part, it can be any number really for our purposes—and truncate the result.

After quantization, we now only have two coefficients that have values, and a long run of values of zero. This set can be run-length encoded much smaller than the original set of samples.

This is *fundamentally* how the DCT is used for audio and visual compression.

## Lossy Compression

If you have a keen eye then you may have noticed something interesting during the quantization step in the last section. We *truncated* our real values into integers, i.e. we threw away some data.

While that made the data more compressible, what effect does that have on our reconstructed signal? Let’s find out!

First, we need to de-quantize our coefficients:

Notice they are *not* the same as the coefficients we calculated before, due to the truncation. Now, let’s run the inverse DCT and see what signal we get back:

At first glance, the reconstructed signal appears *similar*. However, on closer inspection, you can see they are actually different. That is because we threw away some of the smaller, high-frequency coefficients that were subtly adjusting the reconstructed signal. Without those frequencies, the new signal drifts away from the original.

However, compare them together on the same chart:

Not quite the same, but close—and that’s the idea. We can use the DCT, to transform our set into a more *compressible* set data that can be reconstructed into a *similar* signal, but not identical. We have, however, *lost* some of the original data in the process; that is what is meant by “lossy” compression.

By adjusting the quantization value (or using a quantization matrix), one can control the balance between compressibility and signal fidelity of the transformation.

## Next Up…

Let’s explore deeper into the DCT:

- Visualization: Behold the waveforms of the DCT and how the signal is regenerated.
- Take it to the Next Dimension: Look at some 2D versions of the DCT.
- Indiana Jones: Explore some visual implications/artifacts of compression.