How Modern Video Compression Algorithms Actually Work


Modern video compression algorithms aren’t the same as the image compression algorithms you might be familiar with. The additional dimension of time means that different mathematical and logical techniques are applied to a video file to reduce its size while maintaining video quality.

In this post we’re using H.264 as the archetypal compression standard. While it’s no longer the newest video compression format, it still provides a sufficiently detailed example for explaining big-picture concepts about video compression.

What Is Video Compression?

Video compression algorithms look for spatial and temporal redundancies. By encoding redundant data as few times as possible, they reduce file size. Imagine, for example, a one-minute shot of a character’s face slowly changing expression. It doesn’t make sense to encode the background image for every frame: instead, you can encode it once, then refer back to it until that part of the picture changes. This interframe prediction is also responsible for digital video compression’s unnerving artifacts: parts of an old image moving with the wrong motion because something in the encoded stream has gone haywire.
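The "encode it once, then refer back to it" idea can be sketched in a few lines of Python. This is a toy illustration, not how H.264 stores data: a frame is assumed to be a flat list of pixel values, and the helper names `encode_delta` and `decode_delta` are made up for this example.

```python
def encode_delta(reference, frame):
    """Keep only the pixels of `frame` that differ from `reference`."""
    return {
        index: value
        for index, (old, value) in enumerate(zip(reference, frame))
        if old != value
    }


def decode_delta(reference, delta):
    """Rebuild a frame by copying the reference and applying the changes."""
    frame = list(reference)
    for index, value in delta.items():
        frame[index] = value
    return frame


# A 16-pixel "background" that stays the same, with one pixel changing.
frame0 = [10] * 16
frame1 = list(frame0)
frame1[5] = 200

delta = encode_delta(frame0, frame1)
print(len(frame1), "pixels in the frame,", len(delta), "stored in the delta")
```

The second frame is recovered exactly with `decode_delta(frame0, delta)`, even though only one value was stored for it.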

I-frames, P-frames, and B-frames

[Figure: I-frames, P-frames, and B-frames]

I-frames are fully encoded images. Every I-frame contains all the data it needs to represent an image. P-frames are predicted from an earlier reference frame, in the simplest case the last I-frame. B-frames are bi-directionally predicted, using data from both an earlier and a later reference frame: here, the last P-frame and the next I-frame. A P-frame needs to store only the visual information that is unique to it. In the above example, the P-frame needs to track how the dots move across the frame, but Pac-Man can stay where he is.

The B-frame looks at the P-frame and the next I-frame and “averages” the motion across those frames. The algorithm has an idea of where the image “starts” (the first I-frame) and where the image “ends” (the second I-frame), and it uses partial data to encode a good guess, leaving out all the redundant static pixels that aren’t necessary to create the image.
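A rough way to picture the B-frame's "good guess" is to predict each pixel as the average of the frame before and the frame after, then store only the prediction error. The sketch below assumes flat lists of pixel values and uses hypothetical function names; real encoders predict per block with motion compensation rather than per pixel.

```python
def predict_bidirectional(previous_ref, next_ref):
    """Guess the in-between frame as the average of its two references."""
    return [(a + b) // 2 for a, b in zip(previous_ref, next_ref)]


def encode_b_frame(previous_ref, next_ref, frame):
    """Store only the difference between the actual frame and the guess."""
    prediction = predict_bidirectional(previous_ref, next_ref)
    return [actual - guess for actual, guess in zip(frame, prediction)]


def decode_b_frame(previous_ref, next_ref, residual):
    """Rebuild the frame from the two references plus the stored error."""
    prediction = predict_bidirectional(previous_ref, next_ref)
    return [guess + error for guess, error in zip(prediction, residual)]


previous_ref = [0, 100, 200, 50]  # e.g. the last P-frame
next_ref = [0, 120, 220, 50]      # e.g. the next I-frame
actual = [0, 110, 212, 50]        # the frame in between

residual = encode_b_frame(previous_ref, next_ref, actual)
print(residual)  # [0, 0, 2, 0]
```

Because the guess is already close, the residual is mostly zeros, which is exactly the kind of data that compresses well in later stages.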

Intraframe Encoding (I-frames)


I-frames are compressed independently, in much the same way still images are saved. Because I-frames use no predictive data, the compressed image contains all the data needed to display the frame. They are still compressed, using techniques similar to those of an image compression algorithm like JPEG. This encoding often takes place in the YCbCr color space, which separates brightness (luma) data from color (chroma) data, allowing the two to be encoded separately, with the color data typically stored at lower resolution.
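To make the color-space step concrete, here is a minimal sketch of an RGB to YCbCr conversion plus a simple chroma downsample. It assumes the full-range BT.601 coefficients used by JPEG/JFIF; video codecs often use other ranges or matrices. The function names are chosen for this example only.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each channel to the 0..255 range.
    return tuple(min(255, max(0, round(value))) for value in (y, cb, cr))


def subsample_2x2(plane):
    """Halve a chroma plane in both directions by averaging 2x2 squares.

    Assumes `plane` is a list of rows with even width and height.
    """
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128): white has no color offset
print(subsample_2x2([[100, 102], [104, 106]]))  # [[103]]
```

Grays land on Cb = Cr = 128, so all of their information sits in the Y channel. Downsampling only the two chroma planes is what lets an encoder store color at a quarter of the luma resolution.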

For non-predictive codecs like DV and Motion JPEG, that’s where we stop. Because there are no predictive frames, the only compression that can be achieved is by compressing the image within a single frame. It’s less efficient, but each frame stands alone, which avoids prediction artifacts and makes frame-accurate editing straightforward.

In codecs that use predictive frames, like H.264, I-frames are periodically inserted to “refresh” the data stream by setting a new reference frame. The farther apart the I-frames, the smaller the video file can be. However, if I-frames are too far apart, the accuracy of the video’s predictive frames will slowly degrade into unintelligibility. A bandwidth-optimized application would insert I-frames as infrequently as possible without breaking the video stream. For consumers, the frequency of I-frames is often determined indirectly by the “quality” setting in the encoding software. Professional-grade video compression software like ffmpeg allows explicit control.
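As an illustration of that explicit control, the sketch below builds an ffmpeg command line in Python. It relies on ffmpeg's `-g` option (maximum number of frames between keyframes) and `-bf` option (maximum number of consecutive B-frames), and assumes an ffmpeg build that includes the `libx264` encoder. The function name, default values, and file names are hypothetical.

```python
def build_x264_command(source, destination, gop_size=250, max_b_frames=2):
    """Return an ffmpeg argument list with an explicit keyframe interval."""
    return [
        "ffmpeg",
        "-i", source,              # input file
        "-c:v", "libx264",         # encode video as H.264 with libx264
        "-g", str(gop_size),       # at most this many frames between keyframes
        "-bf", str(max_b_frames),  # at most this many B-frames in a row
        destination,
    ]


command = build_x264_command("input.mp4", "output.mp4", gop_size=120)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` to run the encode. The encoder may still place extra keyframes where it detects a scene change, so `-g` acts as an upper bound on the interval rather than a fixed spacing.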


Interframe Prediction (P-frames and B-frames)

Video encoders attempt to “predict” change from one frame to the next. The closer their predictions, the more effective the compression. This is what creates the P-frames and B-frames. The exact number, frequency, and order of predictive frames, as well as the specific method used to encode and reproduce them, are determined by the codec and encoder settings you use.

[Figure: block partitioning]

Let’s consider how H.264 works, as a generalized example. The frame is divided into sections called macroblocks, typically consisting of 16 x 16 samples. The algorithm does not encode the raw pixel values for each block. Instead, the encoder searches for a similar block in a previously encoded frame, called the reference frame. If a good match is found, the block is encoded as a motion vector, which describes how far the block has moved between the reference frame and the current frame, along with any remaining difference between the two blocks. When the video is played back, the video player applies those motion vectors to the reference frame to reconstruct the picture. If the block doesn’t change at all, no vector is needed.
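The block search can be sketched as an exhaustive comparison using the sum of absolute differences (SAD). This is a simplified illustration, not H.264's actual search: it uses 4 x 4 blocks instead of 16 x 16 macroblocks, whole-pixel steps only, and hypothetical function names.

```python
def sad(reference, current, ref_y, ref_x, cur_y, cur_x, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(current[cur_y + dy][cur_x + dx] - reference[ref_y + dy][ref_x + dx])
        for dy in range(size)
        for dx in range(size)
    )


def find_motion_vector(reference, current, block_y, block_x, size=4, search_range=2):
    """Find the offset (dy, dx) into `reference` that best matches the block."""
    height, width = len(reference), len(reference[0])
    best_vector = (0, 0)
    best_cost = sad(reference, current, block_y, block_x, block_y, block_x, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ref_y, ref_x = block_y + dy, block_x + dx
            # Skip candidates that fall outside the reference frame.
            if ref_y < 0 or ref_x < 0 or ref_y + size > height or ref_x + size > width:
                continue
            cost = sad(reference, current, ref_y, ref_x, block_y, block_x, size)
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost


def make_frame(top, left, size=8):
    """Black frame with a bright 2x2 square whose corner is at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 255
    return frame


reference = make_frame(2, 2)  # square in the older frame
current = make_frame(3, 3)    # square has moved one pixel down and right
vector, cost = find_motion_vector(reference, current, block_y=2, block_x=2)
print(vector, cost)  # (-1, -1) 0
```

The vector points from the current block back to where its contents sat in the reference frame, so a square that moved down and right yields (-1, -1). A cost of 0 means the match is exact and no residual needs to be stored for this block.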

Conclusion: Data Compression

Once the data is sorted into its frames, it’s converted into a mathematical representation by the transform encoder. H.264 employs an integer approximation of the DCT (discrete cosine transform) to express visual data as a sum of cosine functions oscillating at various frequencies. The compression standard determines which transform is used. Then the data is “rounded” by the quantizer, which is the step where information is actually discarded. Finally, the bits are run through a lossless compression algorithm, known as an entropy coder, to shrink the file size one more time. This doesn’t change the data: it just organizes it in the most compact form possible. At that point the video is compressed, smaller than before and ready for watching.
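The transform and quantization steps can be sketched with a textbook floating-point DCT. This is an assumption-laden stand-in: H.264 itself uses an integer transform and its own quantization tables, and the function names and the step size of 10 are chosen only for illustration.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    result = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            value * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, value in enumerate(values)
        )
        result.append(scale * total)
    return result


def dct_2d(block):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in block]
    columns = [dct_1d(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]  # transpose back to rows


def quantize(coefficients, step):
    """Divide by the step size and round: the lossy 'rounding' stage."""
    return [[round(value / step) for value in row] for row in coefficients]


flat_block = [[100] * 4 for _ in range(4)]  # a uniformly gray 4x4 block
quantized_flat = quantize(dct_2d(flat_block), step=10)
print(quantized_flat)  # only the top-left (DC) coefficient is nonzero
```

A flat block collapses to a single nonzero coefficient, and smooth blocks tend to produce only a few. The long runs of zeros left behind are what the final lossless stage packs so compactly.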

Image credit: VC Demo, itu delft

Alexander Fox