Technical Notes On DVD-Video

by Billy Biggs

I got a dxr2+drive for Christmas 1999, but I didn't start seriously hacking with DVDs until about March 2001. I first attempted a recompressor, and when that was going poorly, I began to convert my code into a player. I learned a lot from #livid on freenode.net. If you want to talk about DVDs, I'm likely there.

My graphics page is at www.scanline.ca and my home page is at www.billybiggs.com.

I gave a talk on the Linux DVD development scene, as well as a brief introduction to DVD-Video. The slides for the talk are available online.

DVD subtitles

I finally added subtitle compositing to movietime and learned a few things. For more information on DVD subtitles, visit sam's subtitle page.

A brief overview of DVD subtitles

Subpictures are encoded as 4 colour images, each colour having a 4 bit alpha value. The 4 colours are chosen from a 16 colour palette of Y'CbCr values. This does not leave much room for artistic design. Subpictures are primarily used for menu buttons and subtitles.

A typical subpicture allocates colours similar to this example from Ghostbusters (1984) (the one at the bottom):

ID	Description	Alpha	Y'	Cb	Cr	R'G'B' equivalent
1	Transparent	0	204	127	128	(transparent)
2	Foreground	15	160	44	141
3	Outline	15	35	128	128
4	Not Used	15	76	92	133
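
The R'G'B' equivalents of palette entries like these can be recomputed from the Y'CbCr triples. Here is a minimal sketch, assuming the BT.601 matrix with studio-range levels (Y' from 16 to 235, Cb and Cr centred on 128); the page does not state which matrix was used, and the names `ycbcr_to_rgb` and `to_8bit` are made up for illustration. Under these assumptions (204, 127, 128) lands within about 0.01 of the R'G'B' figures quoted for the Ghostbusters palette colour further down.

```python
# BT.601 luma weights (an assumption; the source does not name the matrix).
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB

def ycbcr_to_rgb(y, cb, cr):
    """Convert studio-range 8-bit Y'CbCr to unclamped R'G'B' on a 0..255 scale."""
    yy = (y - 16) * 255.0 / 219.0      # luma excursion is 219 steps
    pb = (cb - 128) * 255.0 / 224.0    # chroma excursion is 224 steps
    pr = (cr - 128) * 255.0 / 224.0
    r = yy + 2.0 * (1.0 - KR) * pr
    b = yy + 2.0 * (1.0 - KB) * pb
    g = (yy - KR * r - KB * b) / KG    # solve Y' = KR*R' + KG*G' + KB*B' for G'
    return (r, g, b)

def to_8bit(rgb):
    """Round and clamp an R'G'B' triple to displayable 0..255 integers."""
    return tuple(max(0, min(255, round(c))) for c in rgb)

# The four palette entries from the table above.
palette = [(204, 127, 128), (160, 44, 141), (35, 128, 128), (76, 92, 133)]
for entry in palette:
    print(entry, "->", to_8bit(ycbcr_to_rgb(*entry)))
```

Note that some Y'CbCr triples fall outside the R'G'B' cube, which is why the clamp is kept as a separate step.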

Some example subtitles from the Canadian DVD release of Lola Rennt (1998) (Columbia Tristar Home Video).
These subtitles make no use of alpha, and use only three of the colours (transparent, opaque black, and opaque ugly). No attempt was made to antialias. These look a lot better on a TV, and I'm not sure why yet (is my subtitle decoding code busted?).
DVD subtitles can be presented with different pixel aspect ratios. An annoying example is in Run Lola Run, where we have anamorphic video but no anamorphic subtitles. For me to display them correctly, I'd either have to do software scaling of the DVD video, poor downscaling of the subtitles, or have the hardware do subtitle compositing. Awful.
For reference, the palette used here is transparent black (16, 128, 128), opaque grey (unused: 110, 128, 128), opaque ugly (171, 38, 143), and opaque black (16, 128, 128).
Something about these images says to me that we're not doing something right here. They look much better when I play this DVD on a PS2.

Here are some more decoded subtitles, these from the Canadian Alliance Atlantis release of Mononoke Hime (1997).
Again, no alpha use at all, and only 3 colours. Oh, and it's my fault that the subtitles extend one pixel too far to the right.

These subpictures from Ghostbusters (1984) have me confused. The DVD uses the subtitle code to have an MST3k-style overlay for their commentary track. What confuses me is the use of alpha.
The top image is a composited image, assuming the alpha has not been premultiplied and assuming a display gamma of 2.2. The bottom image is the raw colours from the palette, with fully transparent displayed as pure white.
The grey outline in the bottom picture is the actual colour in the palette. Every instance of grey in the image has an alpha value of 1 out of 15. So, when compositing onto white, we have roughly 7% grey on 93% white:
Palette Colour	Composite
R' = 218.907	R' = 253
G' = 219.299	G' = 253
B' = 216.890	B' = 253
This got me worried that maybe we should be treating the colour values as premultiplied. A long discussion with bjorn and hh has convinced me this is not the case. They cite examples where the alpha value changes to fade in an image, which would clearly go against my assumption. However, hh has suspicions that maybe the alpha value given is not linear.
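
The composite figures above can be reproduced with a straight (non-premultiplied) blend done in linear light. This is only a sketch, assuming a pure power-law display gamma of 2.2 and a white background at 255; `composite_gamma` is an illustrative name, not a function from the player.

```python
def composite_gamma(fg, bg, alpha, gamma=2.2):
    """Blend non-premultiplied fg over bg in linear light.

    fg and bg are R'G'B' triples on a 0..255 scale, alpha is 0..1.
    """
    out = []
    for f, b in zip(fg, bg):
        f_lin = (f / 255.0) ** gamma           # undo display gamma
        b_lin = (b / 255.0) ** gamma
        mix = alpha * f_lin + (1.0 - alpha) * b_lin
        out.append(255.0 * mix ** (1.0 / gamma))  # reapply display gamma
    return tuple(out)

grey = (218.907, 219.299, 216.890)   # palette colour quoted above
white = (255.0, 255.0, 255.0)
result = composite_gamma(grey, white, alpha=1 / 15)
print([round(c) for c in result])    # [253, 253, 253]
```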

Alpha Blended

Raw Palette

Here is an example of a very clean subtitle from Ghostbusters (1984) which convinced me that my decoding was likely correct. Contrast this to the noise in the subtitle from Run Lola Run.
However, they should probably learn what kerning is. Look at the 'f' and 'ee' in 'feelings'.

Gamma-correct DVD subtitle compositing

As far as I can tell, doing a gamma-correct composite over Y'CbCr images is difficult. Y'CbCr is a nonlinear colour space: it is a linear transform of gamma-corrected R'G'B', not of linear-light RGB. One way to do a correct composite between two colours in Y'CbCr would be to:

Convert each colour to R'G'B' using the usual transform.
Convert each R'G'B' colour to RGB using the intended display gamma value.
Interpolate the colours in RGB space.
Convert back to R'G'B' using the inverse of the display gamma.
Convert back to Y'CbCr.
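
The five steps above can be sketched for a single pair of colours as follows. This assumes the BT.601 matrix with studio-range levels and a pure power-law gamma of 2.2, neither of which is stated in these notes, and it clamps out-of-gamut R'G'B' values before linearising. All function names are hypothetical.

```python
KR, KB = 0.299, 0.114          # BT.601 luma weights (assumed)
KG = 1.0 - KR - KB

def ycbcr_to_rgb_prime(y, cb, cr):
    """Step 1: studio-range Y'CbCr -> R'G'B' in 0..1, clamped to gamut."""
    yy = (y - 16) / 219.0
    pb = (cb - 128) / 224.0
    pr = (cr - 128) / 224.0
    r = yy + 2 * (1 - KR) * pr
    b = yy + 2 * (1 - KB) * pb
    g = (yy - KR * r - KB * b) / KG
    return tuple(min(1.0, max(0.0, c)) for c in (r, g, b))

def rgb_prime_to_ycbcr(r, g, b):
    """Step 5: R'G'B' in 0..1 -> studio-range Y'CbCr."""
    yy = KR * r + KG * g + KB * b
    pb = (b - yy) / (2 * (1 - KB))
    pr = (r - yy) / (2 * (1 - KR))
    return (16 + 219 * yy, 128 + 224 * pb, 128 + 224 * pr)

def composite_ycbcr(fg, bg, alpha, gamma=2.2):
    """Blend fg over bg (both Y'CbCr) with the interpolation done in linear RGB."""
    fg_p = ycbcr_to_rgb_prime(*fg)
    bg_p = ycbcr_to_rgb_prime(*bg)
    mixed = []
    for f, b in zip(fg_p, bg_p):
        lin = alpha * f ** gamma + (1 - alpha) * b ** gamma  # steps 2 and 3
        mixed.append(lin ** (1 / gamma))                     # step 4
    return rgb_prime_to_ycbcr(*mixed)

black, white = (16, 128, 128), (235, 128, 128)
print(composite_ycbcr(black, white, alpha=0.5))
```

A 50% blend of black over white comes out noticeably brighter than the naive Y' midpoint of 125.5, which is the visible difference between blending in linear light and blending the coded values directly.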

This process is made difficult because of how the chroma is downsampled. Since we're trying to be all correct, we have to upsample the chroma from the MPEG2 video from 4:2:0 to 4:4:4 (realigning the chroma samples with the luma samples!), then do all the hard work, convert back, and then downsample the chroma back to 4:2:0. Ouch!

It's still unclear to me whether I can just apply a new gamma function to the Y'CbCr channels independently. I don't want to resample the chroma, and I can get around that by acting on each channel independently. So the question is, how do we map the gamma function directly to Y'CbCr space? Can we do this at all? This issue is still completely unclear to me. I'll have to play around in maple/matlab more.

Deinterlacing DVDs

Deinterlacing and 3:2 pulldown inversion are important for playback of DVDs on progressive-scan displays like computer monitors.

A scary issue is mentioned in MPEG document 2820 which indicates that the 'progressive_frame' flag in the MPEG2 header is unreliable for deinterlacing purposes! This moves intelligent deinterlacing almost completely into the image heuristics area, except for reversing 3:2 pulldown when performed using the repeat_first_field flag. Note that even then we can have problems; see 'A really weird 3:2 pulldown encoding' below!!

An example of 3:2 pulldown encoding

The following sequence is taken from the NTSC release of Lawrence of Arabia, Title 1, Chapter 15. It is the first 11 frames.

The first 5 frames are marked as interlaced (thanks!) and so the coded framerate is 29.97fps, but the material is clearly from 24fps source with 3:2 pulldown applied. The DVD then switches into progressive mode, and uses the repeat_first_field flag to offload the pulldown work onto the player. This switches the effective coded framerate down to 23.976fps.

In this DVD, small bits of scenes have been encoded at 29.97fps instead of always coding at 23.976fps. Why would they do this? One thought is that maybe certain scenes were touched up at video speed to remove objectionable artifacts in the pulldown conversion, but that doesn't seem to be the case here. I did notice that often we see some interlaced frames near the beginning of chapters. Maybe they fear some DVD players need time to switch into 24fps mode?

Frame	MPEG Type	Field Pattern (from tff/rff flags)	Progressive Frame	Image
0	I	Top Bot	Interlaced	[lawrence00.png]
1	B	Top Bot	Interlaced	[lawrence01.png]
2	B	Top Bot	Interlaced	[lawrence02.png]
3	P	Top Bot	Interlaced	[lawrence03.png]
4	B	Top Bot	Interlaced	[lawrence04.png]
5	B	Top Bot Top	Progressive	[lawrence05.png]
6	P	Bot Top	Progressive	[lawrence06.png]
7	B	Bot Top Bot	Progressive	[lawrence07.png]
8	B	Top Bot	Progressive	[lawrence08.png]
9	I	Top Bot Top	Progressive	[lawrence09.png]
10	B	Bot Top	Progressive	[lawrence10.png]
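
The field patterns in the table above follow directly from the top_field_first (tff) and repeat_first_field (rff) flags. Here is a small sketch of that mapping and of the resulting coded framerate; the (tff, rff) pairs are read back from the table rather than dumped from the disc, and the function names are made up for illustration.

```python
def field_pattern(top_field_first, repeat_first_field):
    """Return the fields one coded frame contributes, in display order."""
    first, second = ("Top", "Bot") if top_field_first else ("Bot", "Top")
    fields = [first, second]
    if repeat_first_field:
        fields.append(first)       # the first field is shown a second time
    return fields

def effective_frame_rate(flags, field_rate=60000 / 1001):
    """Coded frames per second for (tff, rff) pairs displayed at field_rate."""
    total_fields = sum(len(field_pattern(tff, rff)) for tff, rff in flags)
    return len(flags) * field_rate / total_fields

interlaced = [(1, 0)] * 5                     # frames 0-4 in the table
pulldown = [(1, 1), (0, 0), (0, 1), (1, 0)]   # frames 5-8 in the table

for tff, rff in pulldown:
    print(" ".join(field_pattern(tff, rff)))
print(effective_frame_rate(interlaced))       # about 29.97
print(effective_frame_rate(pulldown))         # about 23.976
```

Four coded frames spread over ten fields is what brings the rate down from 29.97fps to 23.976fps.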

A really weird 3:2 pulldown encoding!!

This came as a complete shocker to me. Here is some output of the first 150-or-so frames of The Good, The Bad, and the Ugly (1966). Take a careful look at all the non-repeat_first_field frames! They're interlaced! Not only that, but the progressive_frame flag is high the whole time!

The conclusion here is that a correct deinterlacer must look at _every_ non-repeat_first_field frame, even if we're clearly in a pulldown sequence! What a mess!

My biggest question here is why? Why would they ever do this? One observation we make is that every second frame is a blend of the two beside it. So, maybe the only print they found of the opening credits was at 12fps? Maybe it was originally recorded at 12fps and this is a conversion technique? Maybe the quality was so bad, they only decided to restore every second frame? If you have thoughts, please email them to me.

Frame	MPEG Type	Field Pattern (from tff/rff flags)	Progressive Frame	Image
60	I	Top Bot Top	Progressive	[image]
61	B	Bot Top	Progressive	[image]
62	B	Bot Top Bot	Progressive	[image]
63	P	Top Bot	Progressive	[image]
64	B	Top Bot Top	Progressive	[image]
65	B	Bot Top	Progressive	[image]
66	P	Bot Top Bot	Progressive	[image]
67	B	Top Bot	Progressive	[image]
68	B	Top Bot Top	Progressive	[image]
69	P	Bot Top	Progressive	[image]
70	B	Bot Top Bot	Progressive	[image]
71	B	Top Bot	Progressive	[image]
72	I	Top Bot Top	Progressive	[image]
73	B	Bot Top	Progressive	[image]
74	B	Bot Top Bot	Progressive	[image]
75	P	Top Bot	Progressive	[image]
76	B	Top Bot Top	Progressive	[image]
77	B	Bot Top	Progressive	[image]
78	P	Bot Top Bot	Progressive	[image]
79	B	Top Bot	Progressive	[image]
80	B	Top Bot Top	Progressive	[image]

Realtime deinterlacing results on the Bjork sequence

This video sequence is taken from the Bjork: Volumen DVD of music videos. It was shot at the video rate of 59.94 fields per second. We compare images from:

Showing both fields together as a frame
The top field stretched vertically using simple averaging
ffmpeg's [-1 4 2 4 -1] deinterlacing filter
My threshold-based 'motion adaptive' deinterlacing algorithm.
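
For reference, here is a rough sketch of the two simpler methods in the list, working on one luma plane stored as a list of rows. The five-tap version is only my reading of the [-1 4 2 4 -1] kernel: taps normalised by 8, applied vertically to the bottom-field lines while the top-field lines are left alone, with rows mirrored at the frame edges. The function names are made up, and frames are assumed to be at least four lines tall.

```python
def average_top_field(frame):
    """Keep the top field (even rows); rebuild odd rows by averaging neighbours."""
    height = len(frame)
    out = [row[:] for row in frame]
    for y in range(1, height, 2):
        above = frame[y - 1]
        below = frame[y + 1] if y + 1 < height else frame[y - 1]
        out[y] = [(a + b + 1) >> 1 for a, b in zip(above, below)]
    return out

def five_tap_filter(frame):
    """Filter odd rows with the vertical kernel [-1 4 2 4 -1] / 8."""
    height = len(frame)

    def row(y):
        # Mirror at the edges; this keeps the field parity of the row.
        if y < 0:
            y = -y
        elif y >= height:
            y = 2 * (height - 1) - y
        return frame[y]

    out = [r[:] for r in frame]
    for y in range(1, height, 2):
        rows = [row(y + d) for d in (-2, -1, 0, 1, 2)]
        out[y] = [
            max(0, min(255, (-a + 4 * b + 2 * c + 4 * d - e + 4) >> 3))
            for a, b, c, d, e in zip(*rows)
        ]
    return out

# A fully combed frame: top field at 100, bottom field at 0.
combed = [[100, 100] if y % 2 == 0 else [0, 0] for y in range(8)]
print(average_top_field(combed)[3], five_tap_filter(combed)[3])
```

The difference from plain averaging is that the five-tap kernel also uses the bottom-field samples, so static detail in the bottom field still contributes to the result.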

I want to try out VirtualDub's adaptive deinterlacing algorithm, which Jarl gcc'ified in drip, but it only acts on 24-bit R'G'B' images so it will take a while to adapt it to my code. I also intend to port some of the dscaler algorithms, but I haven't got around to it.

I was surprised at how reasonable the results from ffmpeg's filter are. That said, it's still an interpolation filter, and so the results aren't all that great. I tried it out with my motion test, but the results weren't so good. I'll try and get some shots of different algorithms up, even bad ones, just for comparison.

The shots below are for when we are deinterlacing the top field, that is, we want to get the highest quality image corresponding to the point in time of the top field. These frames are all top-field-first, so, we're deinterlacing using pixels from the future. None of the algorithms currently shown use more than 2 fields to deinterlace, but of course, I will add more shortly.

A static frame of anti-aliased text. This is an excellent test of static frame quality. Simple interpolation leads to nasty artifacts, while ffmpeg's filter looks very good. My motion test algorithm pukes all over this image since the antialiasing messes my ability to predict intensity values.
The opening shots in this music video are dreamy with random fields being blurred with adjacent frames to make the sequence sort of float along. Notice that the top field is a blur between it and the bottom field, while the bottom field just contains itself.
Similar to the above.
Watch closely the smoothness of the diagonal line. This shows how bad a simple vertical average can be.


We're still in the dream sequence. Watch the quality of the objects on the wall.

Break into normal video, lots of fast action. MPEG artifacts galore, but we still have to deinterlace! Check the tiles on the floor for some diagonal lines to compare.

Dealing with framerate conversions

I have posted a sequence of frames and discussion about the 625/50 (PAL) release of the first season of The Simpsons on DVD.

The page shows the raw fields, not frames, from the DVD. Clearly we are seeing the result of some framerate conversion, and I think it's pretty crazy. Every n fields (I haven't determined n yet) we see a field which is just a half/half blend of the two adjacent fields. The result is a nasty blurry frame. Ugh!

Not only is it ugly, but for someone trying to deinterlace, this is going to make any algorithm suck. I'll post results up soon of what artifacts this leads to when you look closely.

Every top field is a blur of the two adjacent fields, and then after a while we see three fields each of which is not a blur, and now our phase has changed, so every bottom field is a blur. Working through the math, I'm pretty sure they converted 24fps to 50fps by approximating each 50Hz refresh with the closest two 24fps frames: if two frames are present during that refresh, show the blur.
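
That hypothesis is easy to model. The sketch below assumes each output field covers exactly 1/50 of a second, each source frame covers exactly 1/24 of a second, both start at time zero, and a field is a blend whenever it overlaps two source frames. It models the guess only and has not been checked against the disc; `field_sources` and `blended_fields` are made-up names.

```python
from fractions import Fraction
from math import floor

def field_sources(k, src_fps=24, field_rate=50):
    """Return [(source_frame, weight)] for the frames overlapping output field k."""
    start = Fraction(k, field_rate)
    end = Fraction(k + 1, field_rate)
    sources = []
    j = floor(start * src_fps)             # first frame that can overlap
    while Fraction(j, src_fps) < end:
        lo = max(start, Fraction(j, src_fps))
        hi = min(end, Fraction(j + 1, src_fps))
        if hi > lo:
            sources.append((j, (hi - lo) / (end - start)))
        j += 1
    return sources

def blended_fields(first, count):
    """Indices of fields in [first, first + count) that mix two source frames."""
    return [k for k in range(first, first + count)
            if len(field_sources(k)) > 1]

print(blended_fields(0, 25))    # even-numbered fields only
print(blended_fields(25, 25))   # odd-numbered fields only
```

Under this model the blends land on even-numbered fields for the first 25 fields and on odd-numbered fields for the next 25, with a short unblended run where the phase flips. That agrees in kind with the parity flip described above, though the exact run lengths depend on the phase offset and on how faint a blend has to be before it looks clean.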

See this page for more images with discussion.

[Table: 20 consecutive fields from the DVD, alternating top and bottom parity, each image linked to a full-resolution PNG.]

Using MPEG's motion vectors for deinterlacing

This idea keeps coming up. Sometimes I think it will work wonders, and other times I think the idea will hurt us really badly when motion vectors don't accurately represent the motion of objects in the scene. Consider a cut between two cameras shooting the same scene at a different angle. The other problem is that we only have motion vector information for predictive frames. Ugh.

Hopefully sometime soon I'll hack libmpeg2 to export the motion vector information and I can try to build a deinterlaced image on the fly. But since I could see this as being a lot of work, I'll wait until my generic motion-vector based deinterlacer is written.

Buying DVDs

I like to browse and buy DVDs at the local Future Shop when they are cheap. Over the net, I have found CNL to be cheap and Canadian, and dvdplanet is good for U.S. releases and titles which CNL does not have. I intend to order some PAL DVDs from BlackStar but I have not made an order yet (waiting to see if the PAL release of Transformers: The Movie is actually a widescreen transfer).

My DVD collection is sometimes updated.

Changelog

Mon Feb  3 12:41:38 EST 2003    Updated from openprojects to freenode
Wed Nov 14 14:06:24 PST 2001    Added some more text on the Simpsons sequence
Mon Nov 12 19:14:52 EST 2001    Added the Good/Bad/Ugly sequence results
Mon Nov  5 02:51:21 EST 2001    Added the Simpsons sequence page
Sun Nov  4 15:07:16 EST 2001    Added the Bjork deinterlacing results
