The 8 hour journey to a single character
I came into work one day not too long ago and was met with an unfortunate piece of information. The test we ran the previous night reported a 4% drop in accuracy (we found 4% fewer faces). I frowned. I had clearly screwed something up.
You see, the day before, I found that our code was doing some really pathological and unnecessary color conversions on images. We were using our video i/o library (a very thin wrapper on ffmpeg) to read in movies and allowing it to convert the frames into RGB. As soon as our code got ahold of these RGB frames, we immediately converted them to grayscale. The pathology of the situation is that the vast majority of video formats are already in the YUV color space (or YCbCr color space; I will use the terms interchangeably, but you should know they aren’t quite the same thing). For those among you who aren’t image processing nerds, the Y channel of YUV is functionally equivalent to a gray channel. In other words, videos are almost always encoded with a gray channel and 2 color (chrominance) channels. That means converting these images to RGB just to convert them back to gray is beyond wasteful.
We had actually known about this problem for a while, but it had never been a terribly pressing issue. That changed recently when we were doing some really high speed processing and noticed our color conversions were a non-trivial portion of the time. I dug into our video i/o library, removed the conversion to RGB, and piped the Y channel of the YUV frames into our software. Everything appeared to be working perfectly. Output looks good. No memory leaks. Commit. Go home. Watch House.
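For readers who want to picture what "piping the Y channel" amounts to, here is a minimal sketch. It is not the actual code from the library described here. It assumes a planar YUV frame whose luma plane is stored row by row with a stride that may be larger than the image width, and the function and parameter names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the Y (luma) plane of a planar YUV frame into a tightly packed
 * grayscale buffer.
 *
 * y_plane : first byte of the luma plane (hypothetical name)
 * stride  : bytes between the starts of consecutive rows; may exceed
 *           width because decoders often pad each row
 * gray    : destination buffer holding at least width * height bytes
 */
static void copy_luma_plane(const uint8_t *y_plane, size_t stride,
                            size_t width, size_t height, uint8_t *gray)
{
    for (size_t row = 0; row < height; ++row) {
        /* Copy only the visible pixels and skip the row padding. */
        memcpy(gray + row * width, y_plane + row * stride, width);
    }
}
```

No arithmetic touches the pixel values here, which is exactly why the shortcut is fast, and also why whatever range the decoder used for Y passes straight through to the caller.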
The next day…
So I walk in, bright and early, and hear the disappointing news. We lost 4% accuracy. Did my colorspace change cause this? I hope not. Within half an hour, I realized that my colorspace conversion change clearly was not correct. The frames that I had tested on the previous day looked identical, but they weren’t. I expected some minor round-off error as a possibility, but the changes were more than I anticipated.
The error became extremely obvious when I ran the new decoding library on the opening of Star Trek. Conveniently, the opening of Star Trek is pitch black. My “new” library was not producing black! It was near-black, but very clearly NOT black. Uh oh. What’s going on?
Let’s look at the numbers
I dug into ffmpeg and very quickly reproduced the problem. The very first pixel of the very first frame of this episode of Star Trek had a Y value of 16. Not zero. When ffmpeg converted this value to RGB, the result came out to be (0,0,0). Black. Wait a minute, since when is Y=16 equal to black? What is going on?
I looked at the only other place in our codebase where we deal with this type of information: JPEG decoding. JPEGs also use YUV formatting. What would this particular frame look like in a JPEG’s YUV? This is a quick test. Is it also 16? Nope, it’s zero.
For an RGB value of (0,0,0), ffmpeg is telling me the Y value is 16, and the jpeg library is telling me the value is 0. Either there is an egregious bug in one of the two most well-tested libraries on the planet, or I clearly don’t understand wtf is going on. I’m going to assume the latter. But that will have to wait until after lunch.
Research time
When I got back from lunch, it was time to hit up the internet. If you search the internet for formulas to convert yuv to rgb you’ll get all sorts of conflicting information. You’ll even get very different formulas. If you read the YUV or YCbCr Wikipedia pages, you can easily miss the most important information (hint: it’s not the equations). After spending a tremendous amount of time reading (and being confused about the different formulas), I made the critical discovery. There are different definitions of YUV.
I then had to dig into the details to find out just how deep this particular rabbit hole goes. In the end, it wasn’t terribly complicated (but it was difficult to find good information). In essence, though, different standards define different dynamic ranges for the YUV color space when digitized into 8 bits per sample. Movie files (e.g., MPEG) will often use 16-235 for the Y channel (black->white), while images (JPEG) will use 0-255. A movie file’s white (235) != a jpeg file’s white (255). To make matters worse, the Cr and Cb (i.e., U and V) channels use yet another range in MPEG files (16-240, centered on 128), though jpeg is always [0-255]. Oh my.
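To make the two definitions concrete, here is a small sketch of how luma is computed from 8-bit RGB under each convention. It assumes the BT.601 luma weights (0.299, 0.587, 0.114), which is what JPEG/JFIF uses; the function names are invented for illustration and this is not code taken from ffmpeg or any jpeg library.

```c
#include <stdint.h>

/* Full-range ("jpeg-style") luma: black maps to 0, white maps to 255. */
static uint8_t luma_full_range(uint8_t r, uint8_t g, uint8_t b)
{
    double y = 0.299 * r + 0.587 * g + 0.114 * b;
    return (uint8_t)(y + 0.5);              /* round to nearest */
}

/* Limited-range ("video-style") luma: the same weighted sum, squeezed
 * into 219 steps and offset by 16, so black maps to 16 and white to 235. */
static uint8_t luma_limited_range(uint8_t r, uint8_t g, uint8_t b)
{
    double y = 0.299 * r + 0.587 * g + 0.114 * b;
    return (uint8_t)(16.0 + y * 219.0 / 255.0 + 0.5);
}
```

Feeding pure black (0,0,0) into both functions gives 0 from the first and 16 from the second, which is the mismatch between the jpeg library and ffmpeg described above.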
Note: if you are here because you are having similar yuv/rgb problems and google led you here, I strongly suggest you read every single word of these three links:
The Fix, part 1
If all I need to do is rescale to a different dynamic range, that is not a difficult problem to solve. It’s a fair bit tricky (watch those unsigned overflows!) but it’s nothing that can’t be accomplished through the power of C. I spent an hour or so writing a function to convert the three channels to the expanded dynamic range (remembering that the Y channel uses a different range than the U and V channels). I knew I’d lose some information, but what choice did I have?
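A rescaling function of this kind might look roughly like the sketch below. It is not the code that was actually written that afternoon. It assumes the common 8-bit limited ranges (16-235 for Y, 16-240 for U and V, with chroma centered on 128), and the function names are invented for illustration.

```c
#include <stdint.h>

/* Clamp an int into the 0..255 range of an 8-bit sample. */
static uint8_t clamp_to_u8(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Expand limited-range luma (16..235) to full range (0..255). */
static uint8_t expand_luma(uint8_t y)
{
    /* Signed math on purpose: with unsigned math, y - 16 would wrap
     * around to a huge value for any input below 16. */
    int scaled = ((int)y - 16) * 255;
    if (scaled < 0) return 0;
    return clamp_to_u8((scaled + 109) / 219);   /* +109 rounds to nearest */
}

/* Expand limited-range chroma (16..240) to full range (0..255),
 * keeping the neutral value 128 fixed. */
static uint8_t expand_chroma(uint8_t c)
{
    int scaled = ((int)c - 128) * 255;
    /* Round symmetrically around zero so both halves behave alike. */
    int rounded = (scaled >= 0) ? (scaled + 112) / 224
                                : -((-scaled + 112) / 224);
    return clamp_to_u8(rounded + 128);
}
```

Stretching 220 input levels over 256 output levels leaves gaps in the output histogram, and inputs outside the nominal range are clipped, which is the information loss mentioned above.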
Once I had finished, I ran all my previous tests and found the output to be far, far better than the one I was using from the previous day. I also tested my new conversion routine on images that failed from the overnight test and what do you know, they were now finding faces. Mission accomplished!
Not so fast, my friend
It was about this time that I felt the need to vent. Seriously, movie and jpeg people, why are you doing this to me? Why are there two (note: actually more than two) different dynamic ranges for 8-bit YUV pixels? Why oh why? (more notes: if you want to learn why, it’s actually a fairly fun and interesting story… taketh thee to wikipedia).
In need of some complaining, I decided to go onto IRC and complain to the only video developer I know (he works on x264, the open source h264 encoder). I asked him why “they all” go around screwing with people like me with such nonsense. He laughed and went on to explain there are actually more than two different formats and commiserated with me for a moment. And then he said something important. He said only “--fullrange”. Wait. What is --fullrange? Is that an x264 parameter? Yes, yes it is. What does --fullrange do? It uses the full range of YUV. Ah! x264 devs are geniuses. Why would they leave this silly conversion to us?
Oh wait. Does that mean… does ffmpeg… do it too? It has to, right? Let’s check the docs, shall we? There sure are a lot of formats on this list. I wonder if any of them are “full-range” YUV.
PIX_FMT_YUVJ420P Planar YUV 4:2:0, 12bpp, full scale (jpeg).
PIX_FMT_YUVJ422P Planar YUV 4:2:2, 16bpp, full scale (jpeg).
PIX_FMT_YUVJ444P Planar YUV 4:4:4, 24bpp, full scale (jpeg).
Does that “jpeg” mean these are “jpeg-style” full-range YUV outputs? I should try this. Within minutes I realized that yes, these formats outputted YUV channels that used the full dynamic range 0-255. Excellent. I reverted all my ugly changes with my own customized range expansion code and committed this final fix.
- PIX_FMT_YUV420P,
+ PIX_FMT_YUVJ420P,
One character. One friggin’ “J”. 8 hours. I hope no one is keeping track of “lines of code per hour”.
November 23rd, 2009 at 9:06 am
i work in an APL derivative. my best days are the ones with negative LOCs.
November 23rd, 2009 at 10:20 am
That is a great story. Early in my career, I once spent about 36 hours chasing a bug that turned out to be a single misplaced closing brace, but this is better (for some value of “better”).
November 23rd, 2009 at 11:49 am
Great story. Reminds me of a time we brought down the site when we pushed out a / that was supposed to be *
November 23rd, 2009 at 12:20 pm
In college I once spent over 2 hours on:
if(something false);
{
}
CONVINCED that the compiler had a bug. Note: the compiler never has a “bug”,
although I have triggered asserts in javac (with code that violated a tricky part of the spec)
and I have code that can crash msvc’s compiler =).
November 23rd, 2009 at 1:54 pm
wow, you are an idiot. i feel bad for you. you work on image processing for a living?
November 23rd, 2009 at 1:56 pm
heh, everyone knows that the J comes between the V and the 4.
November 23rd, 2009 at 1:57 pm
A poignant illustration of the chaotic nature of computer programming.
November 23rd, 2009 at 4:02 pm
I recently learned that removing the list of new products from the oscommerce start page and replacing it with a list of all categories requires replacing '' with '0' in one place in includes/application_top.php
November 23rd, 2009 at 5:40 pm
Been there, done that. For some reason IE wasn’t rendering the page I wanted right, and the IE hacks I had used before were failing. Being on a deadline, I did something bad (an alternate ugly stylesheet).
Eventually I worked out that IE was rendering this page (but no other) in quirks mode. Doctype is there and correct, the html is a well-formed XML tree, what could be wrong?
Turns out the page is saved as UTF +BOM and the others as UTF -BOM and for some reason the web-server is emitting the BOM at the start of the file, IE is seeing something before the doctype and going into quirks mode. When I look at the source it seems clear (due to the BOM being invisible)
Remove the ugly alternate stylesheet, fix one line of code, push changes into source control.
November 23rd, 2009 at 6:27 pm
If you come from a straight computing background, you’ll mostly be exposed to full-scale colour channels, and probably think everything uses 24-bit RGB with 0-255. But in the professional film/video industry, there’s a ton of different standards, formats and colour spaces, and the rescaled channels are very common. Also, 10-bit per channel video is normal these days. There are also very good reasons for doing things this way – the 16-235 scale wasn’t put there to sneak a bug into your application. It’s to allow headroom and toeroom during post-processing. It is worth understanding the history and rationale behind something before dismissing it as “nonsense” – there’s usually a pretty good reason.
The moral of this story: always keep your colour space in mind. Even if you’re looking at RGB, is it Rec.601 or 709? Gamma corrected or linear? And you *cannot* use the terms YUV and YCbCr interchangeably and still be accurate – they are NOT the same. Please be precise with your terms, or you merely propagate the confusion.
November 23rd, 2009 at 6:37 pm
I’ve had so many experiences like this. Rarely spanning multiple days though.
Patience is the most important programming skill.
November 23rd, 2009 at 6:53 pm
Single character bugs are the best ones. Without going into much detail, some of the single characters that have caused me the most hours of work are:
!
=
*
November 23rd, 2009 at 6:54 pm
Ugh; last post should have included:
<
>
November 23rd, 2009 at 7:38 pm
I once chased a strange floating point error in a calculation routine. When I traced it in CPU-view, I found out that the compiler assumed the ‘direction flag’ was always 0, which it wasn’t in this case. An __emit__(“CLD”) in the right place fixed the error.
November 23rd, 2009 at 10:03 pm
Those one character bugs are the best ones. Honestly, I love the feeling of tracking what feels like a huge problem down to one little char. Once you get it fixed, it’s FIXED and it’s satisfying.
November 24th, 2009 at 3:35 am
I once managed to insert a non-printing character into a perl script by hitting Shift+Space (invoking SCIM) instead of Space. Took a long time to find the problem.
November 24th, 2009 at 9:03 am
Thanks for sharing! Felt like an adventure reading this, although I had a vague recollection of where the tale was leading since I dipped my toe into the video conversion world in the mid 1990s just long enough to experience some of your frustration with the numerous formats. Things are hopefully/probably better now, but then it felt like a wasteland of incompatible, competing formats.
Oh, and speaking of all-day simple bugs. In the early 90s, I once spent an entire day on a single line of code in a 3D rendering engine. The code seemed to change its output depending on what called it. Turns out it was worse than that — output depended on whether it was before or after a comment! Senior developers thought I was nuts so I reduced the test case to a 5-line program. So this:
void function()
{
i=i+i;//comment
}
produced different output from this
void function()
{
//comment
i=i+i;
}
Yeah, no kidding, that was weird. Turns out it was a bug in the compiler (we were using Microsoft’s shiny new C++ compiler, still in Beta) causing i=i+i to produce unpredictable results. Filed a bug report on the phone (this is pre-WWW, Windows 2.1 days). Good times.
November 24th, 2009 at 9:54 am
It seems to me that you have now regressed to performing a (albeit simpler) colorspace transform inside ffmpeg, which I thought it was your original goal to avoid . . . admittedly it should be done by nice fast code, but it still involves an unnecessary full frame duplication, which could be a pain if you care about the cache footprint.
November 24th, 2009 at 10:18 am
It happened in my life too. really nice post. thanks 4 sharing it.
November 24th, 2009 at 11:13 am
Interesting story, I once spent about a day and a half trying to make a custom jabber client work, only to find out that one character that was supposed to be capitalized was lower case, it took so long as I kept looking at what appeared to be identical logs – but one worked and one didn’t. So I definitely feel your pain.
November 24th, 2009 at 12:53 pm
I once spent a day looking for what turned out to be the difference between “K” and “k”.
I also once helped two friends who had been debugging a web server configuration for several hours. They called me over, I looked at a few of the config files (this was httpd 1.3 or 1.4) and then said — uh, could you put a ENTER right *there*? They went “huh, why???”, and I said, “just do it please…” and everything worked.
It was the very last line of the file, and their parser required a newline (or maybe CR, I forget) to be at the end of the line in order to recognize the line should be processed. Doh!
February 4th, 2010 at 2:31 am
The journey is very nice for single charter. Example (j = 1; j<20; j+++)that is single charter but j is very imp for 20 number print.