Me and my slides
I am cursed with either high expectations or awful slide-making skills.

This graph is actually only partially correct (I don’t have until October to get it right). It leaves out the cycles where I add 10 slides in a half hour, then spend the next 3 hours rewording and reworking them until only 1 and a half are left. And then I repeat that cycle a few more times.
My New Year’s Resolution
I am a bit of a procrastinator.
Yes, yes, that doesn’t exactly make me special, but I’m talking specifics here. I’m a programmer, and a procrastinator. And that can mean trouble.
For the next year (and/or the first three weeks of January, whichever comes first), I’m going to automate/refactor/abstract any operation I do for the third time. I’ve noticed far too many insignificant hiccups in my workflow that go ignored. It’s just that the marginal cost of putting up with the annoyance is so much less painful than sitting down and writing the script (or special command, or option, or what have you). Maybe I’ll do it next week. And next week becomes next month. And now I’ve started to notice niggling annoyances that have gone on for years. Yes, years.
Let me give you two examples…
Oprofile
I use oprofile constantly.
sudo opcontrol --reset
sudo opcontrol --start
./run_my_code
sudo opcontrol --shutdown
opreport -lt1
I’ve typed that out, in full, for probably close to 3 years now. Why? Other people have noticed. I’ve noticed. I’m pretty sure no fewer than two of my co-workers would have edited my aliases for me, if I had let them. Well, I finally sat down and wrote the 10-line Python script that will do this (and a handful of other useful oprofile functions) for me. It took about 10 minutes. Sigh.
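The actual script was Python and isn’t shown here, so what follows is only a minimal C sketch of the same idea. The helper names oprof_build and oprof_run are hypothetical; the five commands are exactly the ones typed out above, and the program being profiled is passed in as a string. One small design choice: if an early step fails, the sketch skips ahead but still runs the shutdown step, so the profiler isn’t left running.

```c
#include <stdlib.h>
#include <string.h>

#define OPROF_STEPS 5
#define OPROF_CMD_MAX 256

/* Hypothetical helper: fill cmds[] with one profiling session, with
   `target` (e.g. "./run_my_code") as the program being profiled.
   Returns 0 on success, -1 if `target` is too long for the buffer. */
int oprof_build(const char *target, char cmds[OPROF_STEPS][OPROF_CMD_MAX]) {
    const char *steps[OPROF_STEPS] = {
        "sudo opcontrol --reset",
        "sudo opcontrol --start",
        target,                      /* the program under test */
        "sudo opcontrol --shutdown",
        "opreport -lt1",
    };
    for (int i = 0; i < OPROF_STEPS; i++) {
        size_t len = strlen(steps[i]);
        if (len >= OPROF_CMD_MAX) return -1;
        memcpy(cmds[i], steps[i], len + 1); /* copy including the terminator */
    }
    return 0;
}

/* Hypothetical helper: run the session through the shell. After a failure,
   remaining steps are skipped except the shutdown step (index 3).
   Returns 0 only if every executed step succeeded. */
int oprof_run(const char *target) {
    char cmds[OPROF_STEPS][OPROF_CMD_MAX];
    if (oprof_build(target, cmds) != 0) return -1;
    int failed = 0;
    for (int i = 0; i < OPROF_STEPS; i++) {
        if (failed && i != 3) continue;
        if (system(cmds[i]) != 0) failed = 1;
    }
    return failed ? -1 : 0;
}
```

A one-line main that calls oprof_run("./run_my_code") replaces the whole ritual; a shell function would be the even lazier option.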
Emacs vs vi
I’ve always been an emacs fan. I probably always will be. However, there is one part of vi that has always made me jealous: the name. vi is just such an amazingly easy name to type compared to the incredibly laborious emacs. As a young kid, I decided to fix this, and I aliased my customized emacs launch to “em”. Yes, I aliased emacs to em. I’ve been using it for years. And it’s extremely stupid. Extremely, extremely stupid. It doesn’t take much thought to figure out why “em important_file.cpp” can become a disaster. (Extra stupid note: compare the locations of the e and r keys.)
I know it’s stupid, and yet it continues. Every time I move my environment over to a new machine, I copy the alias over. Why? Because it’s always been there, and it’s on all my machines. Because I can’t fix it right now. I’ll fix it next time. Gah. It ends now. I will never type em again.
Inspiration vs Distraction
When the bug of inspiration bites, what do you do? Do you drop the old project and go for the new shiny one? Or do you ignore the urge?
I am one of those people who are constantly switching between their personal projects. I will work tirelessly on project A, and then on a certain Tuesday when the wind is blowing just right, I’ll suddenly lose all desire for project A and find a sudden and insatiable urge to work on project B. When inspiration strikes, it tends to be very difficult for me to ignore. My hard drive is consequently littered with half-starts, rough outlines, notes, and various ideas, bad and good, in various stages of completion.
This brings up the interesting question of what the preferred state of affairs is. Should I be more disciplined and learn to ignore the grass-is-always-greener syndrome, or should I go with my own fickle whims? Am I perpetually distracted by these flirtations of inspiration, or is this, in some way, healthy? Given that I almost always choose to drop the old clunker and work on the new thing, you could certainly read quite a bit into my character, if you wanted. Maybe I am too fickle. Maybe I am easily distracted. Maybe I’ll never actually accomplish any of these things. Maybe. I see it another way.
Most projects get exactly one burst of effort before permanent abandonment. Some problems though, I find myself coming back to time and again. I view this particular mechanism as healthy. It’s a natural selection of my ideas. May the fittest survive. This is why I don’t really feel guilty for dropping yet-another half-started project and starting something new. If the idea survives a few months off, it’s probably worth keeping. If it doesn’t, so be it.
Embrace the distraction
So in that vein, I’ve put on hold several slightly boring personal projects for a shiny new one. I’m writing a game. Oh this will end well. I’ve spent the last three weekends tearing apart the UDK to learn how it works and over that time I’ve put together this spectacularly early prototype:
So here’s to another distraction.
Unplanned planning
Here are some rectangles:
struct irect {int top; int bot; int left; int right;};
struct frect {float top; float bot; float left; float right;};
Now, given those rectangles, you are asked to write a conversion from irect to frect.
struct frect convert (struct irect i) {
    /* naive: copies the integer indices straight across */
    struct frect ans = {(float) i.top, (float) i.bot, (float) i.left, (float) i.right};
    return ans;
}
This might at first seem like a reasonable answer but… it’s almost certainly not.
The problem
What is the area of a rectangle with corners (0.0,0.0) and (10.0,10.0)?
Now, how many pixels are in an image with bounds (0,0) to (10,10)?
Hopefully, your answers disagreed (or at least you can see how they could disagree). In our field, we deal with images. Lots and lots of images. One of the most fundamental data structures you can possibly imagine when dealing with images is the lowly rectangle. For us, a rectangle is a region of an image. And it’s this role, as a region, that creates all kinds of pain and heartache.
We can isolate the problem more clearly by asking: what is the lower right-hand corner of the integer rectangle above? Well, it depends on exactly what we mean by the lower right-hand corner. If you want to know where to look in memory for the lower right-hand corner pixel’s data, the answer is (x=10, y=10). If you want the Cartesian coordinate of the true lower right-hand corner, the answer is (x=11, y=11).
The solution?
The difficulty here is the conflation of two separate ideas into the same form. First we have the integerized “point” on the Cartesian plane, and then we have the 1×1 region known as a “pixel”, specified by its index. How should pixel indices convert to the Cartesian plane? Put another way, should an integer rectangle (0,0)×(10,10) have an area of 100 or 121? This is, to some extent, a question of convention. If you pick one and are consistent, everything will work out fine. However, for your application, one convention will almost certainly result in cleaner code.
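To make the two conventions concrete, here is a minimal sketch reusing the irect/frect layout from above; the function names are hypothetical. In the inclusive convention, bot and right are the indices of the last pixel row and column, and pixel x covers [x, x+1), so the far edges sit one past the last index. In the half-open convention, bot and right already point one past the last pixel. (A third common choice puts pixel centers on integer coordinates, so pixel x covers [x-0.5, x+0.5); it isn’t shown.) Note that the half-open conversion is just the naive one from earlier: it is only right if that is the convention the rest of your code assumes.

```c
struct irect {int top; int bot; int left; int right;};
struct frect {float top; float bot; float left; float right;};

/* Inclusive convention: bot/right are the indices of the last pixel row/column.
   Pixel x covers [x, x+1), so the far edges sit one past the last index. */
struct frect convert_inclusive(struct irect i) {
    struct frect f = {(float) i.top, (float) (i.bot + 1),
                      (float) i.left, (float) (i.right + 1)};
    return f;
}

/* Half-open convention: bot/right already point one past the last pixel,
   so the coordinates carry over unchanged. */
struct frect convert_half_open(struct irect i) {
    struct frect f = {(float) i.top, (float) i.bot, (float) i.left, (float) i.right};
    return f;
}

/* Number of pixels covered under each convention. */
int pixels_inclusive(struct irect i) { return (i.bot - i.top + 1) * (i.right - i.left + 1); }
int pixels_half_open(struct irect i) { return (i.bot - i.top) * (i.right - i.left); }

/* Geometric area of a float rectangle. */
float frect_area(struct frect f) { return (f.bot - f.top) * (f.right - f.left); }
```

For the rectangle (0,0) to (10,10), the inclusive pair gives 121 pixels and an area of 121, while the half-open pair gives 100 and 100. Either is fine; what matters is that the pixel count and the converted area agree within whichever convention you pick.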
So the answer to the original question becomes a fairly complicated ordeal. This one problem only scratches the surface of the painstaking care that must be taken when dealing with integer and floating point regions of pixels. You would need to figure out exactly how and why you use rectangles and what index convention you want to use. For every function that you write operating on rectangles, you need to carefully consider the implications of your chosen convention. And depending on the convention you use, the rest of your code is going to look very different.
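As one example of a function whose meaning shifts with the convention, here is a minimal sketch of rectangle intersection (hypothetical names, same irect layout as above). The overlap itself is computed identically either way, but deciding whether that overlap is empty is not: two rectangles that meet at index 10 share one pixel under the inclusive convention and nothing at all under the half-open one.

```c
#include <stdbool.h>

struct irect {int top; int bot; int left; int right;};

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* The overlapping bounds are computed the same way in both conventions... */
struct irect irect_intersect(struct irect a, struct irect b) {
    struct irect r = {imax(a.top, b.top), imin(a.bot, b.bot),
                      imax(a.left, b.left), imin(a.right, b.right)};
    return r;
}

/* ...but whether those bounds describe any pixels at all is not. */
bool empty_half_open(struct irect r) { return r.left >= r.right || r.top >= r.bot; }
bool empty_inclusive(struct irect r) { return r.left > r.right || r.top > r.bot; }
```

Mixing the two (say, intersecting with one convention and testing emptiness with the other) produces exactly the kind of off-by-one bug that survives casual testing.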
Writing a rectangle class seems straightforward. Yet all the planning in the world won’t prepare you, because it’s almost impossible to anticipate this design issue. It’s the simple act of trying to use one of your converted rectangles that makes the problem obvious. Almost paradoxically, this problem, once realized, can only really be solved by meticulous and careful planning. The devil is always in the details.
The 8 hour journey to a single character
I came into work one day not too long ago and was met with an unfortunate piece of news. The test we ran the previous night reported a 4% drop in accuracy (we found 4% fewer faces). I frowned. I had clearly screwed something up.
You see, the day before, I found that our code was doing some really pathological and unnecessary color conversions on images. We were using our video i/o library (a very thin wrapper on ffmpeg) to read in movies and allowing it to convert the frames into RGB. As soon as our code got ahold of these RGB frames, we immediately converted them to grayscale. The pathology of the situation is that the vast majority of video formats are already in the YUV color space (or the YCbCr color space; I will use the terms interchangeably, but you should know they aren’t quite the same thing). For those among you who aren’t image processing nerds, the Y channel of YUV is functionally equivalent to a gray channel. In other words, videos are almost always encoded with a gray channel and two color (chrominance) channels. That means converting these images to RGB just to convert them back to gray is beyond wasteful.
We had actually known about this problem for a while, but it had never been a terribly pressing issue. That changed recently when we were doing some really high-speed processing and noticed that our color conversions were a non-trivial portion of the time. I dug into our video i/o library, removed the conversion to RGB, and piped the Y channel of the YUV frames into our software. Everything appeared to be working perfectly. Output looks good. No memory leaks. Commit. Go home. Watch House.
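In a planar format such as YUV 4:2:0, the gray image is literally sitting in memory as the first plane, so getting at it needs no arithmetic at all. Here is a minimal sketch, assuming 8-bit samples and chroma planes subsampled by two in each direction (rounded up for odd sizes). The names are hypothetical, and the stride parameter stands in for the row pitch real decoders report (ffmpeg’s AVFrame calls it linesize), which can exceed the width because of padding.

```c
#include <stddef.h>
#include <string.h>

/* Byte sizes of the three planes of an 8-bit planar 4:2:0 image. */
struct yuv420_sizes { size_t y, u, v, total; };

struct yuv420_sizes yuv420_plane_sizes(size_t w, size_t h) {
    size_t cw = (w + 1) / 2, ch = (h + 1) / 2; /* chroma is half size, rounded up */
    struct yuv420_sizes s = { w * h, cw * ch, cw * ch, w * h + 2 * cw * ch };
    return s;
}

/* Copy the Y plane (the gray image) row by row into a tightly packed
   w*h buffer. `stride` is the number of bytes between the starts of two
   consecutive source rows, which may be larger than w. */
void copy_luma_plane(const unsigned char *y_plane, size_t stride,
                     size_t w, size_t h, unsigned char *gray) {
    for (size_t row = 0; row < h; row++)
        memcpy(gray + row * w, y_plane + row * stride, w);
}
```

For a 4×2 image this gives an 8-byte Y plane plus two 2-byte chroma planes: 12 bytes for 8 pixels, which is where the “12bpp” figure for 4:2:0 formats comes from.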
The next day…
So I walk in, bright and early, and hear the disappointing news. We lost 4% accuracy. Did my colorspace change cause this? I hope not. Within half an hour, I realized that my colorspace change clearly was not correct. The frames I had tested on the previous day looked identical, but they weren’t. I expected some minor round-off error as a possibility, but the differences were bigger than I anticipated.
The error became extremely obvious when I ran the new decoding library on the opening of Star Trek. Conveniently, the opening of Star Trek is pitch black. My “new” library was not producing black! It was near-black, but very clearly NOT black. Uh oh. What’s going on?
Let’s look at the numbers
I dug into ffmpeg and very quickly reproduced the problem. The very first pixel of the very first frame of this episode of Star Trek had a Y value of 16. Not zero. When ffmpeg converted this value to RGB, the result came out to be (0,0,0). Black. Wait a minute, since when is Y=16 equal to black? What is going on?
I looked at the only other place in our codebase where we deal with this type of information: JPEG decoding. JPEGs also use YUV. What would this particular frame look like in a JPEG’s YUV? This is a quick test. Is it also 16? Nope, it’s zero.
For an RGB value of (0,0,0), ffmpeg is telling me the Y value is 16, and the JPEG library is telling me the value is 0. Either there is an egregious bug in one of the two most well-tested libraries on the planet, or I clearly don’t understand wtf is going on. I’m going to assume the latter. But that will have to wait until after lunch.
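The discrepancy is easy to reproduce in a few lines. This is a minimal sketch assuming the BT.601 luma weights (0.299, 0.587, 0.114), which JPEG/JFIF uses and which standard-definition video commonly uses; HD video commonly uses the BT.709 weights instead. Only luma is shown, and the function names are hypothetical. Full range maps the weighted sum straight onto 0-255, while limited (“studio”) range squeezes it into 16-235.

```c
/* Round to nearest and clamp into 0..255. */
static unsigned char round_clamp_u8(double v) {
    if (v <= 0.0) return 0;
    if (v >= 255.0) return 255;
    return (unsigned char)(v + 0.5);
}

/* Weighted sum of 8-bit R'G'B' (0..255) using BT.601 weights. */
static double bt601_weighted(unsigned char r, unsigned char g, unsigned char b) {
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

/* Full ("JPEG") range: black -> 0, white -> 255. */
unsigned char luma_full_range(unsigned char r, unsigned char g, unsigned char b) {
    return round_clamp_u8(bt601_weighted(r, g, b));
}

/* Limited ("studio"/MPEG) range: black -> 16, white -> 235. */
unsigned char luma_limited_range(unsigned char r, unsigned char g, unsigned char b) {
    return round_clamp_u8(16.0 + bt601_weighted(r, g, b) * 219.0 / 255.0);
}
```

The same black pixel comes out as 0 in one convention and 16 in the other, which matches what the two libraries reported.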
Research time
When I got back from lunch, it was time to hit up the internet. If you search the internet for formulas to convert YUV to RGB, you’ll get all sorts of conflicting information. You’ll even get very different formulas. If you read the YUV or YCbCr Wikipedia pages, you can easily miss the most important information (hint: it’s not the equations). After spending a tremendous amount of time reading (and being confused by the different formulas), I made the critical discovery: there are different definitions of YUV.
I then had to dig into the details to find out just how deep this particular rabbit hole goes. In the end, it wasn’t terribly complicated (but it was difficult to find good information). In essence, different standards define different dynamic ranges for the YUV color space when it is digitized at 8 bits per sample. Movie files (e.g., MPEG) will often use 16-235 for the Y channel (black->white), while images (JPEG) use 0-255. A movie file’s white (235) != a JPEG file’s white (255). To make matters worse, the Cb and Cr (i.e., U and V) channels use yet another range in MPEG files (16-240), though JPEG is always [0-255]. Oh my.
Note: if you are here because you are having similar yuv/rgb problems and google led you here, I strongly suggest you read every single word of these three links:
The Fix, part 1
If all I need to do is rescale to a different dynamic range, that is not a difficult problem to solve. It’s a little tricky (watch those unsigned overflows!), but it’s nothing that can’t be accomplished through the power of C. I spent an hour or so writing a function to convert the three channels to the expanded dynamic range (remembering that the Y channel uses a different range than the U and V channels). I knew I’d lose some information, but what choice did I have?
Once I had finished, I ran all my previous tests and found the output to be far, far better than what I had been producing the previous day. I also tested my new conversion routine on images that had failed in the overnight test and, what do you know, we were now finding faces in them. Mission accomplished!
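The original range-expansion code isn’t shown (and was later reverted), so this is only a reconstruction of what such a function might look like. It assumes the BT.601 limited ranges: Y in 16-235, and Cb/Cr in 16-240 centered on 128. Everything is computed in signed int to sidestep the unsigned wraparound mentioned above, and out-of-range inputs are clamped. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Y: 16..235 (219 steps) -> 0..255, rounded to nearest. */
uint8_t expand_luma(uint8_t y) {
    if (y <= 16) return 0;
    if (y >= 235) return 255;
    int v = (((int)y - 16) * 255 + 109) / 219; /* +109 (219/2) rounds */
    return (uint8_t)v;
}

/* Cb/Cr: 16..240 (224 steps, centered on 128) -> 0..255, rounded to nearest. */
uint8_t expand_chroma(uint8_t c) {
    if (c <= 16) return 0;
    if (c >= 240) return 255;
    int d = ((int)c - 128) * 255;            /* signed offset from neutral */
    int q = d >= 0 ? (d + 112) / 224         /* round half away from zero */
                   : -((-d + 112) / 224);
    int v = 128 + q;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Expand a whole plane in place; is_luma selects which range applies. */
void expand_plane(uint8_t *plane, size_t n, int is_luma) {
    for (size_t i = 0; i < n; i++)
        plane[i] = is_luma ? expand_luma(plane[i]) : expand_chroma(plane[i]);
}
```

Spreading 220 luma levels across 256 leaves gaps in the output (some values can never occur), and anything outside the nominal range is clipped.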
Not so fast my friend
It was about this time that I felt the need to vent. Seriously, movie and jpeg people, why are you doing this to me? Why are there two (note: actually more than two) different dynamic ranges for 8-bit YUV pixels? Why oh why? (more notes: if you want to learn why, it’s actually a fairly fun and interesting story… taketh thee to wikipedia).
Needing to complain to someone, I went onto IRC to vent at the only video developer I know (he works on x264, the open source H.264 encoder). I asked him why “they all” go around screwing with people like me with such nonsense. He laughed, explained that there are actually more than two different formats, and commiserated with me for a moment. And then he said something important. He said only “--fullrange”. Wait. What is --fullrange? Is that an x264 parameter? Yes, yes it is. What does --fullrange do? It uses the full range of YUV. Ah! x264 devs are geniuses. Why would they leave this silly conversion to us?
Oh wait. Does that mean… does ffmpeg… do it too? It has to, right? Let’s check the docs, shall we? There sure are a lot of formats on this list. I wonder if any of them are “full-range” YUV.
PIX_FMT_YUVJ420P Planar YUV 4:2:0, 12bpp, full scale (jpeg).
PIX_FMT_YUVJ422P Planar YUV 4:2:2, 16bpp, full scale (jpeg).
PIX_FMT_YUVJ444P Planar YUV 4:4:4, 24bpp, full scale (jpeg).
Does that “jpeg” mean these are “jpeg-style” full-range YUV outputs? I should try this. Within minutes I confirmed that yes, these formats output YUV channels that use the full dynamic range 0-255. Excellent. I reverted all my ugly custom range-expansion code and committed this final fix:
- PIX_FMT_YUV420P,
+ PIX_FMT_YUVJ420P,
One character. One friggin’ “J”. 8 hours. I hope no one is keeping track of “lines of code per hour”.