Inspiration vs Distraction
When the bug of inspiration bites, what do you do? Do you drop the old project and go for the new shiny one? Or do you ignore the urge?
I am one of those people who is constantly switching between my personal projects. I will work tirelessly on project A, and then on a certain Tuesday when the wind is blowing just right, I’ll suddenly lose all desire for project A and find a sudden and insatiable urge to work on project B. When inspiration strikes, it tends to be very difficult for me to ignore. My hard drive is consequently littered with half-starts, rough outlines, notes, and various ideas, bad and good, in various stages of completion.
This brings up the interesting question as to what the preferred state of affairs is. Should I be more disciplined and learn to ignore the grass-is-always-greener syndrome, or should I go with my own fickle whims? Am I perpetually distracted by these flirtations of inspiration, or is this, in some way, healthy? Given that I almost always choose to drop the old clunker and work on the new thing, you could certainly read into my character quite a bit, if you wanted. Maybe I am too fickle. Maybe I am easily distracted. Maybe I’ll never actually accomplish any of these things. Maybe. I see it another way.
Most projects get exactly one burst of effort before permanent abandonment. Some problems though, I find myself coming back to time and again. I view this particular mechanism as healthy. It’s a natural selection of my ideas. May the fittest survive. This is why I don’t really feel guilty for dropping yet-another half-started project and starting something new. If the idea survives a few months off, it’s probably worth keeping. If it doesn’t, so be it.
Embrace the distraction
So in that vein, I’ve put on hold several slightly boring personal projects for a shiny new one. I’m writing a game. Oh this will end well. I’ve spent the last three weekends tearing apart the UDK to learn how it works and over that time I’ve put together this spectacularly early prototype:
So here’s to another distraction.
Unplanned planning
Here are some rectangles:
struct irect {int top; int bot; int left; int right;};
struct frect {float top; float bot; float left; float right;};
Now, given those rectangles, you are asked to write a conversion from irect to frect.
struct frect convert (struct irect i) {
    /* Naive conversion: cast each bound and keep the same numbers. */
    struct frect ans = {(float) i.top, (float) i.bot, (float) i.left, (float) i.right};
    return ans;
}
This might at first seem like a reasonable answer but… it’s almost certainly not.
The problem
What is the area of a rectangle with corners (0.0,0.0) and (10.0,10.0)?
Now, how many pixels are in an image with bounds (0,0) to (10,10)?
Hopefully, your answers disagreed (or at least you can see how they could disagree). In our field, we deal with images. Lots and lots of images. One of the most fundamental data structures you can possibly imagine when dealing with images is the lowly rectangle. For us, a rectangle is a region of an image. And it’s this role, as a region, that creates all kinds of pain and heartache.
We can more clearly isolate the problem by asking what is the lower-right hand corner of the above integer rectangle? Well, it depends on exactly what we mean by the lower-right hand corner. If you want to know where to look in memory for the lower-right hand corner pixel information, the answer is (x=10,y=10). If you wanted to know the Cartesian coordinate of the true lower right-hand corner, the answer would be (x=11, y=11).
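To make the disagreement concrete, here is a minimal sketch of the two readings of the same four integers. It assumes the bounds are inclusive when they are treated as pixel indices; the helper names are mine and purely illustrative.

```c
struct irect {int top; int bot; int left; int right;};

/* Number of pixels covered when all four bounds are inclusive pixel indices. */
long irect_pixel_count(struct irect r) {
    long width  = (long) r.right - r.left + 1;
    long height = (long) r.bot - r.top + 1;
    return width * height;
}

/* Area when the same four numbers are read as Cartesian edge coordinates. */
long irect_cartesian_area(struct irect r) {
    long width  = (long) r.right - r.left;
    long height = (long) r.bot - r.top;
    return width * height;
}
```

For the bounds (0,0) to (10,10), the first function gives 121 and the second gives 100.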
The solution?
The difficulty here is the conflation of two separate ideas into the same form. First we have the integerized “point” on the Cartesian plane, and then we have the 1×1 region known as a “pixel” and specified by its index. How should pixel indices convert to the Cartesian plane? Put another way, should an integer rectangle (0,0)x(10,10) have an area of 100 or 121? This is, to some extent, a question of convention. If you pick one, and are consistent, everything will work out fine. However, almost certainly, for your application, one convention will result in cleaner code.
So the answer to the original question becomes a fairly complicated ordeal. This one problem only scratches the surface of the painstaking care that must be taken when dealing with integer and floating point regions of pixels. You would need to figure out exactly how and why you use rectangles and what index convention you want to use. For every function that you write operating on rectangles, you need to carefully consider the implications of your chosen convention. And depending on the convention you use, the rest of your code is going to look very different.
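As an illustration of how much the convention changes even the one-line conversion, here is a sketch under one possible choice: integer bounds are inclusive pixel indices, and pixel (x, y) covers the square from (x, y) to (x + 1, y + 1). This is only one convention among several, and the function names are hypothetical.

```c
struct irect {int top; int bot; int left; int right;};
struct frect {float top; float bot; float left; float right;};

/* Assumption: irect bounds are inclusive pixel indices, and pixel (x, y)
   covers the square from (x, y) to (x + 1, y + 1) on the Cartesian plane. */
struct frect convert_inclusive(struct irect i) {
    struct frect ans = {
        (float) i.top,
        (float) i.bot + 1.0f,   /* far edge of the last row */
        (float) i.left,
        (float) i.right + 1.0f  /* far edge of the last column */
    };
    return ans;
}

/* Geometric area of a floating point rectangle given by its edges. */
float frect_area(struct frect f) {
    return (f.right - f.left) * (f.bot - f.top);
}
```

Under this convention the rectangle (0,0) to (10,10) converts to edges at 0 and 11, so its area matches its pixel count of 121. Under the plain-cast convention it would be 100, and every function downstream would have to agree with that instead.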
Writing a rectangle class seems straightforward. Yet, all the planning in the world won’t prepare you because it’s almost impossible to anticipate this design issue. It’s the simple act of trying to use one of your converted rectangles that makes the problem obvious. Almost paradoxically, this problem, once realized, can only really be solved by meticulous and careful planning. The devil is always in the details.
The 8 hour journey to a single character
I came into work one day not too long ago and was met with an unfortunate piece of information. The test we ran the previous night reported a 4% drop in accuracy (we found 4% fewer faces). I frowned. I had clearly screwed something up.
You see, the day before, I found that our code was doing some really pathological and unnecessary color conversions on images. We were using our video i/o library (a very thin wrapper on ffmpeg) to read in movies and allowing it to convert the frames into RGB. As soon as our code got ahold of these RGB frames, we immediately converted them to grayscale. The pathology of the situation is that the vast majority of all video formats are already in the YUV color space (or YCbCr color space. I will use the terms interchangeably, but you should know they aren’t quite the same thing). For those among you that aren’t image processing nerds, the Y-channel of YUV is functionally equivalent to a gray channel. In other words, videos are almost always encoded with a gray channel and 2 color (chrominance) channels. That means converting these images to RGB just to convert them back to gray is beyond wasteful.
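To show why skipping the RGB round trip is attractive, here is a rough sketch of taking the gray image straight from a planar YUV frame. It assumes a planar layout in which the Y plane is stored as its own block of rows, possibly with padding at the end of each row. The function and its parameters are hypothetical and are not the ffmpeg API.

```c
#include <stddef.h>
#include <string.h>

/* Copy the Y (luma) plane of a planar YUV frame into a tightly packed
   grayscale buffer. `stride` is the number of bytes between the starts of
   consecutive rows in the source plane and may exceed `width` (padding). */
void copy_luma_plane(const unsigned char *y_plane, size_t stride,
                     size_t width, size_t height, unsigned char *gray) {
    for (size_t row = 0; row < height; row++) {
        memcpy(gray + row * width, y_plane + row * stride, width);
    }
}
```

No per-pixel arithmetic is involved: the gray image is already sitting in the frame, so the work is one row copy per line.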
We had actually known about this problem for a while but it’s never been a terribly pressing issue. That changed recently when we were doing some really high speed processing and we noticed our color conversions were a non-trivial portion of the time. I dug into our video i/o library, removed the conversion to RGB, and piped the Y channel of the YUV frames into our software. Everything appeared to be working perfectly. Output looks good. No memory leaks. Commit. Go home. Watch House.
The next day…
So I walk in, bright and early, and hear the disappointing news. We lost 4% accuracy. Did my colorspace change cause this? I hope not. Within half an hour, I realized that my colorspace conversion change clearly was not correct. The frames that I had tested on the previous day looked identical, but they weren’t. I expected some minor round-off error as a possibility, but the changes were more than I anticipated.
The error became extremely obvious when I ran the new decoding library on the opening of Star Trek. Conveniently, the opening of Star Trek is pitch black. My “new” library was not producing black! It was near-black, but very clearly NOT black. Uh oh. What’s going on?
Let’s look at the numbers
I dug into ffmpeg and very quickly reproduced the problem. The very first pixel of the very first frame of this episode of Star Trek had a Y value of 16. Not zero. When ffmpeg converted this value to RGB, the result came out to be (0,0,0). Black. Wait a minute, since when is Y=16 equal to black? What is going on?
I looked at the only other place in our codebase where we deal with this type of information: JPEG decoding. JPEGs also use YUV formatting. What would this particular frame look like in a JPEG’s YUV? This is a quick test. Is it also 16? Nope, it’s zero.
For an RGB value of (0,0,0), ffmpeg is telling me the Y value is 16, and the jpeg library is telling me the value is 0. Either there is an egregious bug in one of the two most well-tested libraries on the planet, or I clearly don’t understand wtf is going on. I’m going to assume the latter. But that will have to wait until after lunch.
Research time
When I got back from lunch, it was time to hit up the internet. If you search the internet for formulas to convert yuv to rgb you’ll get all sorts of conflicting information. You’ll even get very different formulas. If you read the YUV or YCbCr Wikipedia pages, you can easily miss the most important information (hint: it’s not the equations). After spending a tremendous amount of time reading (and being confused about the different formulas), I made the critical discovery. There are different definitions of YUV.
I then had to dig into the details to find out just how deep this particular rabbit hole goes. In the end, it wasn’t terribly complicated (but it was difficult to find good information). In essence, though, different standards define different dynamic ranges for the YUV color space when digitized into 8-bit per sample. Movie files (e.g., MPEG) will often use 16-235 for the Y channel (black->white), while images (JPEG) will use 0-255. A movie file’s white (235) != a jpeg file’s white (255). To make matters worse, the Cr and Cb (i.e., U and V) channels use an entirely different set of dynamic ranges for MPEG files (though jpeg is always [0-255]). Oh my.
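A small sketch makes the two definitions easy to compare. It uses the commonly quoted BT.601 luma weights (0.299, 0.587, 0.114), which are my assumption here and not numbers taken from either library, and the function names are made up for illustration.

```c
/* Weighted sum of the RGB channels using the BT.601 luma weights. */
static double luma_weighted_sum(int r, int g, int b) {
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

/* Full-range luma (JPEG style): black -> 0, white -> 255. */
int luma_full_range(int r, int g, int b) {
    return (int) (luma_weighted_sum(r, g, b) + 0.5);
}

/* Limited-range luma (video style): the same sum squeezed into 16..235. */
int luma_limited_range(int r, int g, int b) {
    return (int) (16.0 + (219.0 / 255.0) * luma_weighted_sum(r, g, b) + 0.5);
}
```

With these definitions, black (0,0,0) comes out as 0 in the full-range version and 16 in the limited-range version, which is exactly the mismatch between the JPEG decoder and ffmpeg.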
Note: if you are here because you are having similar yuv/rgb problems and google led you here, I strongly suggest you read every single word of these three links:
The Fix, part 1
If all I need to do is rescale to a different dynamic range, that is not a difficult problem to solve. It’s a fair bit tricky (watch those unsigned overflows!) but it’s nothing that can’t be accomplished through the power of C. I spent an hour or so writing a function to convert the three channels to the expanded dynamic range (remembering that the Y channel uses a different range than the U and V channels). I knew I’d lose some information, but what choice did I have?
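For the curious, a range expansion along these lines might look roughly like the sketch below. It is not the function I wrote that day. It assumes 16-235 for Y and, following common BT.601 practice, 16-240 centered on 128 for the chroma channels; that chroma range is an assumption, so check it against the standard your files actually use.

```c
/* Clamp an intermediate value into the 8-bit range. */
static unsigned char clamp_u8(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (unsigned char) v;
}

/* Expand limited-range luma (16..235) to full range (0..255). */
unsigned char expand_luma(unsigned char y) {
    int v = ((int) y - 16) * 255;        /* signed, so y < 16 does not wrap */
    v = (v >= 0) ? (v + 109) / 219 : 0;  /* round to nearest, floor at black */
    return clamp_u8(v);
}

/* Expand limited-range chroma (assumed 16..240, centered on 128). */
unsigned char expand_chroma(unsigned char c) {
    int v = ((int) c - 128) * 255;
    /* round to nearest, symmetrically around the neutral value 128 */
    v = (v >= 0) ? (v + 112) / 224 : -((-v + 112) / 224);
    return clamp_u8(v + 128);
}
```

Doing the arithmetic in a signed int before clamping is what avoids the unsigned overflow trap: inputs below 16 or above 235 do occur in real files, and they have to saturate instead of wrapping around.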
Once I had finished, I ran all my previous tests and found the output to be far, far better than the one I was using from the previous day. I also tested my new conversion routine on images that failed from the overnight test and what do you know, they were now finding faces. Mission accomplished!
Not so fast my friend
It was about this time that I felt the need to vent. Seriously, movie and jpeg people, why are you doing this to me? Why are there two (note: actually more than two) different dynamic ranges for 8-bit YUV pixels? Why oh why? (more notes: if you want to learn why, it’s actually a fairly fun and interesting story… taketh thee to wikipedia).
In need of some complaining, I decided to go onto IRC and complain to the only video developer I know (he works on x264, the open source h264 encoder). I asked him why “they all” go around screwing with people like me with such nonsense. He laughed and went on to explain there are actually more than two different formats and commiserated with me for a moment. And then he said something important. He said only “--fullrange”. Wait. What is --fullrange? Is that an x264 parameter? Yes, yes it is. What does --fullrange do? It uses the full range of YUV. Ah! x264 devs are geniuses. Why would they leave this silly conversion to us?
Oh wait. Does that mean… does ffmpeg… do it too? It has to, right? Let’s check the docs, shall we? There sure are a lot of formats on this list. I wonder if any of them are “full-range” YUV.
PIX_FMT_YUVJ420P Planar YUV 4:2:0, 12bpp, full scale (jpeg).
PIX_FMT_YUVJ422P Planar YUV 4:2:2, 16bpp, full scale (jpeg).
PIX_FMT_YUVJ444P Planar YUV 4:4:4, 24bpp, full scale (jpeg).
Does that “jpeg” mean these are “jpeg-style” full-range YUV outputs? I should try this. Within minutes I realized that yes, these formats outputted YUV channels that used the full dynamic range 0-255. Excellent. I reverted all my ugly changes with my own customized range expansion code and committed this final fix.
- PIX_FMT_YUV420P,
+ PIX_FMT_YUVJ420P,
One character. One friggin’ “J”. 8 hours. I hope no one is keeping track of “lines of code per hour”.
My billion dollar business idea
A while back I watched a little minidocumentary about a startup company and I remember almost none of it except one part. I remember the guy talking about how he felt like he had an idea with the potential to become a billion dollar company. Today, here and now, I’m going to give mine away. It’s not every day that someone gives away such a valuable thing as a billion dollar idea. Today is that day.
You might take umbrage to the last paragraph on the notion that it’s possible to even have a “billion dollar idea”. You are right, of course. This idea is so good that once you start making piles of cash, everyone will be coming out of the woodwork to compete with you. Proper execution includes fending off rivals, navigating a brutal regulatory system, and possibly fixing a few legal obstacles. The industry we will be targeting is absolutely flush with cash. The drug industry. And our angle into the drug industry is with the single most successful drug in human history. No, not penicillin. Much bigger. The placebo! My billion dollar idea is a placebo pharmacy. What could possibly go wrong?
Now I know what you are thinking: there are already placebo pharmacies. In fact, selling snake oils of various types is one of the oldest businesses in the history of humankind. This idea, however, is to have doctors write fake prescriptions and have pharmacists fill fake prescriptions, and have everyone lie to the patient. This may seem unethical, but it is medically necessary! I promise. What happens if the idea catches on and everyone else starts making sugar pills to compete with you? Do what big drug companies always do, crush them with patents, regulations, and legal maneuvering!
A Crash Course
1. The patient must believe
The only requirement for effective placebo treatment is that the patient must be deceived. This means doctors need to be able to write prescriptions for your placebo drugs and patients need to not find out they are placebos. There are many ways you could accomplish this but you will need a sufficiently painless way for doctors and pharmacists to keep up on the names.
2. Colorful placebos are stronger than white ones
Not only do colorful pills work better than white ones, but certain colors have certain side effects. Blue pills, of course, will make you drowsy. Yellow ones will keep you awake. Design your pills accordingly. Version, version, version.
3. Presentation makes placebos stronger
The more impressive the bottle, the more side effects that are listed, and the more the doctor warns the patient of the potency of the drug, the more powerful the placebo effect. A doctor in a white coat gives stronger placebos than a doctor without his coat. You can’t make this stuff up.
4. More expensive placebos are stronger than less expensive ones
Amazing, and true. Placebos are like the luxury item of the drug world. Price is the ultimate signal. What could work better to a young business’s advantage than this? You see, it’s not “unethical” (ethics, bah) to sell insanely expensive sugar pills. It’s medically necessary!
5. The future
Last but certainly not least, placebos are getting stronger! Contrast that with sissy antibiotics which are losing their war with the theory of evolution. Astoundingly, over time, as the human race is inundated with more and more advertising and faith in the medical and drug industries, the placebo effect is increasing. This particular industry has been around forever, and will be around for a very, very long time.
Conclusion
All joking aside, I do wonder if there is a way to “legitimately” (read: ethically) take advantage of the placebo effect. There are entire industries based on the placebo effect right now. There are billions of dollars being made by people selling treatments with absolutely no proven medical efficacy whatsoever. Would you consider it ethical to sell a drug that worked solely on placebo effect? Most people would presumably say no. However, could it be? What if you had a deal worked out with doctors and pharmacies for placebo drugs: the doctor wrote a prescription, the insurance company “covered” the cost, and the patient was misled into believing it was an effective and expensive (and colorful!) drug?
It seems to me unfortunate that we leave the effective placebo therapy for the scam artists.
My greatest distraction
I’ve made an important personal discovery. I’ve found my greatest distraction. It’s not the phone, or my email, or instant messaging, or the internet. It is having no distractions. My guess is some small segment of people are just like me and instantly understand. The rest are probably quite confused.
I have a friend who swears by Freedom.app for getting things done. It disables your networking (for OSX) and the only way to get the internets back is to reboot. And macs don’t reboot quickly. The pain of rebooting combined with no internet connection is supposed to provide an ideal working environment. There is absolutely no way this will work for me.
Last week, we lost our internet connection at my house. Comcast was coming out the following day to fix it, and there was little I could do in the interim. For most people, this would be a fantastic opportunity to get stuff done. I could have cleaned the house, or raked the leaves, or even worked on some of my personal programming projects. I did none of that. I couldn’t. The little firefox icon sat there taunting me. I ended up playing World of Goo for a few hours.
I have a difficult time actually putting my finger on why this is. Part of the explanation is my deep need to troubleshoot these problems. If the internet is down, I want to know whose fault it is. I fire up tracert and figure out who I can blame. But there’s a bit more to it. Even once I’ve figured out whose fault it is, and it is out of my control, it remains a huge distraction. It is just much easier for me to concentrate when everything is right in the universe. I am able to ignore the temptation of distraction but it drives me batty when there is no temptation. I cannot explain this to my own satisfaction. Am I just crazy?