A simple interactive proof of the theory of evolution
Let’s conduct a very simple experiment, shall we? I will show you two images. Look at each image and try to gauge your emotional reaction. Don’t just focus on what you are supposed to be feeling, but the actual visceral reaction you have to each image.
Which image gives you a greater emotional response?
For most people, the second image creates a much stronger emotional response than the first. This is not necessarily a rational response. Both situations are (roughly) equally dangerous, however we have built-in circuitry that deals specifically with situations that were likely to occur in our evolutionary past. Only one of these fits the bill. Apparently, this (useful) fear of heights is instilled in us while we are quite young (about the same time we develop the depth perception to perceive it). Sadly, I know of no such innate fear of firearms.
Here’s another fun link about our recent ancestry with the other apes
[from reddit].
Never trust a programmer who says he knows C++
I’ve been in an interviewing mindset for awhile and I’ve come to realize something important about C++ in particular. C++ is a “two peak” language. That is to say C++ is the only language I know of where two very different sets of programmers consider themselves well versed in the language. Let me show you in fake graph form:
Programmers (especially those coming from C) can very quickly get up to speed in C++ and feel quite proficient. These programmers will tell you that they know C++. They are lying. As a programmer continues in C++, he goes through this valley of frustration where he fully comes to terms with the full complexity of the language. The good news is that it’s really easy to tell the difference between C++ programmers pre- and post- valley (in an interview, in this case). Just mention that C++ is an extremely large and complex language, and the post-valley people will give you 127 different tiny frustrations they have with the language. The pre-valley people will say “Yea, I guess. I mean, it’s just C with classes.”.
Non-programming books for programmers: The Selfish Gene
My wife loves fiction. It’s not my thing. I’ve tried, but I struggle to read fiction. There are some novels that I’ve heard are beloved by my fellow programmers (Cryptonomicon for one), and I’d love to hear any other suggestions along this line.
For me, though, my heart belongs to nonfiction. Especially science. One book, in particular, stands above the rest. I’ve read The Selfish Gene, by Richard Dawkins, twice and, for me, it’s quite simply the most interesting book I’ve ever read.
If you haven’t read it…
If you are anything like me (a programmer and general science nerd) and you haven’t read it, this is approximately what you’ll be thinking:
I understand evolution pretty well, and I don’t really need to read a book to understand evolution. Besides, evolution is kind of boring — the fittest survive, and pass on their genes, problem solved. I’d rather read about physics.
—me (and maybe you), before reading
Let me try to put into words how wrong this was. First, and foremost, this book made me realize how absolutely little I’ve actually thought about evolution, and how absolutely fascinating evolutionary biology is. This book is basically a series of mental experiments to explain the biological world through game theory. The organisms are playing a giant game, and those with the best strategies will survive. Please note that the “best” strategies often depend on what strategies everyone else is using. Therefore, a population will then tend towards evolutionary stable mixes of strategies. You will have repeated WTF moments as he makes the most unintuitive animal behaviors (ie, strategies) suddenly make perfect sense.
Second, let me say, that as a programmer, you have a VERY UNIQUE perspective that makes evolutionary biology (and the arguments in the book) extremely fascinating (more on this in a bit). Last, and maybe most importantly, for me, it is fairly safe to say that if I had read this book in high school, instead of recently, I’d probably have become an evolutionary biologist, and not a programmer. It was that interesting.
A quick overview
In the book, Richard Dawkins basically argues for an evolutionary understanding from a “gene’s eye view”. In other words, everything in evolution makes sense if you think of genes as the unit of selection, and not individuals nor groups. Most of the book is set against the backdrop of Dawkins arguing against a competing theory known as group selection. Now, I am no evolutionary biologist, so I’m not going to weigh in too much on who is right (as I understand it, Dawkins is “mostly” right but there is some controversy at the edges), but the book itself makes so many fascinating arguments that it’s worth the read for that alone.
It is amazing how much the gene’s eye view of the problem changes the game. It can, to some extent, put your life into perspective. From a gene’s perspective, keeping you alive until you reproduce is a good thing. Keeping you alive for too long after is a bad thing, because you are now competition for your children. You are nothing but a survival mechanism built for and by your genes. At some point, your genes would prefer if you’d just die.
Certain behaviors are, seemingly, difficult to explain from a naive understanding of the theory of evolution. The primary one (one that is discussed at length) is altruism in the animal kingdom. It would seem that given the whole survival of the fittest thing, altruism would usually be a bad idea. Dawkins argues extensively, using numerous examples, of exactly how altruism helps particular genes reproduce. Or, and more to the title, how “selfish” actions by a gene (the gene looking out for only itself) leads to altruistic actions in the individual.
A giant exercise in game theory
Dawkins goes on to make plausible explanations for all sorts of phenomonan that you’ve probably never even considered as requiring an explanation. Here’s a great example: you might be able to think of a good evolutionary explanation for why sexual reproduction came about (Dawkins gives you the usual reason, if you can’t), but can you think of a sound reason why nature settled on two sexes? Why not just one (anyone can mate with anyone?) Why not three? Or Four? The answer Dawkins gives is a bit speculative, but fascinatingly plausible. [note: much of the book is Dawkins' summary of other people's work on these various issues... I don't want to give the impression that all of these discussions are his original work]
The basic argument goes as follows. In the beginning of sexual reproduction among early organisms, all sex cells were basically the same. This meant any pair of sex cells could merge and become a new organism (ie, there was just one sex). Larger sex cells had an advantage over smaller ones, as it started the new organism off with a larger food supply. Over time, these cells get bigger, last longer, and are more well protected. These large sex cells are increasingly expensive to produce, but they provide the future offspring with the best start. This opens up the gene pool to exploitation from a “selfish” strategy. If you could produce extremely small, extremely cheap sex cells in large numbers, you could mate and piggyback on the “expensive” large sex cells. So some organisms began flooding the area with these cheap cells, and succeeding by piggybacking. And from here, the two types of organisms continue to diverge, generating the two sexes we see today.
So while you might argue that an all-female population would be more successful (because the males are “cheating”), an all-female population is inherently unstable. A single male introduced to this all-female world will succeed at reproducing at an alarming rate (note: this argument takes place long before animals had the ability to properly “choose” their mate, and this “choosing” and it’s impact on evolution are discussed at length, with amazing results). Evolution is not creating the most fit populations, but the most stable.
This, by the way, forms the genesis of all sorts of sex-related differences seen throughout the animal kingdom: monogous preferences from females, promiscuity from males, male competition for females (in winner-take-all harem-like arrangements), and so on. And here’s an interesting follow-up: if one male can produce enough sex cells for multiple females, what is the optimal sex ratio? Almost all animals hover around 50/50, but why? Why couldn’t, for example, 75% female, 25% male work? Dawkins tries to explain this too!
Abstract evolution
The arguments made in the book just appeal right to the core of a computer programmer who is constantly thinking in optimizations, algorithms, and mathematics. As I said before, the entire book really reads as an attempt to explain various natural phenomenon through a combination of game theory, and the world of the genes. As a computer scientist, you cannot help but see yourself writing programs to test ideas that Dawkins puts forward. You cannot help but see the parallels in Dawkins arguments to computer algorithms. When Dawkins discusses, briefly, how computer programming is a secret passion (he might say vice) of his, you will immediately understand it “from the other side”. The two disciplines are so incredibly similar. For the exact same reason I am drawn to his arguments, he is drawn into computer science. The two fields have such interesting mental overlap.
The most fascinating take away from the book is a view of evolution not as a property intrinsic to life, but as an emergent property of certain types of systems. Perhaps most famously, this is the book that Richard Dawkins invents the word meme. A meme, in his context, is an “organism” that spreads not through the natural world, but in our own minds. In other words, memes are ideas, and some are more fit to “reproduce” from brain to brain. These memes exist in an enviornment (our culuture, history, morals, and so on), and adapt as necessary to continue surviving.
I’m less fascinated by memes themselves as I am about what memes represent. The existance of memes is basically Dawkins’ way of saying that evolution is an emergent property of a certain type of system. It follows directly from a set of axioms: given some form of descent with modification, and some non-random selection of those descendants, then evolution not only will happen, but must happen. Any system that fits that requirement will have evolution occurring. These systems will strive towards “organisms” that are stable within their “environment”. In other words, evolution is not just a theory of life but a direct and unavoidable consequence of the rules of the game.
Parallel programming is hard. Right?
It’s a fairly oft-repeated mantra of programmers. Parallel programming is hard. Right? Right! Almost everyone will tell you so! I was at a job fair not too long ago and even one of the students told me so. Everyone knows parallel programming is hard.
Sadly, though, I disagree with the vast majority on why. The most common answer, by far, will involve some combination of the following words: deadlock, livelock, synchronization, shared memory, race condition, thread safety and so on. You can find umpteen blog posts explaining how difficult it is to avoid race conditions, and how some of the most pernicious bugs come from synchronization issues. Yes, indeed.
(note: this post is mostly about data-parallel programming, or other parallel optimizations for computational performance purposes. There are other reasons to use concurrency, but I’m not sure my argument will hold in most of those cases.)
Avoiding race conditions is hard, but it’s not that hard.
There, I said it. I think most people, especially those who’ve only learned about parallel programming in a classroom setting, believe that race conditions, synchronization, and dead-locks are the crux of the parallel programming problem. There are a whole bunch of really solid reasons for me to say they are wrong:
- You can almost always use a simple locked data-structure for synchronization fairly painlessly (and with a bit of practice, you can usually write one fairly painlessly, too).
- You can almost always abstract away the synchronization primitives and unit-test them really well.
- There are tons of good off-the-shelf and well-tested synchronization libraries for most platforms, and most languages.
- The truly difficult synchronization problems have many, many smart people working on them, and these are things you should not be reinventing.
- And last but not least, and most importantly, no matter how difficult your synchronization issues are, I can assure you that bigger problems are lurking.
Why parallel programming is actually hard
There is probably no one correct answer to this question, but most will probably involve words like: granularity, throughput, response-time, cache size, memory bottleneck, and so on.
Let’s start with the core of the problem: granularity. When parallelizing some computation, you must decide “at what level” to parallelize the problem. Choosing the correct granularity is almost never immediately obvious and, almost always, there is more than one correct answer. Trade-offs abound.
I’ll invoke a problem from my daily work that will help explain this nicely. A video file is nothing but a sequence of images. If you want to perform face detection on a video stream, you can parallelize in several ways. A coarse grain approach might be to give each worker its own image, and collect the results at the end. A finer grained approach might throw all of the threads at a single image hoping you can process that image n-times faster.
Here are some general (yet frequently broken) rules of thumb:
1. Finer grained algorithms will thread less efficiently than coarse grained. Usually, it’s because any natural parallelism in your problem tends to improve as you become more coarse grained (do 20 images in each thread, instead of one). Furthermore, coarse grained algorithms do less communication and synchronization per unit of work. This means coarse grain approaches will tend to have better throughput (work done per unit of time) and less administrative overhead. There is an important caveat here, if you are too coarse grained, or if your jobs vary in size too much, you run smack into scheduling problems that will certainly leave some workers with nothing to do.
2. Finer grained algorithms will have better response times. In this case, response time means from submission of a work unit, until the result is ready. Since a fine grained algorithm is more frequently synchronizing the state, you have more opportunities to prioritize things that need to get done, and then declare them done.
3. Finer grained algorithms will tend to use less cache/memory. This is because, in general, less data being worked on.
4. Finer grained problems tend to be more painful to maintain. You will assume parallelism along some dimension that really needs to be serialized for some new feature.
Let’s go back to my real-world example. Sometimes throughput is king, like when you are processing a movie file. In this scenario, sending a frame to each worker thread and collecting the results later is fairly easy to implement, and threads extremely well. Other times, response time is king. For example, when tracking a face with a pan-tilt-zoom camera. Sending each frame to a different worker is completely inappropriate for this use-case because this doesn’t improve response time at all (each thread only has one worker, still). As soon as a frame is submitted, you need an answer as quickly as possible to have stable control. In this case, it is far better to throw all of the worker threads at a single frame (thread at the sub-frame level), and hope to improve the computation of a single frame.
So what do you do when the frame-level approach threads more efficiently than the sub-frame level treading? Well, you’ve run into a situation where one approach is better for throughput, and the other for response time. If you want to support both use-cases, you need both approaches (and some good documentation!). I think this demonstrates my original point that bigger problems than synchronization are lurking. No matter how difficult your synchronization problems are, I just doubled them.
Trade-offs in every direction
So lets say you can sort your way through that nest of trade-offs to find a place that gives you the right balance of response-time and throughput, stays within your memory requirements, and threads at an acceptable efficiency, you are home free! Right? Wrong. Now add in the fact that you cannot predict the future, and you don’t know what features will be needed (and what will then need to be serialized), how your response-time vs throughput needs might change, or how you will ever maintain this mess.
If you’ve survived this far, just remember the computer has one last trump card to play on you. All of these threads run on all real machines, with real memory bandwidths, real caches, connected over networks with fixed bandwidth. Each of these constraints has the ability to trip up your beautifully parallel problem and make it painfully serial. Let’s not even mention the idea that it might need to run on different machines with different capabilities.
In truth, all of these difficulties come from the same source. Mapping a non-trivial problem onto a parallel machine optimally borders on the impossible when you fully consider all of the different requirements along all of the different dimensions.
And so here we are. When I think of parallel programming, synchronization is difficult, but managable. It is not, however, what makes parallel programming hard.
On Debt Collectors
When I moved to Pittsburgh, I got a new cellphone number. I traded in the lackluster views of my 850 area code for the it-is-what-it-is 412. From almost the very first day of having this new cellphone number, I began receiving calls for a guy named Paul. In the beginning, people would apologize. Some people, however, started asking me weird questions. Like what my name was, and if I knew any Pauls.
Over time I began to realize that Paul, and my phone number, were in the hands of debt collectors. I would receive a dozen calls in a week and one by one be forced to tell each collector that I was not Paul, I was never Paul, and they needed to stop calling me. Most were polite, some were not. At first, the aggressive ones were fun to deal with. Several people tried to tell me I was covering for Paul, or I was lying about not being Paul. Some would ask me my name, which I never gave (replace Paul’s name with mine? Oh god.), and this seemed to really piss some of them off. After a month or so of dealing with these calls, I’d end up off of most lists, and the calls would stop. A few months later, presumably after the bad debt had been sold to the next round of collectors, the cycle would start up again. This has been going on for almost 5 years although I admit that each outbreak of this debt collector herpes is less severe than the last.
I found out quite a bit about Paul through debt collectors. One told me I’d end up in jail… again. I asked if he knew what a debtor’s prison was, and he didn’t. Another told me to stop buying drugs so I could pay my debts. A third told me that I’d lose my third job this year if I didn’t pay. I have no idea how much of this is true, of course. It should be noted that the fact that I am not Paul, it is actually fairly illegal for these debt collectors to reveal anything to me about Paul. But reveal they did. It’s hard to say which of these were true, and which were tactics. I have reason to believe at least one of them was just making shit up completely.
But not everything was made up. I know Paul’s full name. I know his (former?) address. I know his occupation(s). I know several of his friend’s names. I was dealing with people that were not scrupulous at all. On more than one occasion I had to bust out the Fair Debt Collections regulations (or at least bluff my way through them) because of some asshat on the phone trying to bully me into admitting I was really Paul. Or trying to get me to pay for Paul’s debts so that the calls would stop. And so on.
About a month ago, all of this changed. I started receiving calls from probate and estate “people” (they thought of themselves as legal professionals, but to me there were more like debt collectors for dead people). You see, apparently Paul had died. And apparently the probate people don’t quite have the same set of rules that the debt collector people have because they told me quite a bit more about Paul.
Paul had a shitty life, it appears. It is, of course, hard to tell given that my only information about Paul is through debt collectors. Maybe Paul faked his own death to get away from all of these assholes. He is, hopefully, living on a boat on the keys. If not, rest in peace, man.
