{ on programming and the internets }


by Louis Brandy

Me vs Backspace: my war against a rogue character

Amazingly, this is my second story about me and a single character.

A while ago, I began receiving reports from various people in IRC that I was prepending a garbage ascii character to some of my messages. I found this very odd. The complaints didn’t stop. It became a running joke. Eventually, playful mockery. It is a bit demeaning to hang out in programmer corners and not have any idea how, what, or why your computer is misbehaving.

After a bit of digging, I discovered that the rogue in question was the character for backspace. Now, this is an odd character to be inserted into text because most often this character doesn’t actually go anywhere, it informs the program that you want to, you know, backspace. This story gets a bit tricky here because not only does this character rarely make it into text, but most programs won’t render it. Only certain programs, on certain platforms, will actually render the backspace character as anything (usually a box with a little 0×0008, or as ^H). So while other people were telling me about rogue characters, I never saw them.

My first suspicion was that the IRC program I was using, Colloquy, was bugged (spoiler alert: it’s not). I spent a long time asking colloquy/irc people, searching the googles, and otherwise trying to figure out how this rogue character was getting printed. I continued to fail. I continued to be mocked.

The plot thickens…

One day I recieved an instant message from my brother: “What is that weird character at the beginning of your messages?”. Wait? What? My instant messages have the same character? Surely Adium isn’t suffering from the same bug, is it? How have I not heard of this before?

It turns out that most instant messaging programs don’t render the backspace character. Pidgin, however, does. And that day my brother was using Pidgin, and he saw the character of doom. I eventually noticed that I was apparently inserting rogue character using Safari, and later Firefox, all on my work computer (a Mac). It’s important to note that none of my Mac programs — Chrome, Safari, Firefox, Adium, or Colloquy — would render the character, but they were all inserting it at the beginning of lines into random things. (You could see the character if you viewed my text using Firefox on windows or linux, for example).

I decided to wipe my Mac, reinstall the OS, and see if that fixed it. It didn’t. I began to suspect this was a hardware problem having to do with my Mac. I spent a long time asking mac people, searching the googles, and otherwise trying to figure out how this rogue character was getting printed. I continued to fail. I continued to be mocked.

Luckily, my Mac was breaking anyway…

After this went on for close to a year, my Mac began having all kinds of problems including its charging circuitry, so I decided to replace my mac (or rather, lobby to have my work replace my mac). I bought a bright shiny new macbook pro and was good to go. I was pretty sure this would solve the problem.

Very soon thereafter, before I setup my new computer, I went to my Hacker News profile and saw this message: http://news.ycombinator.com/item?id=1531448 A user by the name of ‘davidw’ informed me that the stupid character appeared in a comment I wrote. There’s only one problem: I wrote that message using my wife’s mac. (if my life was a tv show, this is where the screen would go black — end of episode).

So I get my new computer setup, and ask the people in IRC if the character was still happening. It was! Gahh!!! This was good news, and bad news. It finally meant that I’ve reproduced this problem on three different macbooks in a variety of programs. It had to be something I was doing that was causing this.

I spent the next half-hour in IRC, talking to myself, trying to reproduce the problem. I finally figured it out.

TLDR: It’s a Mac bug (I think)

You can see the nasty character, live, in this comment: http://news.ycombinator.com/item?id=1530832. For many of you, however, this comment will look completely normal. That is because the program you are using doesn’t render it. If you, for example, use firefox in linux, you will see the rogue character. I’ve got a screenshot for you:

Here’s how to reproduce the problem, exactly: On a mac laptop (and maybe others), you need to hold down LEFT SHIFT and an arrow key (LEFT ARROW works great). While holding those two buttons, press backspace (or, to be precise, DELETE). The tricky part, however, is verifying that you’ve actually sent a rogue character, since none of the standard mac programs (that I know of) will render the character. The way that I verified the problem was to use Colloquy to send IRC messages to myself, reading those messages using emacs-irc. There are other ways, I am sure.

Why I kept doing it: I often use the shift-<arrow> shortcut to highlight various bits of text. Most often I tend to do shift-UP to highlight an entire line and then hit DELETE to remove it. Obviously, if I do this incorrectly, it ends up replacing the entire line with the backspace character. As I type in the new line, it comes out as a perpended backspace character on my final submission.

The million dollar question: is this a bug, or is there some feature I’m inadvertantly using? Anyone got any bright ideas on how I could fix this (other than unlearning my shift-UP habits)?

Silence is not golden.

So RethinkDB wants to know where all the real programmers are. I do too. While I can commiserate with their post, there is one huge issue I have with it. That is the issue of silence.

Let me start by saying that the primary purpose of RethinkDB’s post wasn’t to discuss their interview philosophy (it was to generate good leads — if you are looking for work, by all means, go apply). That means this post is probably slightly unfair to them in that I’m about to construct a gigantic strawman out of their post. However, I believe this template is so common that it still applies, in general.

As I’ve said here from time to time, I do quite a bit of the HR stuff for our small (startup — when are we no longer a startup?) company. I’ve been doing technical interviews fairly frequently for a few years now. One of the biggest lessons I’ve learned as an interviewer is that silence is often times my fault, not theirs. If an interview is mostly silence, the interviewee was either woefully unqualified (ie, it should have never gotten this far), or the interviewer lost them.

Silence is not a sign of just ignorance, it is almost always a sign of ignorance mixed with confusion or panic. If you ask someone a direct question “do you know what X is?”, they may think for a minute, but they will usually say “no”. Silence, however, usually means “I do not 100% know what they want me to say, and I need to think of the answer that is least wrong”.

Let’s use one of their examples.

How would you implement a read-write lock?

It doesn’t surprise me such a question would result in silence. It’s actually two questions. Do you know what the term “read-write lock” means and how would you actually implement one? For someone who was recently a student, or someone who hasn’t done much concurrency recently, this sounds an awfully lot like an exam question.

On the inside it’s a bunch of panic…

What -is- a read-write lock? Is that what you are asking me? Am I supposed to know what a read-write lock is? What class was that in? The one with the philosopher’s dinner table? Yea, probably. It’s some kind of concurrency thing, probably. It’s probably related to reading and writing? Uhm, that’s not a very good hint… Are they really asking me for the definition? Am I going to get marked down for not knowing the definition?

On the outside, it’s silence.

Say no to silence, interviewer and interviewee alike

You don’t actually have to code it over the phone. Mentioning starvation issues is bonus points. For heaven’s sakes, just give us something.

We try to ask about the difference between cooperative and preemptive multitasking. We try to ask about condition variables. 19 out of 20 times there is silence on the other end.

There’s a good lesson for everyone here. Silence is pure poison in an interview. There should be all of five seconds of silence before you give up and answer their question with a question. If you are being interviewed, say something. “Let me make sure I understand what a read-write lock is, first. Is that when you can have many readers and no writers, or just one writer?” Even if you are wrong, you can at least get them to specify what they mean, and then you can try to answer the question that’s been asked.

Even saying “I don’t know what a read-write lock is.” is almost certainly better than pure silence. Yes, it’s a negative statement, but if I tell you, and then you proceed to discuss it flawlessly, I’ll chalk it up to your textbook using a different term (or you just forgetting the vocabulary). No harm, no foul.

And then there is the flipside of this conversation: the cold, hard, reality that every interviewer faces from time to time. If an inordinate amount of interviewees are troublingly silent, it might be because your standard are really high. There is, however, another explanation.

If you are the interviewer, you have to realize that silence breeds silence. It’s usually easy to break down questions into parts that make it much less likely to be ambiguous and promote that silent, confused, panicked state. First, ask them if they know what a read-write lock is. Then ask them what it is. And once you’ve both agreed on the definition, ask them how they’d implement one. This simple modification results in far more “yes” and “no”, and far less confused silence.

Go USA.

Rooney, Lampard, and Gerrard? I’m not impressed. Just because one American could, arguably, make the England roster, doesn’t mean I’m impressed.

Go USA.

You can’t beat a good compiler… right?

Right! Sort of.

It is extremely well worn conventional wisdom that for your “average” programmer, it’s pretty much impossible to beat a good optimizing compiler at writing fast code. For those who skip the “understanding” part, and jump straight into the “application” part, this rule of thumb tends to be grossly misunderstood, and misused. For these people, compilers are so smart, and so good, that micro-optimization is basically impossible (uh, no), or too hard (no), or even that they make assembly knowledge unnecessary or obsolete (not even close). This tends to lead to a mentality that optimization can be left to the compiler and often results in people warning others to not bother with micro-optimizations at all.

Optimization, today, is largely about writing some naive code, looking at the assembly output, figuring out what you want to improve on, and then getting (or, often, tricking) the compiler into doing what you want. After several iterations of this, you end up with the code you want, and you gain all the benefits of having a compiler in the loop (portability, future-proofing, and so on). This process, though, obviously involves a great deal of low-level knowledge: assembly, cpu architecture, micro-optimizations that are likely to work, etc.

As an aside, I have a question: how exactly does one go about learning how to go beyond standard C into hardcore optimization? For me, it’s been a long road of trial & error, and random discovery. It seems to me that there is probably good resources out there for getting started on optimizing code (let’s say in gcc). Know any?

All the code I used can be found here: http://github.com/lbrandy/simple-optimization-test (see signal_blur.c). I ran all of these tests on Linux using gcc 4.3.3 and -O3. UPDATED: I changed the code to use c99’s restrict keyword, instead of gcc’s __restrict__

Getting yourself into trouble

Here is a naive (and contrived) one dimensional filter routine.

#define LENGTH 5000

// a bad impersonation of a gaussian filter
const float filter[] = {0.01f, 0.2f, 0.58f, 0.2f, 0.01f};

void naive(float* in, float* out)
{
  int i,j;
  for (i=0;i<LENGTH-4;i++)
  {
    out[i]=0.0f;
    for (j = 0;j<5;j++)
      out[i] += filter[j] * in[i+j];
  }
}

This is an interesting example because it contains a fatal flaw that might not be obvious. Take a second to understand what this code is doing. It’s fairly simple, but critical to the rest of the discussion. Let’s look at the assembly output of this, built using -O3.

08048430 :
 8048430:       55                      push   %ebp
 8048431:       31 c0                   xor    %eax,%eax
 8048433:       89 e5                   mov    %esp,%ebp
 8048435:       8b 4d 08                mov    0x8(%ebp),%ecx
 8048438:       8b 55 0c                mov    0xc(%ebp),%edx
 804843b:       90                      nop
 804843c:       8d 74 26 00             lea    0x0(%esi),%esi
 8048440:       d9 ee                   fldz
 8048442:       d9 14 82                fsts   (%edx,%eax,4)
 8048445:       d9 05 10 89 04 08       flds   0x8048910
 804844b:       d9 04 81                flds   (%ecx,%eax,4)
 804844e:       d8 c9                   fmul   %st(1),%st
 8048450:       de c2                   faddp  %st,%st(2)
 8048452:       d9 c9                   fxch   %st(1)
 8048454:       d9 14 82                fsts   (%edx,%eax,4)
 8048457:       d9 05 14 89 04 08       flds   0x8048914
 804845d:       d9 44 81 04             flds   0x4(%ecx,%eax,4)
 8048461:       d8 c9                   fmul   %st(1),%st
 8048463:       de c2                   faddp  %st,%st(2)
 8048465:       d9 c9                   fxch   %st(1)
 8048467:       d9 14 82                fsts   (%edx,%eax,4)
 804846a:       d9 05 18 89 04 08       flds   0x8048918
 8048470:       d8 4c 81 08             fmuls  0x8(%ecx,%eax,4)
 8048474:       de c1                   faddp  %st,%st(1)
 8048476:       d9 14 82                fsts   (%edx,%eax,4)
 8048479:       d9 44 81 0c             flds   0xc(%ecx,%eax,4)
 804847d:       de ca                   fmulp  %st,%st(2)
 804847f:       de c1                   faddp  %st,%st(1)
 8048481:       d9 14 82                fsts   (%edx,%eax,4)
 8048484:       d9 c9                   fxch   %st(1)
 8048486:       d8 4c 81 10             fmuls  0x10(%ecx,%eax,4)
 804848a:       de c1                   faddp  %st,%st(1)
 804848c:       d9 1c 82                fstps  (%edx,%eax,4)
 804848f:       83 c0 01                add    $0x1,%eax
 8048492:       3d 84 13 00 00          cmp    $0x1384,%eax
 8048497:       75 a7                   jne    8048440 
 8048499:       5d                      pop    %ebp
 804849a:       c3                      ret
 804849b:       90                      nop
 804849c:       8d 74 26 00             lea    0x0(%esi),%esi

From the assembly, we see that the compiler has done quite a bit (if you don’t know assembly, don’t worry, I’ll do my best to explain). First we see all of the address calculation involved in loading the filter coefficients has vanished (it is loading each one directly, e.g. flds 0x8048918). Second, notice that the (fixed sized) inner loop has been completely unrolled, resulting in each of the 5 multiplications and additions per iteration. So far so good.

There is, however, a very alarming surprise is this code. That is the quantity of store instructions (also loads). After every iteration of our inner loop (each filter coefficient), the result is stored. You can see 5 different store instructions (fstps, fsts) per iteration of this loop. Why? Let’s have a look at the code again:

void naive(float* in, float* out)
{
  int i,j;
  for (i=0;i<LENGTH-4;i++)
  {
    out[i]=0.0f;
    for (j = 0;j<5;j++)
      out[i] += filter[j] * in[i+j];
  }
}

To the inexperienced, it might be bewildering why the optimizing compiler would generate 5 store instructions for out[i]= in the inner loop. Why wouldn’t it just accumulate the answer in a register, and then store only the final result? The answer is: aliasing. The problem here is that compiler cannot assume that the pointers *in and *out are disjoint. It must store the result into out[i] each iteration because out[i] may be in[i+j] in the next iteration of the inner loop. With a bit of thought, it becomes clear how this code requires these stores to be correct in cases like *out pointing one float ahead of *in.

Another hard-learned tidbit: wasteful store instructions are terrible because stores can be incredibly expensive (far worse than an extra add or multiply).

Fixing it with restricted pointers

There are several ways to fix this problem, but in the spirit of teaching the art of optimization, I’ll go with the use of the __restrict__ qualifier (this is a gcc directive, but most compilers have some support for restricted pointers). [note from comments: restricted pointers are part of the C99 standard now using the keyword restrict. ]. The only change I made is to add __restrict__ to the function declaration:

void naive_restrict(float *__restrict__ in, float *__restrict__ out)
{
  int i,j;
  for (i=0;i<LENGTH-4;i++)
  {
    out[i]=0.0f;
    for (j = 0;j<5;j++)
      out[i] += filter[j] * in[i+j];
  }
}

This directive tells the compiler that you, the programmer, promise that *in and *out are disjoint, and no aliasing will occur. If you break your promise, don’t expect your code to be correct. Here is the assembly of that output:

080484a0 :
 80484a0:       55                      push   %ebp
 80484a1:       31 c0                   xor    %eax,%eax
 80484a3:       89 e5                   mov    %esp,%ebp
 80484a5:       8b 55 08                mov    0x8(%ebp),%edx
 80484a8:       8b 4d 0c                mov    0xc(%ebp),%ecx
 80484ab:       90                      nop
 80484ac:       8d 74 26 00             lea    0x0(%esi),%esi
 80484b0:       d9 05 10 89 04 08       flds   0x8048910
 80484b6:       d9 04 82                flds   (%edx,%eax,4)
 80484b9:       d8 c9                   fmul   %st(1),%st
 80484bb:       d8 05 0c 89 04 08       fadds  0x804890c
 80484c1:       d9 05 14 89 04 08       flds   0x8048914
 80484c7:       d9 44 82 04             flds   0x4(%edx,%eax,4)
 80484cb:       d8 c9                   fmul   %st(1),%st
 80484cd:       de c2                   faddp  %st,%st(2)
 80484cf:       d9 05 18 89 04 08       flds   0x8048918
 80484d5:       d8 4c 82 08             fmuls  0x8(%edx,%eax,4)
 80484d9:       de c2                   faddp  %st,%st(2)
 80484db:       d8 4c 82 0c             fmuls  0xc(%edx,%eax,4)
 80484df:       de c1                   faddp  %st,%st(1)
 80484e1:       d9 c9                   fxch   %st(1)
 80484e3:       d8 4c 82 10             fmuls  0x10(%edx,%eax,4)
 80484e7:       de c1                   faddp  %st,%st(1)
 80484e9:       d9 1c 81                fstps  (%ecx,%eax,4)
 80484ec:       83 c0 01                add    $0x1,%eax
 80484ef:       3d 84 13 00 00          cmp    $0x1384,%eax
 80484f4:       75 ba                   jne    80484b0 
 80484f6:       5d                      pop    %ebp
 80484f7:       c3                      ret
 80484f8:       90                      nop
 80484f9:       8d b4 26 00 00 00 00    lea    0x0(%esi),%esi

Now, you do not have to be an assembly expert to see how much more streamlined this code is. It consists almost exclusively of loads, multiplies, and adds with a final store at the end. This does exactly what we’d originally hoped. It realizes it can keep a temporary running sum and only store once at the end. We should also note that if you violate the restricted pointer promise, this code will not be correct!

It shouldn’t surprise you to see how much faster this version is, either:

  naive: 0.672957
  naive_restrict: 0.160432

It’s almost 5 times faster than the non-restricted version.

Going Forward

Though I haven’t actually done it, my overwhelming suspicion is that this code still has a ways to go in terms of speed. The next step would be the use of SSE instructions. Maybe next post.

The take-away from examples like this should be that optimization cannot be “left” to the optimizing compiler. You have to know what you are doing to get a compiler to make fast code. And if you really know what you are doing (which means knowing assembly and the underlying architecture quite well), you can usually get a modern optimizing compiler to do exactly what you want. You have to be in the loop. Truly optimized C code ends up looking nothing like C at all.

With careful hand holding, you can get a compiler to make fast code. In that case, it can become difficult to beat a compiler with hand-optimized code. But that is not because the compiler is so good, but because you are so good at getting the compiler to make the code you want.

Geeky musical comedy

I’ve been working on a bunch of stuff at work, so I have a bunch of fairly technical things to write about in the future. But for now, for today, I’m going with a public service. I’m a huge fan of geeky music and/or musical comedy. This is also a good opportunity to mention that I love Pandora. I spend a good 8 hours a day listening to Pandora. If you are bored, here’s my own favorite list of madness. You can easily lose a few hours wandering aimlessly through the youtube halls. If you’ve got any other similar acts, let me know.

Also, happy LOST finale.

They Might Be Giants

Not a very well kept secret but you should listen to all their stuff, if you haven’t. I’ve provided one of my favorites, officially titled Why Does the Sun Shine?

Flight of the Conchords

Another poorly kept secret. Again, if you haven’t listened to all their stuff, you are missing out. Here’s The Humans are Dead. Make sure to stick around for the solo:

Tim Minchin

Of the whole list, Tim is my favorite. His stuff is really smart, really funny, sometimes serious, but always appropriately childish. He can also flat out sing, if need be. This is actually a beat poem called Storm:

Jonathan Coulton

Everyone (I think) has probably heard the song Code Monkey. I’ve supplied it here, just in case. You should really dig into his other stuff, because there is quite a bit of amazingness to be found. Also, he licenses his stuff under the Creative Commons License, so that makes him someone worth supporting.

Tripod

I found these guys on reddit (I think) from the song below. They’ve got a bunch of other good super geeky comedy songs, as well. The song below, Gonna Make You Happy, is especially brilliant. I played it for my wife. I am so romantic.

Stephen Lynch

This guy is so hilarious but a fair bit, uh shall we say, explicit? It might not be for everyone, but he is so very, very funny. I’ve chosen one (called D&D)of the less explicit and geeky songs for embedding:

Axis of Awesome

You should go through youtube looking for all their stuff, but this one is just too good to be missed. The 4-Chord Song.


Need a new job?