{ on programming and the internets }


by Louis Brandy

You can’t beat a good compiler… right?

Right! Sort of.

It is extremely well-worn conventional wisdom that for your “average” programmer, it’s pretty much impossible to beat a good optimizing compiler at writing fast code. For those who skip the “understanding” part and jump straight to the “application” part, this rule of thumb tends to be grossly misunderstood and misused. For these people, compilers are so smart, and so good, that micro-optimization is basically impossible (uh, no), or too hard (no), or even that they make assembly knowledge unnecessary or obsolete (not even close). This leads to a mentality that optimization can be left to the compiler, and often results in people warning others not to bother with micro-optimizations at all.

Optimization, today, is largely about writing some naive code, looking at the assembly output, figuring out what you want to improve, and then getting (or, often, tricking) the compiler into doing what you want. After several iterations of this, you end up with the code you want, and you gain all the benefits of having a compiler in the loop (portability, future-proofing, and so on). This process, though, obviously involves a great deal of low-level knowledge: assembly, CPU architecture, micro-optimizations that are likely to work, etc.

As an aside, I have a question: how exactly does one go about learning how to go beyond standard C into hardcore optimization? For me, it’s been a long road of trial & error, and random discovery. It seems to me that there are probably good resources out there for getting started on optimizing code (let’s say in gcc). Know any?

All the code I used can be found here: http://github.com/lbrandy/simple-optimization-test (see signal_blur.c). I ran all of these tests on Linux using gcc 4.3.3 and -O3. UPDATED: I changed the code to use C99’s restrict keyword, instead of gcc’s __restrict__.

Getting yourself into trouble

Here is a naive (and contrived) one dimensional filter routine.

#define LENGTH 5000

// a bad impersonation of a gaussian filter
const float filter[] = {0.01f, 0.2f, 0.58f, 0.2f, 0.01f};

void naive(float* in, float* out)
{
  int i,j;
  for (i=0;i<LENGTH-4;i++)
  {
    out[i]=0.0f;
    for (j = 0;j<5;j++)
      out[i] += filter[j] * in[i+j];
  }
}

This is an interesting example because it contains a fatal flaw that might not be obvious. Take a second to understand what this code is doing. It’s fairly simple, but critical to the rest of the discussion. Let’s look at the assembly output of this, built using -O3.

08048430 <naive>:
 8048430:       55                      push   %ebp
 8048431:       31 c0                   xor    %eax,%eax
 8048433:       89 e5                   mov    %esp,%ebp
 8048435:       8b 4d 08                mov    0x8(%ebp),%ecx
 8048438:       8b 55 0c                mov    0xc(%ebp),%edx
 804843b:       90                      nop
 804843c:       8d 74 26 00             lea    0x0(%esi),%esi
 8048440:       d9 ee                   fldz
 8048442:       d9 14 82                fsts   (%edx,%eax,4)
 8048445:       d9 05 10 89 04 08       flds   0x8048910
 804844b:       d9 04 81                flds   (%ecx,%eax,4)
 804844e:       d8 c9                   fmul   %st(1),%st
 8048450:       de c2                   faddp  %st,%st(2)
 8048452:       d9 c9                   fxch   %st(1)
 8048454:       d9 14 82                fsts   (%edx,%eax,4)
 8048457:       d9 05 14 89 04 08       flds   0x8048914
 804845d:       d9 44 81 04             flds   0x4(%ecx,%eax,4)
 8048461:       d8 c9                   fmul   %st(1),%st
 8048463:       de c2                   faddp  %st,%st(2)
 8048465:       d9 c9                   fxch   %st(1)
 8048467:       d9 14 82                fsts   (%edx,%eax,4)
 804846a:       d9 05 18 89 04 08       flds   0x8048918
 8048470:       d8 4c 81 08             fmuls  0x8(%ecx,%eax,4)
 8048474:       de c1                   faddp  %st,%st(1)
 8048476:       d9 14 82                fsts   (%edx,%eax,4)
 8048479:       d9 44 81 0c             flds   0xc(%ecx,%eax,4)
 804847d:       de ca                   fmulp  %st,%st(2)
 804847f:       de c1                   faddp  %st,%st(1)
 8048481:       d9 14 82                fsts   (%edx,%eax,4)
 8048484:       d9 c9                   fxch   %st(1)
 8048486:       d8 4c 81 10             fmuls  0x10(%ecx,%eax,4)
 804848a:       de c1                   faddp  %st,%st(1)
 804848c:       d9 1c 82                fstps  (%edx,%eax,4)
 804848f:       83 c0 01                add    $0x1,%eax
 8048492:       3d 84 13 00 00          cmp    $0x1384,%eax
 8048497:       75 a7                   jne    8048440 <naive+0x10>
 8048499:       5d                      pop    %ebp
 804849a:       c3                      ret
 804849b:       90                      nop
 804849c:       8d 74 26 00             lea    0x0(%esi),%esi

From the assembly, we see that the compiler has done quite a bit (if you don’t know assembly, don’t worry, I’ll do my best to explain). First, all of the address calculation involved in loading the filter coefficients has vanished (it loads each one directly, e.g. flds 0x8048918). Second, the (fixed-size) inner loop has been completely unrolled, so all 5 multiplications and additions appear inline in each iteration of the outer loop. So far so good.

There is, however, a very alarming surprise in this code: the number of store instructions (also loads). After every step of our inner loop (each filter coefficient), the result is stored. Besides the initial store of 0.0f, you can see 5 different store instructions (fsts, fstps) per iteration of the outer loop. Why? Let’s have a look at the code again:

void naive(float* in, float* out)
{
  int i,j;
  for (i=0;i<LENGTH-4;i++)
  {
    out[i]=0.0f;
    for (j = 0;j<5;j++)
      out[i] += filter[j] * in[i+j];
  }
}

To the inexperienced, it might be bewildering that an optimizing compiler would generate a store to out[i] at every step of the inner loop. Why wouldn’t it just accumulate the answer in a register, and then store only the final result? The answer is: aliasing. The problem here is that the compiler cannot assume that the memory reached through in and out is disjoint. It must store the result into out[i] at each step because out[i] may be the in[i+j] that is read in the next step of the inner loop. With a bit of thought, it becomes clear that these stores are required for correctness in cases like out pointing one float ahead of in.
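To make the aliasing hazard concrete, here is a minimal sketch. It is not from the repository: the names N, naive_n and overlap_changes_result are made up for illustration, and the array is tiny so the numbers are easy to follow. It runs the same loop twice, once with a separate output buffer and once with out pointing one float ahead of in, and reports whether the answers differ.

```c
#include <string.h>

#define N 8   /* tiny stand-in for LENGTH (an assumption, for readability) */

static const float filter[] = {0.01f, 0.2f, 0.58f, 0.2f, 0.01f};

/* Same loop as naive(), with the length passed in as a parameter. */
static void naive_n(float *in, float *out, int n)
{
    int i, j;
    for (i = 0; i < n - 4; i++) {
        out[i] = 0.0f;
        for (j = 0; j < 5; j++)
            out[i] += filter[j] * in[i + j];
    }
}

/* Hypothetical helper: returns 1 if filtering with out == in + 1 gives a
   different answer than filtering into a separate buffer, 0 otherwise. */
int overlap_changes_result(void)
{
    float src[N + 1], buf[N + 1], separate[N];
    int i;

    for (i = 0; i < N + 1; i++)
        src[i] = buf[i] = (float)(i + 1);

    naive_n(src, separate, N);   /* in and out are disjoint        */
    naive_n(buf, buf + 1, N);    /* out[i] is the same as in[i+1]  */

    /* Only the first N-4 outputs are written, so compare just those. */
    return memcmp(separate, buf + 1, (N - 4) * sizeof(float)) != 0;
}
```

With these inputs the function should return 1. In the overlapping call, out[0] is the same memory as in[1], so the partial sum written at j = 0 is read back as an input at j = 1. That is exactly the case the compiler has to stay correct for, and it is why it cannot hold the running sum in a register.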

Another hard-learned tidbit: wasteful store instructions are terrible because stores can be incredibly expensive (far worse than an extra add or multiply).

Fixing it with restricted pointers

There are several ways to fix this problem, but in the spirit of teaching the art of optimization, I’ll go with the use of the __restrict__ qualifier (this is a gcc extension, but most compilers have some support for restricted pointers). [Note from comments: restricted pointers are part of the C99 standard, using the keyword restrict.] The only change I made is to add __restrict__ to the function declaration:

void naive_restrict(float *__restrict__ in, float *__restrict__ out)
{
  int i,j;
  for (i=0;i<LENGTH-4;i++)
  {
    out[i]=0.0f;
    for (j = 0;j<5;j++)
      out[i] += filter[j] * in[i+j];
  }
}

This qualifier tells the compiler that you, the programmer, promise that the memory accessed through in and out is disjoint, and no aliasing will occur. If you break your promise, don’t expect your code to be correct. Here is the assembly of that output:

080484a0 <naive_restrict>:
 80484a0:       55                      push   %ebp
 80484a1:       31 c0                   xor    %eax,%eax
 80484a3:       89 e5                   mov    %esp,%ebp
 80484a5:       8b 55 08                mov    0x8(%ebp),%edx
 80484a8:       8b 4d 0c                mov    0xc(%ebp),%ecx
 80484ab:       90                      nop
 80484ac:       8d 74 26 00             lea    0x0(%esi),%esi
 80484b0:       d9 05 10 89 04 08       flds   0x8048910
 80484b6:       d9 04 82                flds   (%edx,%eax,4)
 80484b9:       d8 c9                   fmul   %st(1),%st
 80484bb:       d8 05 0c 89 04 08       fadds  0x804890c
 80484c1:       d9 05 14 89 04 08       flds   0x8048914
 80484c7:       d9 44 82 04             flds   0x4(%edx,%eax,4)
 80484cb:       d8 c9                   fmul   %st(1),%st
 80484cd:       de c2                   faddp  %st,%st(2)
 80484cf:       d9 05 18 89 04 08       flds   0x8048918
 80484d5:       d8 4c 82 08             fmuls  0x8(%edx,%eax,4)
 80484d9:       de c2                   faddp  %st,%st(2)
 80484db:       d8 4c 82 0c             fmuls  0xc(%edx,%eax,4)
 80484df:       de c1                   faddp  %st,%st(1)
 80484e1:       d9 c9                   fxch   %st(1)
 80484e3:       d8 4c 82 10             fmuls  0x10(%edx,%eax,4)
 80484e7:       de c1                   faddp  %st,%st(1)
 80484e9:       d9 1c 81                fstps  (%ecx,%eax,4)
 80484ec:       83 c0 01                add    $0x1,%eax
 80484ef:       3d 84 13 00 00          cmp    $0x1384,%eax
 80484f4:       75 ba                   jne    80484b0 <naive_restrict+0x10>
 80484f6:       5d                      pop    %ebp
 80484f7:       c3                      ret
 80484f8:       90                      nop
 80484f9:       8d b4 26 00 00 00 00    lea    0x0(%esi),%esi

Now, you do not have to be an assembly expert to see how much more streamlined this code is. It consists almost exclusively of loads, multiplies, and adds, with a single store at the end. This is exactly what we’d originally hoped for: the compiler keeps a running sum in a register and stores only once, at the end. Note again that if you violate the restricted pointer promise, this code will not be correct!

It shouldn’t surprise you to see how much faster this version is, either:

  naive: 0.672957
  naive_restrict: 0.160432

It’s a little over 4 times faster than the non-restricted version (0.672957 / 0.160432 is about 4.2).
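For anyone who wants to reproduce this kind of measurement, here is a rough sketch of a timing harness. It is an assumption on my part, not the harness from the repository: time_filter, speedup and filter_fn are hypothetical names, and clock() measures CPU time rather than wall-clock time. Computing the ratio in code also avoids eyeballing it.

```c
#include <time.h>

#define LENGTH 5000

static const float filter[] = {0.01f, 0.2f, 0.58f, 0.2f, 0.01f};

/* Both versions fit this type: restrict on a parameter does not change
   the type of the function. */
typedef void (*filter_fn)(float *, float *);

void naive(float *in, float *out)
{
    int i, j;
    for (i = 0; i < LENGTH - 4; i++) {
        out[i] = 0.0f;
        for (j = 0; j < 5; j++)
            out[i] += filter[j] * in[i + j];
    }
}

/* C99 spelling of the same promise that __restrict__ makes. */
void naive_restrict(float *restrict in, float *restrict out)
{
    int i, j;
    for (i = 0; i < LENGTH - 4; i++) {
        out[i] = 0.0f;
        for (j = 0; j < 5; j++)
            out[i] += filter[j] * in[i + j];
    }
}

/* Hypothetical helper: CPU seconds spent calling fn reps times.
   Calling through a function pointer makes inlining less likely. */
double time_filter(filter_fn fn, float *in, float *out, int reps)
{
    clock_t start = clock();
    int r;
    for (r = 0; r < reps; r++)
        fn(in, out);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: how many times faster the fast run was,
   e.g. speedup(0.672957, 0.160432) for the two timings above. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}
```

A caller would fill two LENGTH-sized float arrays, time both functions with the same reps, and print speedup(t_naive, t_restrict). Absolute numbers will vary with the machine, compiler version and flags, so only the ratio is worth comparing.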

Going Forward

Though I haven’t actually done it, my overwhelming suspicion is that this code still has a ways to go in terms of speed. The next step would be the use of SSE instructions. Maybe next post.
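As a starting point for that next step, here is an untimed sketch of what a hand-vectorized version might look like with SSE intrinsics. Everything in it is an assumption rather than code from the repository: blur_sse and blur_scalar are made-up names, the loads are unaligned because nothing is known about the alignment of the buffers, and on a target without SSE the function simply falls back to the scalar loop.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_loadu_ps, ... */
#endif

#define LENGTH 5000

static const float filter[] = {0.01f, 0.2f, 0.58f, 0.2f, 0.01f};

/* Scalar reference: running sum in a local, one store per output. */
void blur_scalar(const float *restrict in, float *restrict out)
{
    int i, j;
    for (i = 0; i < LENGTH - 4; i++) {
        float sum = 0.0f;
        for (j = 0; j < 5; j++)
            sum += filter[j] * in[i + j];
        out[i] = sum;
    }
}

/* Sketch: compute four neighbouring outputs per iteration. */
void blur_sse(const float *restrict in, float *restrict out)
{
    int i = 0, j;
#if defined(__SSE__)
    for (; i + 4 <= LENGTH - 4; i += 4) {
        __m128 acc = _mm_setzero_ps();
        for (j = 0; j < 5; j++) {
            /* Broadcast one coefficient, multiply it by the four inputs
               in[i+j .. i+j+3], and accumulate. */
            __m128 coeff = _mm_set1_ps(filter[j]);
            __m128 data  = _mm_loadu_ps(in + i + j);
            acc = _mm_add_ps(acc, _mm_mul_ps(coeff, data));
        }
        _mm_storeu_ps(out + i, acc);   /* one store for four outputs */
    }
#endif
    /* Scalar tail (and the whole loop when SSE is unavailable). */
    for (; i < LENGTH - 4; i++) {
        float sum = 0.0f;
        for (j = 0; j < 5; j++)
            sum += filter[j] * in[i + j];
        out[i] = sum;
    }
}
```

Whether this actually beats the restrict version is something to measure, not assume. The sketch only shows the shape of the code: four outputs per iteration, one store per four outputs, and the same promise that in and out do not overlap.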

The take-away from examples like this should be that optimization cannot be “left” to the optimizing compiler. You have to know what you are doing to get a compiler to make fast code. And if you really know what you are doing (which means knowing assembly and the underlying architecture quite well), you can usually get a modern optimizing compiler to do exactly what you want. You have to be in the loop. Truly optimized C code ends up looking nothing like C at all.

With careful hand holding, you can get a compiler to make fast code. In that case, it can become difficult to beat a compiler with hand-optimized code. But that is not because the compiler is so good, but because you are so good at getting the compiler to make the code you want.

128 Responses to “You can’t beat a good compiler… right?”

  1. June 7th, 2010 at 8:16 am

    Hallvard Norheim Bø says:

    They actually tried to introduce a keyword into standard C to deal with things like pointer aliasing. But it never made it into the standard. See http://www.lysator.liu.se/c/dmr-on-noalias.html for Dennis Ritchie’s thoughts on the proposed “noalias” keyword.

  2. June 7th, 2010 at 9:53 am

    Joachim Schipper says:

    The “restrict” keyword is in C99; #define-ing it away for old compilers makes sense, but there’s no need to use the (obviously unportable) “__restrict__” extension.

    On the other hand, “naive” unrolling would work, too: out[i] = filter[0] * in[i] + … + filter[4] * in[i + 4] should generate good code (not tested, sorry!)

    Not that it matters to the present discussion, but your “naive implementation” has an off-by-one error, accessing in[LENGTH].

    But this is just nitpicking: nice article!

  3. June 7th, 2010 at 9:56 am

    Peter Varga says:

    restrict is a valid C pointer qualifier and works.
    paragraph 6.7.3.1 in the C99 Standard.

    It is not needed to use the __restrict__ , just use restrict.

  4. June 7th, 2010 at 10:14 am

    Magice says:

    This totally, completely, and utterly misses the point of compiler optimization, and optimization in general.

    First, yes, per current state-of-the-art, a master-level human assembler can assist the compiler to produce better, safer, faster code in these cases. However, who cares? Seriously, you are talking about, what, 0.6 second. It’s not worth the effort! Oh, did you even realize that you are burning time on 10 lines of code? Seriously, a toy program has thousands of lines of C code already. Good luck trying to optimize every single line. And it’s not like you can get out 0.6 sec every time. Maybe 0.01 next time. Remember, with a 3GHz CPU, 1% of a second is a LONG time.

    Oh, and did anyone tell you that mucking with non-standard stuff is bound for disaster? And micro optimization may not work with future things? For example, multithreading. Try really hard to minimize the IO of a single thread, but if the OS swaps out your process and destroys your cache, then the cost of swapping is so huge that your little optimization is not even worth mentioning.

    Meanwhile, why don’t you invest time in something with a higher return? For example, data structures that minimize cache misses and can be used over and over. Or, algorithms that are orders of magnitude faster than your current ones? Or, a better programming paradigm? Or, JIT with selected compilation? Structural stuff, not little tiny details.

    For some reasons, this reminds me of Vietnam war. America had all details right. They had bigger guns, bigger bombs, bigger airplanes, bigger budget; they won just about anything that they mucked into. And they got kicked out. Micro optimization, you think?

  5. June 7th, 2010 at 10:19 am

    Breck Fresen says:

    It’s not that optimizations like these are too hard, it’s that for the vast majority of applications they’re completely unnecessary. There are very few problem domains where dropping down to assembly is worth the time it takes to optimize, let alone the cost of readability and maintainability. If you’re one of those few engineers working on a kernel or some other piece of performance-critical system code, then more power to you. But suggesting that knowledge of assembly is necessary for the truly average programmer is ridiculous.

    That said, props on restrict – I’d never seen that before.

  6. June 7th, 2010 at 10:19 am

    louis says:

    @magice, thanks for the predictable response.

    Nowhere have I argued that you should be optimizing “all” of your code, or optimizing indiscriminately (or prematurely), or without exhausting other, better options. Nowhere have I said this type of optimization is preferable to cache optimizations, or other high level concerns. Knowing -when- to bust out micro-optimizations is, in and of itself, a skill worth discussing.

    However, if you are actually arguing that it is never correct to spend time doing micro-optimization, you are wrong. It’s just that simple.

  7. June 7th, 2010 at 10:27 am

    Joachim Schipper says:

    As pointed out to me at http://news.ycombinator.com/item?id=1410755, the off-by-one error is nonsense. Sorry!

  8. June 7th, 2010 at 10:53 am

    Pau Fernández says:

    Very nice article, thanks.
    Looking forward to the next post.

  9. June 7th, 2010 at 11:11 am

    George Phillips says:

    0.672957 / 0.160432 is about 4.2, quite a bit short of “almost 5”. I nitpick here because numbers really matter when optimizing. Too many a time I’ve made similar leaps thinking that I had previously obtained “nearly 5 times speedup” and looked down upon something that was 4.5 times faster. Only to find out that the “nearly 5” was actually slower because I wasn’t being careful.

  10. June 7th, 2010 at 11:56 am

    Sebi says:

    Lots of resources on optimizing here : http://www.agner.org/optimize/

  11. June 7th, 2010 at 11:56 am

    Alan McIntyre says:

    Nice article! Out of curiosity, I tried adding -funroll-loops, which made the naive function run faster than the naive_restrict function.

    Before adding -funroll-loops:
    naive: 0.461494
    naive_restrict: 0.143920
    sum_variable: 0.105645

    After adding -funroll-loops:
    naive: 0.107970
    naive_restrict: 0.125535
    sum_variable: 0.096111

    Looking forward to an SSE post. :)

  12. June 7th, 2010 at 12:21 pm

    nate clark says:

    Funny, I couldn’t recreate your results with gcc 4.3.3 on my machine. The O3 code I got used SSE, fully unrolled both loops, and did the elimination of the unnecessary stores. It also inlined naive() into main().

    I imagine the problem is that the assembly code you looked at was for just the naive() function itself. If you call that function from somewhere, then GCC will use alias analysis to detect that the pointers don’t alias, and probably inline, obviating the need for the restrict keyword. Granted, GCC doesn’t have the best alias analysis in the world, but I’d be shocked if this example is one that breaks it.

  13. June 7th, 2010 at 12:38 pm

    Fenris says:

    You can get the same output (if you really don’t have overlapping pointers) with a simple temporal variable

    float temp = 0.0f;
    for (j = 0; j < 5; j++)
        temp += filter[j] * in[i+j];
    out[i] = temp;

    you clearly tell the compiler what to do, with this anti-optimization (I declared more variables than needed)

  14. June 7th, 2010 at 12:40 pm

    Loren Pechtel says:

    When you look at the cases where hand optimization can beat the compiler it normally comes down to the human knowing things the compiler can’t know. In this case you can inform the compiler but you can’t always.

  15. June 7th, 2010 at 12:49 pm

    louis says:

    @nate,

    Did you run the code as I have it on github, or just copy/paste this into your own file? I ask because I just double checked everything, and am not seeing the same things as you. It would seem, to me, that the function pointers as I have it setup keep gcc from inlining it. I’ve been running gcc 4.3.3 on a relatively clean ubuntu 9.04 box.

    Is it generating vectorized SSE, or just using the SSE stack for scalar fp operations? I can’t really get gcc to autovectorize this code at all, by playing with flags.

    I can, though, get decent speedups by using ‘-O3 -msse3 -mfpmath=sse’ which will force it to generate sse scalar instructions instead of fpu instructions.

  16. June 7th, 2010 at 12:57 pm

    Drako says:

    @magice
    To thee I quote my professor from back in my undergraduate days:

    “Hardware exists to provide scalability and high availability, not to cater to poorly optimized code.”

    That being said, what @louis said is a perfectly acceptable compromise. Sure application developers will always want to be lazy, so we shall cut them some slack and allow them to optimize just the critical segments of code.

    Remember, just as you as an application developer want to be lazy and finish your work quickly so you can go out and play, the people using that shitty-end-result of an application also want to do the same.

    P.S. when I say “you” I mean the general masses of application developers who have no sense of pride in their work.

  17. June 7th, 2010 at 1:09 pm

    Enlightenment says:

    The reason so many programs run slow and hog memory is because ignorant people quit worrying about optimizing code. They just assume it is a non-issue with a 3GHz processor, but they are wrong.

    The idiots don’t get it. LENGTH of 5000 might be trivial, but what if you had to do calculations on LENGTH of 100 MILLION or many different number tables, then savings would add up to be very large amounts of time.

  18. June 7th, 2010 at 1:32 pm

    Gypsy says:

    Great article! It’s not often that you see someone brave enough to wander into the dark depths of assembly programming. To those of you who respond with frustration and anger: I can only assume that you have a hard time reading/writing assembly.

    Saying that micro-optimizations aren’t useful is just silly. Most of you should be aware that some functions are called repeatedly, and those “2 seconds saved” per run may amount to a lot of time.

    Also, measuring the need for optimization based on the number of lines of code, indicates a terrible lack of understanding.

  19. June 7th, 2010 at 2:03 pm

    Jed says:

    @Enlightenment: With LENGTH=10^8, you would be memory bandwidth limited.

  20. June 7th, 2010 at 2:05 pm

    Tim Smith says:

    Being someone who spends most of my working day optimizing code, I have to agree that the idea that the compiler can optimize better than a human is a limited concept.

    In the author’s example, we have a perfect case where the compiler doesn’t have enough information to perform specific optimizations.

    In other cases, the code is just too complex for the compiler to properly deconstruct and optimize. This was a big problem with some compilers when dealing with C++ templates.

    That being said, I still find that most optimizations are higher level optimizations. For example, using proper containers or data structures. Repeating work is usually a big cause of problems.

    Good article.

  21. June 7th, 2010 at 2:05 pm

    Drifter says:

    I was kinda suspicious of your results so I tried your code on cygwin using actual timed tests. I bumped the LENGTH up to 100*1000*1000 to give it something to chew on, and checked the assembler to make sure it replicated your results (which it did). Using time ./a.exe. The profile time fluctuates from 0.577s to 0.655s on the fast version and from 0.545s to 0.670s on the slow version. Frankly, the error seems higher than the effect of the optimization.

    I may be wrong, but I suspect the memory timings are all-important here. The most expensive operation is loading a chunk of data from main memory into the cache. The additional stores that you managed to optimize out only use the local cache, so they are very fast and are effectively hidden by the load from main memory. The fluctuation in timings is caused by the cache getting flushed by other apps in the system.

    Feel free to criticize my results but I suspect your optimization would only work on older hardware and embedded systems that aren’t so dependent on memory.

    I would be interested to see your own results if you tried to time the optimizations. I personally believe that your article is fundamentally flawed because you didn’t profile your results.

  22. June 7th, 2010 at 2:15 pm

    PacManfan says:

    Thanks for the good article on basic optimization.
    @magice – Listen to the responses on this article – you could learn a thing or two.
    As a video game programmer back in the early 90’s, I used to manually unroll inner loops and convert them to assembly by hand. Done correctly, it can change the video performance from 5-7fps to 30+ fps.
    The entire program does not need to be optimized, just the slowest, most-run code. That’s what a profiler is for.
    -PMF

  23. June 7th, 2010 at 2:16 pm

    Drifter says:

    I apologize – I didn’t see the timings at the end of the article.

  24. June 7th, 2010 at 2:20 pm

    Kevin H says:

    @Fenris: That isn’t really an “anti-optimization” per se, as that variable would be put on the stack, which doesn’t really add extra memory (since it doesn’t call any functions within). Unless this call was very deep in the stack, I doubt it would affect anything.

  25. June 7th, 2010 at 2:42 pm

    angelom says:

    Thanks for the article.

    I come from an embedded background where I have found bad C compilers producing painfully bad code.

    Sheeple are repeatedly quoting ‘Gotos Bad’, ‘Assembly Optimization Bad’, wondering what the next mantra will be…

  26. June 7th, 2010 at 3:38 pm

    Dan says:

    A good introductory-level textbook that sets the context for this topic is Computer Systems: A Programmer’s Perspective, Bryant & O’Hallaron. Uses X86, Linux (with appropriate Windows asides), and GCC, covers 32-bit and 64-bit architectures. It doesn’t dive into detail on hard-core optimizations, but it does address memory aliasing and other compiler optimization blockers, memory caches, etc. Written for undergraduate students who are new to both C and X86 assembly.

    http://csapp.cs.cmu.edu/

  27. June 7th, 2010 at 3:56 pm

    Jon says:

    I see nothing about you trying to avoid stalling the instruction pipeline in the CPU. The days of non-pipelined, CISC chips are looong gone, my friend.

  28. June 7th, 2010 at 4:25 pm

    dude says:

    I understand the point of your article, but it’s just plain silly not to mention that you will on average get more optimization by picking the most efficient algorithm and profiling than you ever will with premature optimization. Get it to work -right- first and then get it to work -fast- if profiling tells you it’s necessary. You should always mention this as beginners will come across your article and think “well now I have to go back and optimize everything”.

  29. June 7th, 2010 at 4:33 pm

    Joachim Schipper says:

    @nate clark: note that such analysis doesn’t work if it’s not in the same source file; even with GCC’s new whole-program optimization, it won’t work if the function is in a shared library.

  30. June 7th, 2010 at 5:17 pm

    jalf says:

    @Magice: oh really? 0.6 seconds?

    The largest performance boost I’ve achieved solely from microoptimizations, from trying to “beat the compiler” is… over 800%. The application as a whole ran 8 times as fast when we were finished optimizing.

    I think that’s worthwhile. It certainly was in that specific case.

    Your assumption that it is not worth it to consider performance on a lower level than choice of algorithm is simply foolish.

    But bad programmers will always try to make a case for why being a bad programmer, why not understanding the hardware you’re running on, or the code you’re executing, is a good thing.

    Guess what. At the end of the day, they’re still bad programmers. They’re missing some vital tools in their toolbox.

    A good programmer certainly knows that messing around with low-level optimizations isn’t the first thing you should do, but he *also* has the option available to him. He has the skills required to understand his code, and the performance of that code, because *sometimes* that is essential in getting good performance.

  31. June 7th, 2010 at 5:42 pm

    jalf says:

    As an aside, I have a question: how exactly does one go about learning how to go beyond standard C into hardcore optimization? For me, it’s been a long road of trial & error, and random discovery. It seems to me that there is probably good resources out there for getting started on optimizing code (let’s say in gcc). Know any?

    http://www.amazon.com/Computer-Architecture-Quantitative-Approach-4th/dp/0123704901 was an invaluable help in my case. It’s not about optimization, but it’s an amazing resource for understanding the CPU. And when you do that, most optimization tricks become pretty obvious.

    Also look up the documentation for your target processor (both AMD and Intel have these available in PDF form on their websites)

    Knowing which instructions are pipelined and which ones aren’t, and their relative latencies and all that good stuff helps too. Reading the assembly output from the compiler isn’t enough. You also have to know how expensive the instructions it chose are.

    Humans certainly can beat a good compiler. And you don’t have to be a “master-level human assembler” to do it. You just have to think. Choose your battles. The compiler is unbeatable on its home turf. don’t try to optimize on the things it does well.
    But in many other cases, a human programmer knows things about the code that the compiler doesn’t. And that can make a huge difference. Using “restrict” as shown in this blog post is one example. The compiler can’t safely do this because it doesn’t know if it’ll break the program. The human programmer *does* know that this assumption is safe to make.

  32. June 7th, 2010 at 5:55 pm

    Anonymous says:

    this whole article could be written in five lines.

  33. June 7th, 2010 at 6:08 pm

    Anonymous says:

    Rather than blame developers for failing to compensate for problematic language features, why not use a language that doesn’t have those problems in the first place (excluding kernel devs, etc)? The simple fact is that pointers eliminate many kinds of automatic optimization. Fran Allen noted this decades ago. I think it’s time to stop apologizing for C.

  34. June 7th, 2010 at 6:26 pm

    Rick says:

    Use -mfpmath=sse to use the SSE unit instead of x87 and -ftree-vectorize (might already be part of -O3) to generate vectorized SSE code.

  35. June 7th, 2010 at 11:30 pm

    Kelexel says:

    yeah… go ahead … optimize…

    I’m not saying that optimizing is never useful, quite the contrary. However, from experience, optimizing has almost always (note.. almost) been harmful. It introduces bugs that are hard to fix because the code is harder to read, it makes the system much more fragile, and it loses a lot in resilience to external changes. Especially when the optimization is done through compiler tricks and exploiting obscure corner cases.

    Optimization such as this should be as isolated as possible; writing the code directly in assembly language would have avoided some of the pitfalls. Burying the code in a library that is properly unit tested and provides a proper interface to it, so that the optimization does not spread beyond its original intended use, would be even better.

    However, was it really necessary ? Sure we can come up with a thousand cases where optimization yielded sizeable benefits. But there are far more cases where optimization was done prematurely. No amount of optimization on a bubble sort will beat using a more efficient algorithm. But worst of it is the hidden cost which, with time becomes not so hidden.

    Many times I have seen corporations with systems many millions of lines of codes that are so hard to maintain because of the systematic optimization done. In fact I have seen optimization techniques that became laws, your code HAD to do it this way. The end result was that the systematic application of this without regard of the actual impact actually made the code run MUCH slower, I’m not speaking 0.6 seconds or 8x slower I’m speaking orders of magnitude slower.

    To make matters worse, the code is now so convoluted and over-optimized that it is nearly unmanageable. A simple feature that should take a few weeks to implement now takes over a year because of the fragility these optimizations brought to the code base. Fix something here and five other things break…

    YES optimization is a good thing when used properly, so are goto, firearms, insecticide and land mines. However it is because they are so easy to misuse that we recommend that they not be used at all. Most (99%) people are not smart enough to use them properly and in doing it make life impossible for the rest of us.

    If you are to optimize then do it properly: roll up your sleeves and write the assembly code directly. But please, for the love of god, do not do it by tricking the compiler. If it is not smart enough for you then remove it for the bits you think you can do better and have fun, but do not try to inject smart behaviour into stupid things; it will come back to haunt you.

  36. June 8th, 2010 at 2:05 am

    Graycode says:

    For compilers without restrict support, I believe this would work:

    void naive_tempvar(float *in, float *out)
    {
      int i,j;
      for (i=0;i<LENGTH-4;i++)
      {
        float sum = 0.0f;
        for (j = 0;j<5;j++)
          sum += filter[j] * in[i+j];
        out[i] = sum;
      }
    }

    By fixing the location of out[i] as a temporary variable, the aliasing problem is removed, and the compiler should optimize that code into near-identical assembly.

    Please correct me if I'm wrong.
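
To see why the temporary matters, here is a minimal sketch contrasting the two loop shapes. It assumes the article’s naive loop accumulates directly into out[i]; the function names, the explicit length parameter n, and the all-ones filter are placeholders chosen for illustration, not the article’s code. When the buffers overlap, the store-each-step version reads back its own partial sum, so the two forms are not equivalent in general, and a compiler that cannot prove the buffers are disjoint has to keep every store.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 5

/* Placeholder coefficients; the article's real filter values are not shown here. */
static const float filter[TAPS] = {1.0f, 1.0f, 1.0f, 1.0f, 1.0f};

/* Assumed shape of the naive loop: every partial sum is stored to out[i]. */
void filter_store_each_step(const float *in, float *out, size_t n)
{
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        out[i] = 0.0f;
        for (size_t j = 0; j < TAPS; j++)
            out[i] += filter[j] * in[i + j]; /* may re-read a value just written */
    }
}

/* Temporary-variable form: out[i] is written once, after all reads. */
void filter_tempvar(const float *in, float *out, size_t n)
{
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < TAPS; j++)
            sum += filter[j] * in[i + j];
        out[i] = sum;
    }
}
```

With disjoint buffers both functions produce the same output. With out = in + 2 and five inputs of 1.0, the temporary form writes 5.0 while the store-each-step form writes 6.0, because at the third tap it adds its own partial sum of 2.0 instead of the original 1.0.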

  37. June 8th, 2010 at 2:30 am

    42Bastian says:

    @Louis: Good article.

    @Drifter: Testing such optimizations stand-alone won’t give you valid numbers. At least with N == 5000. The loading of the exe alone takes too much time.

    @No-hand-optimization folks:
    Understanding where and when to optimize is IMHO essential for a good programmer. And since the article does not say it is optimizing specifically for a 3GHz PC, you really should consider the vast number of programmers working on embedded systems, which are often restricted in speed _and_ memory.

    Sure: writing assembly for today’s IA32/IA64 CPUs is overkill. One just cannot write code which is optimal for both a Pentium 4 and an i7.
    But knowing how to help the compiler do the best job for you is the trick.

    And BTW: ever wondered why Thunderbird takes longer to start on a 3GHz machine than Netscape on a 200MHz computer?

    I suggest every application developer be forced to program on 500MHz computers; it’ll help the users :-)
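
The point about stand-alone timing can be addressed by measuring inside the process and repeating the call many times, so that program start-up is excluded and amortized. The following is only a sketch under assumptions: seconds_per_call and kernel_fn are hypothetical names, the all-ones filter is a placeholder, and clock() from <time.h> reports CPU time at a fairly coarse resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define TAPS 5

/* Placeholder coefficients, not the article's. */
static const float filter[TAPS] = {1.0f, 1.0f, 1.0f, 1.0f, 1.0f};

/* Kernel under test: the temporary-variable loop from this thread. */
void filter_tempvar(const float *in, float *out, size_t n)
{
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < TAPS; j++)
            sum += filter[j] * in[i + j];
        out[i] = sum;
    }
}

typedef void (*kernel_fn)(const float *in, float *out, size_t n);

/* Average CPU seconds per call, measured inside the process so that
   program start-up cost is not part of the number. */
double seconds_per_call(kernel_fn fn, const float *in, float *out,
                        size_t n, unsigned reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++)
        fn(in, out, n);
    clock_t stop = clock();
    return (double)(stop - start) / (double)CLOCKS_PER_SEC / (double)reps;
}
```

A call such as seconds_per_call(filter_tempvar, in, out, 5000, 100000) gives a per-call figure. An optimizer may still hoist or drop repeated identical calls, so it is worth checking the generated assembly before trusting the result.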

  38. June 8th, 2010 at 2:31 am

    Aldurs says:

    Nice article, it will be interesting to see your SSE-version.
    But, there’s always one: to truly optimize your code you should leave high-level code and rewrite the algorithm in assembly (but this you already knew). Trying to squeeze some extra oomph out of the generated code gives, most of the time, just a few percent, occasionally more. The drawback is that you tend to get bogged down in the generated code instead of seeing the whole algorithm and translating that.

    That said, it’s not a total waste of time scurrying through the output from the compiler — when hunting for speed bugs, as your example clearly shows.

  39. June 8th, 2010 at 2:36 am

    42Bastian says:

    @Graycode: Tested with PowerPC-gcc -O3: the stores are gone, but the restricted version is better unrolled, using a lot of multiply-add instructions. But when compiling with size optimization, your code gives the better result (size sometimes matters).

  40. June 8th, 2010 at 2:50 am

    Yves says:

    Louis,

    an interesting and relevant article for those involved in optimizations.

    A comment to Magice: my bread and butter is real-time image processing. I can be asked to handle, say, 4-Mpixel images 60 times per second. So my time scale is the millisecond. In such a context, optimization really matters :)

    Unfortunately for us, the behavior of modern processors becomes less and less predictable and repeatable. This slowly turns the art of programming to black magic…

  41. June 8th, 2010 at 3:06 am

    AlainT says:

    Hi,

    Interesting article, but it is missing a warning: in real cases (I mean professional work), the use of “restrict” will certainly lead to a nice big crash :-( when a new developer arrives and changes the parameters without knowing that *in and *out must not overlap (or do you really expect him to check all the code before changing 3 lines, or to know what the “restrict” keyword means?)

    In fact I’ll even bet that the code writer himself will eventually break the code some months later because he will have forgotten about this optimization…

    In my opinion the temporary variable solution mentioned by Graycode is far better because it does not introduce unnecessary risks and is ultimately much clearer about what the processor is really doing.
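
One way to reduce the risk described here is to state the no-overlap contract in code, so that a debug build fails loudly instead of silently miscomputing. This is a minimal sketch, not something from the article: ranges_disjoint and filter_restrict are hypothetical names, the filter values are placeholders, and comparing addresses as integers assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 5

/* Placeholder coefficients, not the article's. */
static const float filter[TAPS] = {1.0f, 1.0f, 1.0f, 1.0f, 1.0f};

/* Returns 1 when [a, a + na) and [b, b + nb) share no element.
   Addresses are compared as integers because relational comparison of
   pointers into different objects is undefined in C. */
static int ranges_disjoint(const float *a, size_t na,
                           const float *b, size_t nb)
{
    uintptr_t a0 = (uintptr_t)a, a1 = (uintptr_t)(a + na);
    uintptr_t b0 = (uintptr_t)b, b1 = (uintptr_t)(b + nb);
    return a1 <= b0 || b1 <= a0;
}

/* restrict promises the compiler that in and out never overlap;
   the assert checks that promise in builds without NDEBUG. */
void filter_restrict(const float *restrict in, float *restrict out, size_t n)
{
    assert(n >= TAPS);
    /* Reads in[0 .. n), writes out[0 .. n - TAPS + 1). */
    assert(ranges_disjoint(in, n, out, n - TAPS + 1));
    for (size_t i = 0; i + TAPS <= n; i++) {
        out[i] = 0.0f;
        for (size_t j = 0; j < TAPS; j++)
            out[i] += filter[j] * in[i + j];
    }
}
```

The check costs nothing in release builds compiled with NDEBUG, and it documents the requirement at the place a maintainer is most likely to look.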

  42. June 8th, 2010 at 3:16 am

    Jonathan Chayce Dickinson (Mr.) says:

    I read a lot of comments from people who go to great lengths to say: just leave it up to the compiler. Bad idea. I don’t care if you don’t check the optimization into source control; stick around at work for another 1/2 hour and optimize a few of your methods.

    The reasoning is simple: as a programmer you should not trust a black box. You need to understand how the tooling you are using works; where the nuts and bolts are, how everything fits together all the way down to the metal.

    Doing anything less is giving the rest of us a bad name. These people are the type who don’t know when to use an enumerator and when to use an array. These are the type of people who think that the year 2000 will never happen.

  43. June 8th, 2010 at 4:18 am

    jcm says:

    Choosing algorithms and data structures wisely to avoid combinatorial explosion is the very best optimization of human time and safety. Micro-optimization is nonsense.

  44. June 8th, 2010 at 4:25 am

    Kia says:

    A nice article!

    I suspect the source of contention between those advocating and those not preferring micro-optimisation is due to the type of applications being developed. I do a lot of high performance computing and can assure you when you have to run heavy computation with long loops over multi-million element vectors and your application might be running hours on end, micro-optimisation can definitely be worth the effort. This of course means you would first thoroughly profile the application and have a good understanding of the bottlenecks before setting out to optimise.

  45. June 8th, 2010 at 4:26 am

    caf says:

    The example given in this article *isn’t* micro-optimisation. Micro-optimisation is doing things like changing q = x[a + b] * y[a + b]; into temp = a + b; q = x[temp] * y[temp];, or changing * 2 into << 1.

    Adding appropriate "restrict" qualifiers isn't any kind of overengineering, either; as you've noted, it's something that allows the compiler to optimise better, without affecting the readability or maintainability of the code. In my book, correct use of the "restrict" qualifier is just as valuable and praiseworthy as correct use of the "const" qualifier.

  46. June 8th, 2010 at 5:32 am

    Mark says:

    I suggest that you just have a poor compiler. Visual Studio 2010 has naive running on a LENGTH array in 14.1754 nanoseconds. Also, I really didn’t like how you didn’t give example inputs or other use cases, or just your general setup.

  47. June 8th, 2010 at 8:52 am

    dvader says:

    Good article, haven’t played with C/asm optimisation for a while. I have to do a lot of optimisation in my job, in C#, and most of my stuff is limited to algorithms. The problem is in maintenance. If I can’t get the monkeys to understand what I’ve done – it never gets used again. The problem is we all have to deal with marginal coders, and we can’t just blow them off – because they ARE the majority.
    But it was good to read about some real optimisation again! :)

  48. June 8th, 2010 at 9:49 am

    Matt says:

    Some pretty unfair comments in what predictably turned into a flame war (albeit a fairly civil one so far). The fast majority of slow response times in all applications today come from one of 3 factors: 1) Blocking on I/O inefficiently (especially networked) 2) Allocating memory inefficiently (especially string construction) 3) Context/Thread switching (inefficiently?)

    So my vote would be to teach all of the app developers in the world how to optimize those 3 things before we get into micro optimization religious wars. Also, when app developers don’t micro optimize their code, it’s usually not because they don’t take pride in their work, it’s more often because they have to balance features and quality against performance gains.
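
As an illustration of the second factor, string construction can often be done with one allocation instead of one per append. The sketch below is an assumption-laden example, not taken from the article: join_once is a hypothetical helper that measures all parts first, allocates once, and copies each part into place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Concatenates count strings using a single allocation.
   Returns NULL if the allocation fails; the caller frees the result. */
char *join_once(const char *const *parts, size_t count)
{
    assert(parts != NULL || count == 0);

    /* First pass: total length, so the buffer is sized exactly once. */
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(parts[i]);

    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy each part to its final position. */
    char *cursor = result;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        memcpy(cursor, parts[i], len);
        cursor += len;
    }
    *cursor = '\0';
    return result;
}
```

Compared with growing the buffer on every append, this avoids repeated reallocation and re-copying of the prefix; whether that matters in a given application is something to confirm with a profiler.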

  49. June 8th, 2010 at 9:50 am

    Matt says:

    And by fast majority I meant vast majority. Limited editing abilities….

  50. June 8th, 2010 at 10:43 am

    Peter da Silva says:

    OK, the point of adding the restrict keyword is to get it to accumulate results in a temporary variable in the loop.

    Why don’t you just do that?

    void naive(float* in, float* out)
    {
        int i, j;
        for (i = 0; i < LENGTH - 4; i++)
        {
            float sum = 0.0f;
            for (j = 0; j < 5; j++)
                sum += filter[j] * in[i + j];
            out[i] = sum;
        }
    }

    This not only lets the compiler know it doesn't need to worry about aliasing, it's more readable once the expressions (as they tend to) become more complex.

  51. June 8th, 2010 at 5:55 pm

    Jonty says:

    @jon and @jalf

    - Pipelining may not be so much an issue on lower end / embedded processors, where you would be looking exactly to make the kind of optimisations demonstrated in this article.

    Even the Atom, if I understand it correctly, will not particularly negatively affect your expected execution order/speed with its in-order pipeline.

  52. June 8th, 2010 at 10:03 pm

    ricardo saracino says:

    I like the explanation…

    btw i believe this was compiled with gcc and not a MS product

  53. June 9th, 2010 at 11:29 am

    Jason Cohen says:

    I’d like to see the same code with fixed-precision integers instead of floats — what’s the speed difference?

    I know FPUs are great these days, so maybe there’s no difference. Last time I was doing similar code (the ’90s) it was a huge no-no to do floating point because of the performance problems. I’m curious!
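
For anyone who wants to run that comparison, a fixed-point variant of the filter might look like the sketch below. Everything here is an assumption made for illustration: the Q16.16 format, the names q16_16, q_from_int and filter_fixed, and the placeholder taps (0.125, 0.25, 0.25, 0.25, 0.125). It makes no claim about speed; that has to be measured on the target hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 5
#define FRAC_BITS 16

/* Signed fixed point: 16 integer bits, 16 fraction bits. */
typedef int32_t q16_16;

/* Placeholder taps 0.125, 0.25, 0.25, 0.25, 0.125 (sum 1.0), not the article's. */
static const q16_16 filter_q[TAPS] = {8192, 16384, 16384, 16384, 8192};

/* Converts a small integer to Q16.16; v must lie within -32768..32767. */
static q16_16 q_from_int(int32_t v)
{
    return v * (1 << FRAC_BITS);
}

void filter_fixed(const q16_16 *in, q16_16 *out, size_t n)
{
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        int64_t acc = 0; /* products carry 32 fraction bits */
        for (size_t j = 0; j < TAPS; j++)
            acc += (int64_t)filter_q[j] * in[i + j];
        /* Drop the extra 16 fraction bits. Right-shifting a negative value is
           implementation-defined in C; common compilers shift arithmetically. */
        out[i] = (q16_16)(acc >> FRAC_BITS);
    }
}
```

Accumulating in 64 bits and shifting once per output keeps the rounding error to a single truncation per sample.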

  54. June 11th, 2010 at 11:18 am

    shegget says:

    @AlainT – Kudos. I was reading this entire comment thread waiting for someone to mention exactly what you said. In a theoretical world of “Hey let’s see if I can make my code go faster / make the smallest PE possible / win the obfuscated C contest”, this is pretty neat. In the real world of someone else maintaining your code or, as you said, returning to your own code in several years with a different compiler/build environment and rusty knowledge, techniques that require specific programmer knowledge and specific compiler behavior have the potential to create some really painful and hard to find bugs.

  56. June 28th, 2010 at 1:13 am

    IvenJ says:

    Very helpful and useful! Sometimes it’s miserable that your code is just running slowly, but you don’t know why.

  57. July 23rd, 2010 at 6:59 pm

    argv says:

    did you ever consider looking at gcc instead of looking at your code (whether C or assembly)?

    more specifically do you understand how flex and bison work? do you know what they are doing (in characteristic american “micro-detail”)? that’s for you magice ;)

    gcc is a mess. it’s not clean code. and this makes it extremely difficult to fix things.

    and writing compilers is not easy.

    someone suggested reading a book on “good code”.
    perhaps you should be reading a book on compilers?

    programmers are not only lazy but, today, they are also ignorant. and therein lies the problem. history has been overlooked.

    knowing assembly is not enough to understand the problem. you need to understand the steps taken to get from C to assembly, and how they are implemented.

    and that’s a lot of work.

    note also: there are alternatives to gcc. but you need to investigate the “ancient” past to find them.

  58. August 19th, 2010 at 3:02 pm

    Nadav says:

    Great post. Good job.

  59. November 15th, 2010 at 11:34 am

    Aaron Foster says:

    Heya. This may have been pointed out already, but the fundamental point of your post seems to be that GCC isn’t doing the best job of optimizing, but GCC isn’t really an optimizing compiler at all, even though it has some optimization options. Supposedly Intel’s compiler is the gold standard for computation-heavy optimization (at least for their procs); this is why the Gentoo streetracers like to use ICC for wherever it’s possible. I have never heard of anyone using GCC if their goal is to churn out the fastest possible code.

    An interesting article, though.

  60. May 29th, 2011 at 10:17 pm

    tor says:

    The problem when discussing asm code improvement is always the same. Some people just don’t get the subtle difference between “never” and “almost never”.

    The truth is: we should *almost never* code in assembly for speed. It *almost never* makes any sense. I *almost never* go to the doctor, because I’m almost always in good health. Fighter pilots *almost never* need to press the eject button. That doesn’t mean they never should!

  61. October 11th, 2011 at 11:51 pm

    Questionnaire says:

    Enhanced my skills..very well written!
