<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Computational performance &#8212; a beginner&#8217;s case study</title>
	<atom:link href="http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/feed/" rel="self" type="application/rss+xml" />
	<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/</link>
	<description>{ on programming and the internets, every monday }</description>
	<lastBuildDate>Fri, 03 Feb 2012 14:14:15 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Ilmari Heikkinen</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-26245</link>
		<dc:creator>Ilmari Heikkinen</dc:creator>
		<pubDate>Wed, 24 Feb 2010 18:45:02 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-26245</guid>
		<description>Looking at the ASM output for gcc -O3 -msse2, I noticed that horiz_add() and fast() already use the 4-wide SIMD float instructions (movaps, addps and friends). Something funky is going on, though: both fast() and a hand-written version using SSE intrinsics are slower than horiz_add() on my Pentium M.</description>
		<content:encoded><![CDATA[<p>Looking at the ASM output for gcc -O3 -msse2, I noticed that horiz_add() and fast() already use the 4-wide SIMD float instructions (movaps, addps and friends). Something funky is going on, though: both fast() and a hand-written version using SSE intrinsics are slower than horiz_add() on my Pentium M.</p>
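<p>For readers who haven't written SSE intrinsics, here is a minimal sketch of summing a float array four lanes at a time with <code>_mm_add_ps</code>, the intrinsic that compiles to <code>addps</code>. It is not the post's <code>horiz_add()</code> or <code>fast()</code>; the name <code>sum_floats_sse</code> is hypothetical, and it assumes an x86 compiler that defines <code>__SSE__</code>, falling back to a plain scalar loop otherwise.</p>

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, _mm_loadu_ps */
#endif

/* Hypothetical helper: sums n floats, four at a time when SSE is available. */
float sum_floats_sse(const float *a, size_t n)
{
    size_t i = 0;
    float total = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();                   /* four running partial sums */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));  /* one addps per 4 floats */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                       /* spill the four lanes */
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; i++)                               /* leftover elements, or all of them without SSE */
        total += a[i];
    return total;
}
```

<p>Note that splitting the sum into four lanes reorders the additions, so results can differ from a sequential loop in the last bits. Ilmari's measurements suggest hand-written intrinsics are not automatically faster than what gcc -O3 emits on its own, so it is worth timing both.</p>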
]]></content:encoded>
	</item>
	<item>
		<title>By: Casey</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-26243</link>
		<dc:creator>Casey</dc:creator>
		<pubDate>Wed, 24 Feb 2010 17:37:23 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-26243</guid>
		<description>If I recall my High Performance Computing book correctly, your conclusion also depends on the language you&#039;re working in. I don&#039;t see it specified, but I believe you&#039;re using C or C++. Fortran, I think, would be faster going vertically instead of horizontally, since it stores arrays column by column rather than row by row.</description>
		<content:encoded><![CDATA[<p>If I recall my High Performance Computing book correctly, your conclusion also depends on the language you&#8217;re working in. I don&#8217;t see it specified, but I believe you&#8217;re using C or C++. Fortran, I think, would be faster going vertically instead of horizontally, since it stores arrays column by column rather than row by row.</p>
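<p>To illustrate, here is a small sketch of the two traversal orders over a matrix laid out the way C stores a 2D array, one row after another. The function names are hypothetical, not from the post. Both return the same sum; the difference is that the row-wise inner loop reads consecutive addresses, while the column-wise inner loop jumps <code>cols</code> elements on every step. In Fortran, which lays arrays out column by column, the preferred order is reversed.</p>

```c
#include <stddef.h>

/* Row-major layout (as in C): element (r, c) lives at offset r * cols + c. */
double sum_row_major(const double *a, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)   /* inner loop: stride 1, cache friendly */
            s += a[r * cols + c];
    return s;
}

double sum_col_major(const double *a, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)   /* inner loop: stride cols, touches a new line each step */
            s += a[r * cols + c];
    return s;
}
```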
]]></content:encoded>
	</item>
	<item>
		<title>By: Evil teach</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-26237</link>
		<dc:creator>Evil teach</dc:creator>
		<pubDate>Wed, 24 Feb 2010 15:14:21 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-26237</guid>
		<description>You should also look into the effect of page faulting on the problem.

If your arrays are larger than your working set, some of your memory can be faulted in and out. The resulting disk I/O can add significant cost to your timings.</description>
		<content:encoded><![CDATA[<p>You should also look into the effect of page faulting on the problem.</p>
<p>If your arrays are larger than your working set, some of your memory can be faulted in and out. The resulting disk I/O can add significant cost to your timings.</p>
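<p>One way to check whether page faults are skewing a benchmark, assuming a POSIX system, is to read the process's fault counters with <code>getrusage()</code> before and after the timed run. <code>ru_majflt</code> counts faults that needed disk I/O and <code>ru_minflt</code> counts those served without it. The helper names below are hypothetical, and the 4 KiB page size in the workload is an assumption.</p>

```c
#include <stdlib.h>
#include <sys/resource.h> /* getrusage, struct rusage (POSIX) */

/* Hypothetical helper: page faults (minor + major) taken by this process so far. */
long page_faults_so_far(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;                    /* counters unavailable */
    return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothetical workload: allocate a buffer and write one byte per (assumed) 4 KiB page. */
int touch_pages(size_t bytes)
{
    volatile char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < bytes; i += 4096)
        buf[i] = 1;                   /* first write to each page faults it in */
    free((void *)buf);
    return 0;
}
```

<p>If the major-fault count climbs during a timed run, the measurement includes disk I/O rather than just computation.</p>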
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Finnigan</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-14705</link>
		<dc:creator>Tom Finnigan</dc:creator>
		<pubDate>Tue, 14 Jul 2009 20:29:21 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-14705</guid>
		<description>One big takeaway from all this is that you need a good profiler.

There are so many levels of optimization (compiler, OS, CPU, other hardware) that it&#039;s very hard to develop enough intuition to predict performance just from the code. To get the best performance, you need to experiment with different methods and choose the one that performs best. But that takes a lot of time, and to justify the investment you need to know that this code is the bottleneck of your program. So, you need a good profiler.

The other thing I wanted to mention is that when I develop on Windows, I miss the OS X performance tools. Shark is great, but so are the libraries, like the Accelerate framework. This example would be a single function call, and the framework uses SIMD and threading as appropriate. Amazing to work with.</description>
		<content:encoded><![CDATA[<p>One big takeaway from all this is that you need a good profiler.</p>
<p>There are so many levels of optimization (compiler, OS, CPU, other hardware) that it&#8217;s very hard to develop enough intuition to predict performance just from the code. To get the best performance, you need to experiment with different methods and choose the one that performs best. But that takes a lot of time, and to justify the investment you need to know that this code is the bottleneck of your program. So, you need a good profiler.</p>
<p>The other thing I wanted to mention is that when I develop on Windows, I miss the OS X performance tools. Shark is great, but so are the libraries, like the Accelerate framework. This example would be a single function call, and the framework uses SIMD and threading as appropriate. Amazing to work with.</p>
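<p>Short of a full profiler, a crude but portable way to compare candidate implementations is to time each one over repeated runs with the standard C <code>clock()</code> function, which reports processor time used. This is only a sketch: the harness and kernel names are hypothetical, and a real profiler such as Shark tells you far more about where the time actually goes.</p>

```c
#include <stddef.h>
#include <time.h> /* clock, clock_t, CLOCKS_PER_SEC */

typedef double (*kernel_fn)(const double *, size_t);

/* Hypothetical harness: CPU seconds spent in `reps` calls of fn(data, n).
   The accumulated results are written to *result so the calls are not optimized away. */
double time_kernel(kernel_fn fn, const double *data, size_t n, int reps, double *result)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += fn(data, n);
    clock_t end = clock();
    if (result != NULL)
        *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical candidate kernel: a plain summation loop. */
double sum_plain(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

<p>Each alternative version would get its own <code>kernel_fn</code> and be timed on the same data with enough repetitions that the clock resolution does not dominate.</p>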
]]></content:encoded>
	</item>
	<item>
		<title>By: louis</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-13816</link>
		<dc:creator>louis</dc:creator>
		<pubDate>Wed, 08 Jul 2009 13:59:54 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-13816</guid>
		<description>I merged those changes. Thanks guys.</description>
		<content:encoded><![CDATA[<p>I merged those changes. Thanks guys.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Kish</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-13786</link>
		<dc:creator>Andy Kish</dc:creator>
		<pubDate>Wed, 08 Jul 2009 06:36:54 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-13786</guid>
		<description>Naixn: The weird output is fixed in my GitHub fork: http://github.com/Kobold/simple-optimization-test/tree/master

Louis: Check your fork queue. ;-)</description>
		<content:encoded><![CDATA[<p>Naixn: The weird output is fixed in my GitHub fork: <a href="http://github.com/Kobold/simple-optimization-test/tree/master" rel="nofollow">http://github.com/Kobold/simple-optimization-test/tree/master</a></p>
<p>Louis: Check your fork queue. ;-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anonymous</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-13704</link>
		<dc:creator>anonymous</dc:creator>
		<pubDate>Tue, 07 Jul 2009 11:19:19 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-13704</guid>
		<description>What about interlacing the arrays?</description>
		<content:encoded><![CDATA[<p>What about interlacing the arrays?</p>
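<p>One reading of this suggestion, sketched under assumptions (the post's exact arrays and operation are not reproduced here): if a loop reads <code>a[i]</code> and <code>b[i]</code> together, storing them as pairs in a single array puts both values next to each other in memory, so the loop consumes one sequential stream instead of two. The struct and function names are hypothetical, and a dot product stands in for whatever combines the two arrays.</p>

```c
#include <stddef.h>

/* Separate arrays: a[i] and b[i] come from two different memory regions. */
float dot_separate(const float *a, const float *b, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Interlaced layout: each element keeps its a and b values side by side. */
struct pair { float a, b; };

float dot_interlaced(const struct pair *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].a * p[i].b;   /* both operands come from one contiguous stream */
    return s;
}
```

<p>Whether this helps depends on the access pattern: it pays off when every pass uses both fields, and can hurt a loop that only needs one of them, since half of every cache line it loads is then unused.</p>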
]]></content:encoded>
	</item>
</channel>
</rss>

