<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Computational performance &#8212; a beginner&#8217;s case study</title>
	<atom:link href="http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/feed/" rel="self" type="application/rss+xml" />
	<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/</link>
	<description>{ on programming and the internets, every monday }</description>
	<lastBuildDate>Thu, 17 May 2012 07:59:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Ilmari Heikkinen</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-26245</link>
		<dc:creator>Ilmari Heikkinen</dc:creator>
		<pubDate>Wed, 24 Feb 2010 18:45:02 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-26245</guid>
		<description>Looking at the ASM output for gcc -O3 -msse2, I noticed that horiz_add() and fast() already use the SIMD 4-float vector instructions (movaps, addps and friends). There&#039;s also some funky business going on: fast() and a manual version written with SSE intrinsics are both slower than horiz_add() on my Pentium M.</description>
		<content:encoded><![CDATA[<p>Looking at the ASM output for gcc -O3 -msse2, I noticed that horiz_add() and fast() already use the SIMD 4-float vector instructions (movaps, addps and friends). There&#8217;s also some funky business going on: fast() and a manual version written with SSE intrinsics are both slower than horiz_add() on my Pentium M.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Casey</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-26243</link>
		<dc:creator>Casey</dc:creator>
		<pubDate>Wed, 24 Feb 2010 17:37:23 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-26243</guid>
		<description>Your conclusion also depends on the language you&#039;re working in, if I recall my High Performance Computing book correctly. I don&#039;t see it specified, but I believe you&#039;re using C or C++. Fortran, I think, would be faster if you go vertical instead of horizontal.</description>
		<content:encoded><![CDATA[<p>Your conclusion also depends on the language you&#8217;re working in, if I recall my High Performance Computing book correctly. I don&#8217;t see it specified, but I believe you&#8217;re using C or C++. Fortran, I think, would be faster if you go vertical instead of horizontal.</p>
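<p>To make the layout point concrete: C stores a two-dimensional array row by row (row-major), while Fortran stores it column by column (column-major), so the cache-friendly traversal direction is swapped between the two. The following is only a minimal sketch in C, assuming a matrix of doubles stored row-major in one flat buffer; the names sum_row_order and sum_column_order are hypothetical and are not taken from the post. Both functions return the same sum and differ only in how far apart consecutive reads are in memory.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row-major (the C layout).
   The inner loop walks along a row, so consecutive reads
   touch adjacent elements in memory. */
double sum_row_order(const double *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += m[i * cols + j];
    return total;
}

/* Same sum over the same row-major buffer, but the inner loop
   walks down a column, so each read is cols elements away from
   the previous one. */
double sum_column_order(const double *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double total = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            total += m[i * cols + j];
    return total;
}
```

<p>Timing the two functions on a matrix much larger than the cache is one way to check which direction is faster on a given machine. In Fortran, where the leftmost index varies fastest in memory, the column-wise loop would be the contiguous one.</p>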
]]></content:encoded>
	</item>
	<item>
		<title>By: Evil teach</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-26237</link>
		<dc:creator>Evil teach</dc:creator>
		<pubDate>Wed, 24 Feb 2010 15:14:21 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-26237</guid>
		<description>You should also look into the effect of page faulting on the problem.

If your arrays are larger than your working set, then some of your memory can be faulted in and out. The resulting disk I/O can add significant cost to your result.</description>
		<content:encoded><![CDATA[<p>You should also look into the effect of page faulting on the problem.</p>
<p>If your arrays are larger than your working set, then some of your memory can be faulted in and out. The resulting disk I/O can add significant cost to your result.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Finnigan</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-14705</link>
		<dc:creator>Tom Finnigan</dc:creator>
		<pubDate>Tue, 14 Jul 2009 20:29:21 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-14705</guid>
		<description>One big takeaway from all this is that you need a good profiler.

There are so many levels of optimization (compiler, OS, CPU, other hardware) that it&#039;s very hard to build enough intuition to predict performance just from the code. To get the best performance, you need to experiment with different methods and choose the one that performs best. But that takes a lot of time, and to justify that time investment you need to know that the code in question is the bottleneck of your program. So, you need a good profiler.

The other thing I wanted to mention is that when I develop on Windows I miss the OS X performance tools. Shark is great, but so are the libraries, like the Accelerate framework. This example would be one function call, and the framework uses SIMD and threading as appropriate. Amazing to work with.</description>
		<content:encoded><![CDATA[<p>One big takeaway from all this is that you need a good profiler.</p>
<p>There are so many levels of optimization (compiler, OS, CPU, other hardware) that it&#8217;s very hard to build enough intuition to predict performance just from the code. To get the best performance, you need to experiment with different methods and choose the one that performs best. But that takes a lot of time, and to justify that time investment you need to know that the code in question is the bottleneck of your program. So, you need a good profiler.</p>
<p>The other thing I wanted to mention is that when I develop on Windows I miss the OS X performance tools. Shark is great, but so are the libraries, like the Accelerate framework. This example would be one function call, and the framework uses SIMD and threading as appropriate. Amazing to work with.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: louis</title>
		<link>http://lbrandy.com/blog/2009/07/computational-performance-a-beginners-case-study/comment-page-1/#comment-13816</link>
		<dc:creator>louis</dc:creator>
		<pubDate>Wed, 08 Jul 2009 13:59:54 +0000</pubDate>
		<guid isPermaLink="false">http://lbrandy.com/blog/?p=837#comment-13816</guid>
		<description>I merged those changes. Thanks, guys.</description>
		<content:encoded><![CDATA[<p>I merged those changes. Thanks, guys.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

