<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Haskell Unlines</title>
	<atom:link href="http://unlines.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://unlines.wordpress.com</link>
	<description></description>
	<lastBuildDate>Sun, 14 Nov 2010 17:35:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='unlines.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>The Haskell Unlines</title>
		<link>http://unlines.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://unlines.wordpress.com/osd.xml" title="The Haskell Unlines" />
	<atom:link rel='hub' href='http://unlines.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Generics for small, fixed-size vectors</title>
		<link>http://unlines.wordpress.com/2010/11/15/generics-for-small-fixed-size-vectors/</link>
		<comments>http://unlines.wordpress.com/2010/11/15/generics-for-small-fixed-size-vectors/#comments</comments>
		<pubDate>Sun, 14 Nov 2010 14:53:46 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=355</guid>
		<description><![CDATA[Recently, I decided to clean up and release a small library which I hacked together several months ago and then all but forgot about. I find it quite amusing; perhaps you will, too. The library implements a framework for computing with small, fixed-size vectors such as complex numbers or coordinates. My goal was to be [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I decided to clean up and release a small library which I hacked together several months ago and then all but forgot about. I find it quite amusing; perhaps you will, too.</p>
<p>The library implements a framework for computing with small, fixed-size vectors such as complex numbers or coordinates. My goal was to be as generic and efficient as possible. In particular, it should be easy to generically define common functions such as dot product or magnitude for vectors of arbitrary arity and to add new vector types and operations. Equally importantly, there shouldn&#8217;t be any run-time overhead &#8211; all operations should be as fast as if they were written by hand.</p>
<p><span id="more-355"></span>There are a number of nice libraries on Hackage which provide all this to varying degrees. However, I thought it would be fun to write my own and perhaps I could come up with a design that is minimal in some sense. In the end, I&#8217;m not sure about minimal but I sure did have quite a bit of fun.</p>
<p>So what is a small, fixed-size vector? Essentially, it should be completely defined by two operations:</p>
<pre>construct :: a -&gt; ... -&gt; a -&gt; v a
inspect :: v a -&gt; (a -&gt; ... -&gt; a -&gt; b) -&gt; b</pre>
<p>That is, we can construct a vector from a bunch of coordinates and inspect it with a function that takes a bunch of coordinates and produces some result. The number of coordinates is the arity of the vector and is known at compile time. Ideally, the framework should work for all types that define these two operations and nothing else. The question is: can we just put them in a type class and implement everything else on top of it?</p>
<p>The two signatures suggest (unsurprisingly) that functions of type <tt>a -&gt; ... -&gt; a -&gt; b</tt> play a rather central role here: <tt>construct</tt> is such a function and <tt>inspect</tt> takes such a function as an argument. Nowadays, it is fairly straightforward to encode them in the type system.  First, we define type-level natural numbers which represent the arity:</p>
<pre>data Z     -- zero
data S n   -- successor of n</pre>
<p>Now, we can define such functions as a type family parametrised by the arity <tt>n</tt>, the type of arguments <tt>a</tt> and the result type <tt>b</tt>:</p>
<pre>type family Fn n a b
type instance Fn Z a b = b
type instance Fn (S n) a b = a -&gt; Fn n a b</pre>
<p>For example, functions of arity 2 are represented by <tt>Fn (S (S Z)) a b = a -&gt; a -&gt; b</tt>.</p>
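<p>As a quick sanity check, here is a minimal sketch of <tt>Fn</tt> in use. The names <tt>add2</tt> and <tt>sum3</tt> are made up for illustration; the point is only that GHC reduces <tt>Fn (S (S Z)) Int Int</tt> to <tt>Int -&gt; Int -&gt; Int</tt>, so ordinary functions inhabit these types:</p>
<pre>{-# LANGUAGE TypeFamilies, EmptyDataDecls #-}

data Z     -- zero
data S n   -- successor of n

type family Fn n a b
type instance Fn Z a b = b
type instance Fn (S n) a b = a -&gt; Fn n a b

-- Fn (S (S Z)) Int Int reduces to Int -&gt; Int -&gt; Int
add2 :: Fn (S (S Z)) Int Int
add2 = (+)

-- Fn (S (S (S Z))) Int Int reduces to Int -&gt; Int -&gt; Int -&gt; Int
sum3 :: Fn (S (S (S Z))) Int Int
sum3 = \x y z -&gt; x + y + z

main :: IO ()
main = do
  print (add2 1 2)
  print (sum3 1 2 3)</pre>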
<p>To define a generic vector class, we also have to associate vectors with their arity. This is easy:</p>
<pre>type family Dim (v :: * -&gt; *)</pre>
<p>One design decision here is that vectors are type constructors such as <tt>Complex</tt> which must be applied to an element type. This makes certain things easier but means that tuples like <tt>(Int,Int,Int)</tt> aren&#8217;t vectors. An alternative design in the style of the <a href="http://hackage.haskell.org/package/vector-space">vector-space</a> package is also possible and easily supported by the framework. But that&#8217;s a topic for another post.</p>
<p>Anyway, a generic vector class should be easy now:</p>
<pre>class Vector v a where
  construct :: Fn (Dim v) a (v a)
  inspect :: v a -&gt; Fn (Dim v) a b -&gt; b</pre>
<p>Alas, this fails quite miserably. The problem is that <tt>a</tt> and <tt>v</tt> are not fixed in <tt>construct</tt>. <tt>Fn</tt> is a non-injective type family and so there is no way to find out what <tt>v</tt> and <tt>a</tt> are from <tt>Fn (Dim v) a (v a)</tt>.</p>
<p>To fix that, it is enough to introduce an injective type constructor which captures <tt>Fn</tt>:</p>
<pre>newtype Fun n a b = Fun (Fn n a b)</pre>
<p>Now, <tt>Fn</tt> is a type function which captures functions from <tt>n</tt> <tt>a</tt>s to <tt>b</tt> and <tt>Fun</tt> is a type constructor which does the same but is injective.</p>
<p>As an aside, it is quite possible to make <tt>Fun</tt> an injective data type family or even a GADT and forgo <tt>Fn</tt> altogether:</p>
<pre>data family Fun n a b
newtype instance Fun Z a b = FunZ b
newtype instance Fun (S n) a b = FunS (a -&gt; Fun n a b)</pre>
<p>However, this really messes up GHC&#8217;s optimiser and leads to very inefficient code. The slightly more complex setup with <tt>Fn</tt> and <tt>Fun</tt> is much better in this respect and not that hard to understand. Just keep in mind that <tt>Fn n a b</tt> doesn&#8217;t fix the types <tt>n</tt>, <tt>a</tt> and <tt>b</tt> and <tt>Fun</tt> does.</p>
<p>The final definition of <tt>Vector</tt> looks like this:</p>
<pre>class Arity (Dim v) =&gt; Vector v a where
  construct :: Fun (Dim v) a (v a)
  inspect :: v a -&gt; Fun (Dim v) a b -&gt; b</pre>
<p>Here is a simple instance:</p>
<pre>type instance Dim Complex = S (S Z)
instance RealFloat a =&gt; Vector Complex a where
  construct = Fun (:+)
  inspect (x :+ y) (Fun f) = f x y</pre>
<p>This is nice but rather pointless unless we can do something useful with these vectors.  And we can! All necessary functionality is folded into the <tt>Arity</tt> type class which is a superclass of <tt>Vector</tt>. Here is its definition and instances for <tt>Z</tt> and <tt>S</tt>:</p>
<pre>class Arity n where
  accum :: (forall m. t (S m) -&gt; a -&gt; t m) -&gt; (t Z -&gt; b) -&gt; t n -&gt; Fn n a b
  apply :: (forall m. t (S m) -&gt; (a, t m)) -&gt; t n -&gt; Fn n a b -&gt; b

instance Arity Z where
  accum f g t = g t
  apply f t h = h

instance Arity n =&gt; Arity (S n) where
  accum f g t = \a -&gt; accum f g (f t a)
  apply f t h = case f t of (a,u) -&gt; apply f u (h a)</pre>
<p>This might look a bit scary but really isn&#8217;t. At least not very. The two methods allow us to define (<tt>accum</tt>) and apply (<tt>apply</tt>) n-ary functions generically. Let&#8217;s look at the latter first. In <tt>apply f t h</tt>, <tt>t</tt> is a seed which can generate <tt>n</tt> values of type <tt>a</tt>, where <tt>n</tt> is statically encoded in its type. From this seed, the function <tt>f</tt> produces one value of type <tt>a</tt> and a new seed which now embeds <tt>n-1</tt> values. It can do this for any <tt>n</tt> which means that we can apply it repeatedly to obtain all values stored in the seed. These values are then passed to <tt>h</tt>, which ultimately yields a result of type <tt>b</tt>. Here is an example of how to use <tt>apply</tt>:</p>
<pre>data T_replicate n = T_replicate

replicateF :: forall n a b. Arity n =&gt; a -&gt; Fun n a b -&gt; b
replicateF x (Fun h) = apply (\T_replicate -&gt; (x, T_replicate))
                             (T_replicate :: T_replicate n)
                             h</pre>
<p>This function applies an n-ary <tt>Fun</tt> to <tt>n</tt> copies of <tt>x</tt>. The seed <tt>T_replicate</tt> doesn&#8217;t carry any useful information; it just fixes <tt>n</tt>. When asked to produce a new <tt>a</tt>, we simply return <tt>x</tt>; the new seed is <tt>T_replicate</tt> with a different <tt>n</tt>.</p>
<p>Now, it is easy to define a generic replicate for arbitrary vectors:</p>
<pre>replicate :: Vector v a =&gt; a -&gt; v a
replicate x = replicateF x
            $ construct</pre>
<p>To understand how it all works, let&#8217;s see how the term <tt>replicate 0 :: Complex Double</tt> is evaluated:</p>
<pre>  replicate 0 :: Complex Double
= replicateF 0 (construct :: Fun (S (S Z)) Double (Complex Double))
= let f :: forall m. T_replicate (S m) -&gt; (Double, T_replicate m)
      f T_replicate = (0, T_replicate)
  in
  apply f
        (T_replicate :: T_replicate (S (S Z)))
        (:+)
= let f = ...
  in
  case f (T_replicate :: T_replicate (S (S Z))) of
    (x,u) -&gt; apply f u ((:+) x)
= let f = ...
  in
  apply f (T_replicate :: T_replicate (S Z)) ((:+) 0)
= let f = ...
  in
  case f (T_replicate :: T_replicate (S Z)) of
    (x,u) -&gt; apply f u ((:+) 0 x)
= let f = ...
  in
  apply f (T_replicate :: T_replicate Z) ((:+) 0 0)
= 0 :+ 0</pre>
<p>The interesting part is, of course, that replicate works for vectors of arbitrary arities. Equally importantly, everything is evaluated at compile time &#8211; GHC really compiles the term to <tt>(0 :+ 0)</tt>.</p>
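<p>For readers who want to try this out, the definitions so far can be assembled into a single module. This is only a sketch under a few assumptions: it uses the real <tt>:+</tt> constructor from <tt>Data.Complex</tt>, spells the kind of <tt>Dim</tt> with <tt>Type</tt> from <tt>Data.Kind</tt>, and adds a hypothetical three-component type <tt>V3</tt> to show that the same <tt>replicate</tt> works at another arity:</p>
<pre>{-# LANGUAGE TypeFamilies, RankNTypes, ScopedTypeVariables #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-}
{-# LANGUAGE KindSignatures, EmptyDataDecls #-}
import Prelude hiding (replicate)
import Data.Complex (Complex(..))
import Data.Kind (Type)

data Z
data S n

type family Fn n a b
type instance Fn Z a b = b
type instance Fn (S n) a b = a -&gt; Fn n a b

newtype Fun n a b = Fun (Fn n a b)

class Arity n where
  accum :: (forall m. t (S m) -&gt; a -&gt; t m) -&gt; (t Z -&gt; b) -&gt; t n -&gt; Fn n a b
  apply :: (forall m. t (S m) -&gt; (a, t m)) -&gt; t n -&gt; Fn n a b -&gt; b

instance Arity Z where
  accum _ g t = g t
  apply _ _ h = h

instance Arity n =&gt; Arity (S n) where
  accum f g t = \a -&gt; accum f g (f t a)
  apply f t h = case f t of (a, u) -&gt; apply f u (h a)

type family Dim (v :: Type -&gt; Type)

class Arity (Dim v) =&gt; Vector v a where
  construct :: Fun (Dim v) a (v a)
  inspect   :: v a -&gt; Fun (Dim v) a b -&gt; b

type instance Dim Complex = S (S Z)
instance Vector Complex a where
  construct = Fun (:+)
  inspect (x :+ y) (Fun f) = f x y

-- Hypothetical 3-component vector, not part of the original post
data V3 a = V3 a a a deriving (Eq, Show)
type instance Dim V3 = S (S (S Z))
instance Vector V3 a where
  construct = Fun V3
  inspect (V3 x y z) (Fun f) = f x y z

-- The seed carries no data; its type parameter counts the remaining values
data T_replicate n = T_replicate

replicateF :: forall n a b. Arity n =&gt; a -&gt; Fun n a b -&gt; b
replicateF x (Fun h) = apply (\T_replicate -&gt; (x, T_replicate))
                             (T_replicate :: T_replicate n)
                             h

replicate :: Vector v a =&gt; a -&gt; v a
replicate x = replicateF x construct

main :: IO ()
main = do
  print (replicate 0 :: Complex Double)
  print (replicate 1 :: V3 Int)</pre>
<p>Adding a new vector type only takes a <tt>Dim</tt> instance and the two methods; <tt>replicate</tt> itself is never touched.</p>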
<p>The other method of <tt>Arity</tt>, <tt>accum</tt>, is slightly more involved. The call <tt>accum f g t</tt> produces a function which accumulates its arguments (of type <tt>a</tt>) by repeatedly applying <tt>f</tt> to an accumulator value (starting with <tt>t</tt>) and an <tt>a</tt> argument. The number of values that can be accumulated into <tt>t</tt> is, again, determined by its type and decreases with each application of <tt>f</tt>. After accumulating all arguments in this way, we end up with a value of type <tt>t Z</tt> from which <tt>g</tt> produces the final result.</p>
<p>Here are two examples of how to use <tt>accum</tt>:</p>
<pre>newtype T_foldl b n = T_foldl b

foldlF :: forall n a b. Arity n =&gt; (b -&gt; a -&gt; b) -&gt; b -&gt; Fun n a b
foldlF f b = Fun $ accum (\(T_foldl b) a -&gt; T_foldl (f b a))
                         (\(T_foldl b) -&gt; b)
                         (T_foldl b :: T_foldl b n)

newtype T_map b c n = T_map (Fn n b c)

mapF :: forall n a b c. Arity n =&gt; (a -&gt; b) -&gt; Fun n b c -&gt; Fun n a c
mapF f (Fun h) = Fun $ accum (\(T_map h) a -&gt; T_map (h (f a)))
                             (\(T_map h) -&gt; h)
                             (T_map h :: T_map b c n)</pre>
<p>Their semantics shouldn&#8217;t be surprising:</p>
<pre>foldlF f z = \a1 ... an -&gt; z `f` a1 `f` ... `f` an
mapF f h   = \a1 ... an -&gt; h (f a1) ... (f an)</pre>
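<p>These equations can be checked on concrete arities. The following sketch repeats only the parts of the framework that <tt>accum</tt> needs; <tt>unFun</tt>, <tt>N3</tt>, <tt>sum3</tt> and <tt>sumSquares3</tt> are hypothetical names introduced here to unwrap a <tt>Fun</tt> and call it like an ordinary function:</p>
<pre>{-# LANGUAGE TypeFamilies, RankNTypes, ScopedTypeVariables, EmptyDataDecls #-}

data Z
data S n

type family Fn n a b
type instance Fn Z a b = b
type instance Fn (S n) a b = a -&gt; Fn n a b

newtype Fun n a b = Fun (Fn n a b)

-- Trimmed-down Arity: only accum is needed for this example
class Arity n where
  accum :: (forall m. t (S m) -&gt; a -&gt; t m) -&gt; (t Z -&gt; b) -&gt; t n -&gt; Fn n a b

instance Arity Z where
  accum _ g t = g t

instance Arity n =&gt; Arity (S n) where
  accum f g t = \a -&gt; accum f g (f t a)

newtype T_foldl b n = T_foldl b

foldlF :: forall n a b. Arity n =&gt; (b -&gt; a -&gt; b) -&gt; b -&gt; Fun n a b
foldlF f z = Fun $ accum (\(T_foldl b) a -&gt; T_foldl (f b a))
                         (\(T_foldl b) -&gt; b)
                         (T_foldl z :: T_foldl b n)

newtype T_map b c n = T_map (Fn n b c)

mapF :: forall n a b c. Arity n =&gt; (a -&gt; b) -&gt; Fun n b c -&gt; Fun n a c
mapF f (Fun h) = Fun $ accum (\(T_map g) a -&gt; T_map (g (f a)))
                             (\(T_map g) -&gt; g)
                             (T_map h :: T_map b c n)

-- Hypothetical helper: expose the underlying n-ary function
unFun :: Fun n a b -&gt; Fn n a b
unFun (Fun f) = f

type N3 = S (S (S Z))

-- foldlF (+) 0 at arity 3 is \a1 a2 a3 -&gt; ((0 + a1) + a2) + a3
sum3 :: Int -&gt; Int -&gt; Int -&gt; Int
sum3 = unFun (foldlF (+) 0 :: Fun N3 Int Int)

-- mapF squares each argument before it reaches the fold
sumSquares3 :: Int -&gt; Int -&gt; Int -&gt; Int
sumSquares3 = unFun (mapF (\x -&gt; x * x) (foldlF (+) 0) :: Fun N3 Int Int)

main :: IO ()
main = do
  print (sum3 1 2 3)
  print (sumSquares3 1 2 3)</pre>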
<p>There are, of course, more useful functions which can be defined in this way. Of these, the most important is probably <tt>zipWithF</tt>. I&#8217;ll leave its implementation as an exercise for the interested reader (should there be any) for now. It is a bit elaborate but not that difficult to come up with. Took me only a few hours or so.</p>
<p>Now that we have all these nice functions, it is quite easy to implement a lot of useful stuff for our vectors. Here are a few examples:</p>
<pre>map :: (Vector v a, Vector v b) =&gt; (a -&gt; b) -&gt; v a -&gt; v b
map f v = inspect v
        $ mapF f
        $ construct

zipWith :: (Vector v a, Vector v b, Vector v c)
        =&gt; (a -&gt; b -&gt; c) -&gt; v a -&gt; v b -&gt; v c
zipWith f v w = inspect w
              $ inspect v
              $ zipWithF f
              $ construct

fold :: Vector v a =&gt; (b -&gt; a -&gt; b) -&gt; b -&gt; v a -&gt; b
fold f z v = inspect v
           $ foldlF f z

eq :: (Vector v a, Eq a) =&gt; v a -&gt; v a -&gt; Bool
eq v w = inspect w
       $ inspect v
       $ zipWithF (==)
       $ foldlF (&amp;&amp;) True</pre>
<p>The structure of these functions reflects the fact that we are essentially programming in continuation-passing style.</p>
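<p>To make the continuation-passing structure concrete, here is a self-contained sketch of <tt>map</tt> and <tt>fold</tt> over <tt>Complex</tt>. It leaves out <tt>zipWith</tt> and <tt>eq</tt> because they depend on <tt>zipWithF</tt>, and it keeps only the <tt>accum</tt> half of <tt>Arity</tt>. The function <tt>normSq</tt> is a hypothetical example of a generic operation built from the two, and the <tt>Complex</tt> instance uses the <tt>:+</tt> constructor from <tt>Data.Complex</tt>:</p>
<pre>{-# LANGUAGE TypeFamilies, RankNTypes, ScopedTypeVariables #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-}
{-# LANGUAGE KindSignatures, EmptyDataDecls #-}
import Prelude hiding (map)
import Data.Complex (Complex(..))
import Data.Kind (Type)

data Z
data S n

type family Fn n a b
type instance Fn Z a b = b
type instance Fn (S n) a b = a -&gt; Fn n a b

newtype Fun n a b = Fun (Fn n a b)

-- Trimmed-down Arity: map and fold only need accum
class Arity n where
  accum :: (forall m. t (S m) -&gt; a -&gt; t m) -&gt; (t Z -&gt; b) -&gt; t n -&gt; Fn n a b
instance Arity Z where
  accum _ g t = g t
instance Arity n =&gt; Arity (S n) where
  accum f g t = \a -&gt; accum f g (f t a)

type family Dim (v :: Type -&gt; Type)

class Arity (Dim v) =&gt; Vector v a where
  construct :: Fun (Dim v) a (v a)
  inspect   :: v a -&gt; Fun (Dim v) a b -&gt; b

type instance Dim Complex = S (S Z)
instance Vector Complex a where
  construct = Fun (:+)
  inspect (x :+ y) (Fun f) = f x y

newtype T_foldl b n = T_foldl b

foldlF :: forall n a b. Arity n =&gt; (b -&gt; a -&gt; b) -&gt; b -&gt; Fun n a b
foldlF f z = Fun $ accum (\(T_foldl b) a -&gt; T_foldl (f b a))
                         (\(T_foldl b) -&gt; b)
                         (T_foldl z :: T_foldl b n)

newtype T_map b c n = T_map (Fn n b c)

mapF :: forall n a b c. Arity n =&gt; (a -&gt; b) -&gt; Fun n b c -&gt; Fun n a c
mapF f (Fun h) = Fun $ accum (\(T_map g) a -&gt; T_map (g (f a)))
                             (\(T_map g) -&gt; g)
                             (T_map h :: T_map b c n)

-- construct is the final continuation; mapF f transforms what flows into it
map :: (Vector v a, Vector v b) =&gt; (a -&gt; b) -&gt; v a -&gt; v b
map f v = inspect v $ mapF f $ construct

fold :: Vector v a =&gt; (b -&gt; a -&gt; b) -&gt; b -&gt; v a -&gt; b
fold f z v = inspect v $ foldlF f z

-- Hypothetical generic operation: squared Euclidean norm
normSq :: (Vector v a, Num a) =&gt; v a -&gt; a
normSq = fold (+) 0 . map (\x -&gt; x * x)

main :: IO ()
main = do
  print (map (+ 1) (1 :+ 2 :: Complex Double))
  print (normSq (3 :+ 4 :: Complex Double))</pre>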
<p>All this is nicely generic but what about efficiency? With appropriate <tt>INLINE</tt> pragmas, GHC&#8217;s mighty simplifier is perfectly capable of eliminating all intermediate data and functions and producing code that is equivalent to what we would write by hand. For example, here is a small function:</p>
<pre>isZero :: Complex Double -&gt; Bool
isZero x = V.eq x (V.replicate 0)</pre>
<p>The code generated by GHC is perfect:</p>
<pre>\ (x_agj :: Complex Double) -&gt;
  case x_agj of _ { :+ x1_XxL y_XxO -&gt;
  case x1_XxL of _ { D# x2_awT -&gt;
  case ==## x2_awT 0.0 of _ {
    False -&gt; False;
    True -&gt; case y_XxO of _ { D# x3_Xy6 -&gt; ==## x3_Xy6 0.0 }
  } } }</pre>
<p>There is a fairly obvious connection between <tt>accum</tt> and <tt>apply</tt> and well-known list operations. Consider that <tt>Fun n a b</tt> is, in a sense, equivalent to a partial function of type <tt>[a] -&gt; b</tt>. The <tt>Arity</tt> methods then correspond to the following functions:</p>
<pre>
accumList :: (t -&gt; a -&gt; t) -&gt; (t -&gt; b) -&gt; t -&gt; [a] -&gt; b
applyList :: (t -&gt; (a,t)) -&gt; t -&gt; ([a] -&gt; b) -&gt; b
</pre>
<p>Of course, <tt>accumList</tt> is just a disguised <tt>foldl</tt> and <tt>applyList</tt> is almost exactly <tt>unfoldr</tt>. It would be nicer if <tt>accum</tt> corresponded to <tt>foldr</tt> instead of <tt>foldl</tt> but that would lead to much uglier code.</p>
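<p>The analogy can be spelled out with ordinary lists. In the sketch below, <tt>accumList</tt> takes its initial seed explicitly, just as <tt>accum</tt> takes <tt>t n</tt>. Since a list type does not record its length, the list version of <tt>apply</tt> takes an explicit count; that parameter and the name <tt>applyListN</tt> are assumptions of this sketch, standing in for the arity that <tt>Arity</tt> tracks in the type:</p>
<pre>import Data.List (unfoldr)

-- accum on lists: fold the arguments into the seed, then finish with g
accumList :: (t -&gt; a -&gt; t) -&gt; (t -&gt; b) -&gt; t -&gt; [a] -&gt; b
accumList f g t xs = g (foldl f t xs)

-- apply on lists: unfold n values from the seed and hand them to h
applyListN :: Int -&gt; (t -&gt; (a, t)) -&gt; t -&gt; ([a] -&gt; b) -&gt; b
applyListN n f t h = h (take n (unfoldr (Just . f) t))

main :: IO ()
main = do
  -- sums the list, starting from seed 0
  print (accumList (+) id 0 [1, 2, 3 :: Int])
  -- generates 0, 1, 2 from the seed and sums them
  print (applyListN 3 (\s -&gt; (s, s + 1)) (0 :: Int) sum)</pre>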
<p>Anyway, this is it for now. I&#8217;m still cleaning up the library (which doesn&#8217;t have any comments whatsoever at the moment) but a first version should appear on Hackage in the next week or so.</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2010/11/15/generics-for-small-fixed-size-vectors/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
		<item>
		<title>Sparking imperatives</title>
		<link>http://unlines.wordpress.com/2010/04/21/sparking-imperatives/</link>
		<comments>http://unlines.wordpress.com/2010/04/21/sparking-imperatives/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 07:21:44 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=304</guid>
		<description><![CDATA[Here is a fun hack I came up with while working on vector and repa. One way to express parallelism in Haskell is to write x `par` y (this comes from package parallel). This is equivalent to y but tells the compiler that it might be a good idea to start evaluating x in parallel [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a fun hack I came up with while working on <a href="http://hackage.haskell.org/package/vector">vector</a> and <a href="http://www.cse.unsw.edu.au/~chak/papers/KCLPL10.html">repa</a>. One way to express parallelism in Haskell is to write <tt>x `par` y</tt> (this comes from package <a href="http://hackage.haskell.org/package/parallel">parallel</a>). This is equivalent to <tt>y</tt> but tells the compiler that it might be a good idea to start evaluating <tt>x</tt> in parallel with <tt>y</tt>. <span id="more-304"></span>For example, this code says that the two expensive computations should be executed concurrently:</p>
<pre>let x = expensive computation 1
    y = expensive computation 2
in
(x `par` y) `pseq` (x+y)
</pre>
<p>The <tt>pseq</tt> is necessary because we have to tell the compiler to evaluate <tt>x `par` y</tt> before <tt>x+y</tt>. This is all explained in <a href="http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/strategies.html"><em>Algorithm + Strategy = Parallelism</em></a>.</p>
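<p>A runnable version of this pattern might look as follows. The sketch imports <tt>par</tt> and <tt>pseq</tt> from <tt>GHC.Conc</tt> in base rather than from package parallel, purely to keep it dependency-free; the naive <tt>fib</tt> is a made-up stand-in for the two expensive computations:</p>
<pre>import GHC.Conc (par, pseq)

-- Deliberately slow Fibonacci, standing in for an expensive computation
fib :: Int -&gt; Integer
fib n
  | n &lt; 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

parSum :: Integer
parSum =
  let x = fib 25
      y = fib 26
  in
  -- spark x, evaluate y, and only then add the two
  (x `par` y) `pseq` (x + y)

main :: IO ()
main = print parSum</pre>
<p>The result is the same with or without parallelism. To actually run the spark on another core, the program has to be linked with <tt>-threaded</tt> and started with <tt>+RTS -N</tt>.</p>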
<p>Internally, when <tt>x `par` y</tt> is evaluated the runtime system creates a <em>spark</em> for <tt>x</tt>. A spark is a bit like a thread but cheaper. The RTS maintains a queue of sparks and evaluates them whenever it has a spare CPU or core. The scheduling algorithm is based on work stealing and is really quite sophisticated. A detailed description of the RTS is given in <a href="http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf"><em>Runtime Support for Multicore Haskell</em></a>.</p>
<p>Let&#8217;s try to abuse this mechanism a bit by sparking ST computations rather than pure ones. Here is how:</p>
<pre>parST :: ST s a -&gt; ST s a
parST m = x `par` return x
  where
    x = runST (unsafeIOToST noDuplicate &gt;&gt; unsafeCoerce m)
</pre>
<p>First, we create a thunk <tt>x</tt> which, when evaluated, runs the <tt>ST</tt> computation <tt>m</tt>. The parallel RTS will sometimes evaluate thunks twice which is fine for pure computations (it just duplicates work) but could be disastrous for stateful ones since it could duplicate side effects. The call to <tt>noDuplicate</tt> (from GHC.IO) ensures that this doesn&#8217;t happen. Then, we spark <tt>x</tt> and return it. Note that <tt>return</tt> is lazy and doesn&#8217;t evaluate <tt>x</tt>. This is absolutely crucial.</p>
<p>It is quite possible to implement <tt>parST</tt> in terms of <tt>forkIO</tt>. However, sparks are <em>much</em> cheaper than threads (yes, even than GHC threads) which means that this implementation ought to support much more fine-grained parallelism.</p>
<p>So how do we use <tt>parST</tt>? The basic idea is to spark computations, do something else for a while and then synchronise by demanding the results:</p>
<pre>do
  x &lt;- parST $ foo
  bar
  x `seq` return ()
</pre>
<p>The last line ensures that <tt>foo</tt> is executed one way or another: either in parallel to <tt>bar</tt> or after <tt>bar</tt> when its result is demanded by <tt>seq</tt>. We can capture an instance of this pattern in a combinator:</p>
<pre>(|||) :: ST s () -&gt; ST s () -&gt; ST s ()
p ||| q = do
            u &lt;- parST p
            q
            u `seq` return ()
</pre>
<p>This is quite straightforward to use. Here is a rather simple-minded version of in-place Quicksort:</p>
<pre>qsort :: (MVector v a, Ord a) =&gt; v s a -&gt; ST s ()
qsort v
  | n &lt; 2 = return ()
  | otherwise = do
                  x &lt;- unsafeRead v (n `div` 2)
                  i &lt;- unstablePartition (&lt;x) v
                  qsort (unsafeSlice 0 i v)
                    ||| qsort (unsafeSlice (max i 1) (n-i) v)
  where
    n = length v
</pre>
<p>This looks just like sequential Quicksort except that the two recursive calls are potentially executed in parallel.</p>
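<p>For experimenting, <tt>parST</tt> and <tt>|||</tt> can be put into one small module. The import locations below are an assumption based on recent versions of base (<tt>Control.Monad.ST.Unsafe</tt>, <tt>GHC.IO</tt>, <tt>GHC.Conc</tt>, <tt>Unsafe.Coerce</tt>) and may differ for older compilers. The hypothetical <tt>example</tt> writes to two different references, so its result should not depend on how the spark is scheduled:</p>
<pre>import Control.Monad.ST (ST, runST)
import Control.Monad.ST.Unsafe (unsafeIOToST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)
import GHC.Conc (par)
import GHC.IO (noDuplicate)
import Unsafe.Coerce (unsafeCoerce)

-- Spark an ST computation; the returned thunk runs it when forced
parST :: ST s a -&gt; ST s a
parST m = x `par` return x
  where
    x = runST (unsafeIOToST noDuplicate &gt;&gt; unsafeCoerce m)

-- Run p (possibly in parallel) alongside q, then wait for p
(|||) :: ST s () -&gt; ST s () -&gt; ST s ()
p ||| q = do
  u &lt;- parST p
  q
  u `seq` return ()

-- Two writes to distinct references, so there is no race between them
example :: (Int, Int)
example = runST $ do
  r1 &lt;- newSTRef 0
  r2 &lt;- newSTRef 0
  writeSTRef r1 1 ||| writeSTRef r2 2
  a &lt;- readSTRef r1
  b &lt;- readSTRef r2
  return (a, b)

main :: IO ()
main = print example</pre>
<p>As with <tt>par</tt>, sparks only run on other cores when the program is linked with <tt>-threaded</tt> and started with <tt>+RTS -N</tt>; otherwise the left-hand computation simply runs when <tt>seq</tt> demands it.</p>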
<p>Interestingly, the programming model that <tt>parST</tt> gives us is very well known (e.g., as lazy threads). In particular, <a href="http://supertech.csail.mit.edu/cilk">Cilk</a>, which I rather like, is based on a very similar approach. It is quite amazing that we can get this with basically 2 lines of Haskell code.</p>
<p>This leaves two questions. Firstly: is it safe? I think (but I&#8217;m not sure) that the answer is yes for IO (we can define <tt>parIO</tt> similarly to <tt>parST</tt>) but it is definitely not safe for ST. Here is an example that produces different results depending on scheduling and compiler optimisations:</p>
<pre>
let x = runST (do { r &lt;- newSTRef 0; writeSTRef r 1 ||| writeSTRef r 2; readSTRef r })
    y = runST (do { r &lt;- newSTRef 0; writeSTRef r 1 ||| writeSTRef r 2; readSTRef r })
in x == y
</pre>
<p>It ought to be possible to build a safe library on top of it, though.</p>
<p>So what about performance? Alas, I don&#8217;t have time to pursue this at the moment so I have absolutely no idea. The only real benchmark I tried is Introsort from Dan Doel&#8217;s <a href="http://hackage.haskell.org/package/vector-algorithms">vector-algorithms</a> which I parallelised like Quicksort above (that is, I changed exactly one line). The chart below shows the running times for sorting 10M random elements (in seconds) vs. the number of cores on an 8-core XServe. Clearly, it does things in parallel but it doesn&#8217;t really scale. I have my suspicions as to why this is the case (I blame <tt>noDuplicate</tt>) but I need to investigate more. It is quite encouraging, however, that the very naive parallel algorithm is barely slower than the sequential one on 1 core (1.98s vs. 2.08s).</p>
<p><img src="http://unlines.files.wordpress.com/2010/04/parst.png?w=600" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2010/04/21/sparking-imperatives/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>

		<media:content url="http://unlines.files.wordpress.com/2010/04/parst.png" medium="image" />
	</item>
		<item>
		<title>vector 0.5 is here</title>
		<link>http://unlines.wordpress.com/2010/02/15/vector-0-5-is-here-2/</link>
		<comments>http://unlines.wordpress.com/2010/02/15/vector-0-5-is-here-2/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 11:49:47 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=288</guid>
		<description><![CDATA[Version 0.5 of package vector is finally on Hackage! I was going to release it about a month ago but just never found the time. It has a lot more functionality than previous versions, is much more usable and significantly faster. Here are the main highlights: DPH-style unboxed vectors (in Data.Vector.Unboxed) which use associated types [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://hackage.haskell.org/package/vector">Version 0.5 of package vector</a> is finally on Hackage! I was going to release it about a month ago but just never found the time. It has a lot more functionality than previous versions, is much more usable and significantly faster. Here are the main highlights:<span id="more-288"></span></p>
<ul>
<li>DPH-style unboxed vectors (in <tt>Data.Vector.Unboxed</tt>) which use associated types to select the appropriate unboxed representation depending on the type of the elements.</li>
<li>Redesigned interface between mutable and immutable vectors. In particular, the popular <tt>unsafeFreeze</tt> primitive is now supported for all vector types.</li>
<li>Many new operations on both immutable and mutable vectors.</li>
<li>Significant performance improvements.</li>
</ul>
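<p>As a rough illustration of the first two points, here is a small sketch. It is written against the API of current versions of vector, so the module and function names are an assumption as far as version 0.5 is concerned; <tt>points</tt> and <tt>squares</tt> are made-up examples:</p>
<pre>import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- The element type selects the representation: a vector of pairs
-- is stored as a pair of unboxed vectors
points :: U.Vector (Int, Double)
points = U.fromList [(1, 0.5), (2, 1.5), (3, 2.5)]

-- Fill a mutable vector in ST, then freeze it without copying.
-- This is safe here because mv is never used after unsafeFreeze.
squares :: Int -&gt; U.Vector Int
squares n = runST $ do
  mv &lt;- M.new n
  mapM_ (\i -&gt; M.write mv i (i * i)) [0 .. n - 1]
  U.unsafeFreeze mv

main :: IO ()
main = do
  print (U.sum (U.map snd points))
  print (U.toList (squares 5))</pre>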
<p>The release is accompanied by a new version of the <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> array benchmark suite which has been cleaned up and significantly extended. In particular, it now includes a couple of small array algorithms which probably provide a much better indication of overall performance than just loop kernels. Alas, I didn&#8217;t have enough time to do a lot of benchmarking but here is a graph which shows how much faster <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> algorithms run when using <a href="http://hackage.haskell.org/package/vector">vector</a> compared to <tt>dph-prim-seq</tt> with the current development version of GHC. Note that this shows both safe (bounds checking) and unsafe (no bounds checks) versions of the algorithms. In general, bounds checking doesn&#8217;t seem to cost much because we mostly use collective operations which don&#8217;t require any checks. However, it is rather weird that for <tt>list_rank</tt> and <tt>hyb_cc</tt>, bounds checking actually makes things faster. This probably points to a problem somewhere in the simplifier.</p>
<p><img src="http://unlines.files.wordpress.com/2010/02/unboxed-vs-dph2-e1266234166166.png?w=600" alt="Data.Vector.Unboxed vs. dph-prim-seq" /></p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2010/02/15/vector-0-5-is-here-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>

		<media:content url="http://unlines.files.wordpress.com/2010/02/unboxed-vs-dph2-e1266234166166.png" medium="image">
			<media:title type="html">Data.Vector.Unboxed vs. dph-prim-seq</media:title>
		</media:content>
	</item>
		<item>
		<title>NoSlow: Microbenchmarks for Haskell array libraries</title>
		<link>http://unlines.wordpress.com/2009/11/27/noslow/</link>
		<comments>http://unlines.wordpress.com/2009/11/27/noslow/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 15:15:20 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=204</guid>
		<description><![CDATA[Over the last couple of days, I have implemented a small benchmark suite which tries to measure the performance of various Haskell array libraries, with particular emphasis on finding out how well they are able to fuse things. It is now on Hackage under the very creative and imaginative name NoSlow (Haskell seems to have [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last couple of days, I have implemented a small benchmark suite which tries to measure the performance of various Haskell array libraries, with particular emphasis on finding out how well they are able to fuse things. It is now on <a href="http://hackage.haskell.org/package/NoSlow">Hackage</a> under the very creative and imaginative name <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> (Haskell seems to have gained a tradition of naming benchmark suites no<em>something</em>). What it does is compile and run a set of micro-benchmarks using these libraries:<span id="more-204"></span></p>
<ul>
<li>standard lists</li>
<li>primitive arrays from the <a href="http://www.haskell.org/haskellwiki/GHC/Data_Parallel_Haskell">DPH project</a> (package <tt>dph-prim-seq</tt>)</li>
<li><a href="http://hackage.haskell.org/package/uvector">uvector</a> (a fork of the above)</li>
<li><a href="http://hackage.haskell.org/package/vector">vector</a> (primitive, storable and boxed arrays)</li>
<li><a href="http://hackage.haskell.org/package/storablevector">storablevector</a></li>
</ul>
<p>The actual benchmarking is done by Bryan O&#8217;Sullivan&#8217;s wonderful <a href="http://hackage.haskell.org/package/criterion">criterion</a>.</p>
<p><a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> is a very young project so the output isn&#8217;t particularly pretty but it does the job. It is also not very thoroughly tested and only includes a very restricted set of benchmarks so saying &#8220;library <em>x</em> is better than library <em>y</em> because it produces better numbers in <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a>&#8221; would be completely and utterly wrong. However, it is quite useful for identifying cases where taking a closer look at the code generated for a particular loop might be a good idea. It already helped me spot one <a href="http://www.haskell.org/pipermail/cvs-libraries/2009-November/011522.html">tricky performance regression</a> in the standard list library.</p>
<p>Here is an example of the data it produces.</p>
<table border="1" style='font-size:80%;white-space:nowrap;'>
<caption>NoSlow results for GHC 6.13.20091124 (only loops specialised for Double)</caption>
<tr>
<th></th>
<th colspan="1">dph-prim</th>
<th colspan="1">list</th>
<th colspan="1">storablevector</th>
<th colspan="1">uvector</th>
<th colspan="3">vector</th>
</tr>
<tr>
<th></th>
<th colspan="1">seq</th>
<th colspan="1"></th>
<th colspan="1"></th>
<th colspan="1"></th>
<th colspan="1">Primitive</th>
<th colspan="1">Storable</th>
<th colspan="1">boxed</th>
</tr>
<tr>
<th>$a+1</th>
<td>58.38504 us</td>
<td>192.9597 us</td>
<td>58.25747 us</td>
<td>58.47876 us</td>
<td>58.52366 us</td>
<td>58.77233 us</td>
<td>405.5711 us</td>
</tr>
<tr>
<th>$a+^1</th>
<td>59.03597 us</td>
<td>356.6624 us</td>
<td>210.4899 us</td>
<td>80.25804 us</td>
<td>58.53107 us</td>
<td>59.19546 us</td>
<td>417.2587 us</td>
</tr>
<tr>
<th>$a+x</th>
<td>55.62417 us</td>
<td>236.2620 us</td>
<td>55.63200 us</td>
<td>55.62767 us</td>
<td>55.33446 us</td>
<td>55.96459 us</td>
<td>477.2455 us</td>
</tr>
<tr>
<th>$a+^x</th>
<td>56.91372 us</td>
<td>445.1638 us</td>
<td>237.9741 us</td>
<td>75.81757 us</td>
<td>55.79731 us</td>
<td>56.45521 us</td>
<td>491.3216 us</td>
</tr>
<tr>
<th>$a+x+x+x+x</th>
<td>96.75086 us</td>
<td>280.3636 us</td>
<td>236.0562 us</td>
<td>96.76510 us</td>
<td>95.07297 us</td>
<td>97.15892 us</td>
<td>519.6680 us</td>
</tr>
<tr>
<th>$a+$b</th>
<td>64.82325 us</td>
<td>255.3842 us</td>
<td>91.09551 us</td>
<td>64.82592 us</td>
<td>65.21805 us</td>
<td>65.10213 us</td>
<td>500.7479 us</td>
</tr>
<tr>
<th>$a+$b(zip)</th>
<td>64.48503 us</td>
<td>347.4379 us</td>
<td></td>
<td>64.49701 us</td>
<td></td>
<td></td>
<td>499.5277 us</td>
</tr>
<tr>
<th>x*$a+$b</th>
<td>75.63250 us</td>
<td>452.0626 us</td>
<td>154.3398 us</td>
<td>75.68617 us</td>
<td>110.5486 us</td>
<td>155.2830 us</td>
<td>649.9281 us</td>
</tr>
<tr>
<th>($a+$b)*($c+$d)</th>
<td>125.0923 us</td>
<td>1.080122 ms</td>
<td>288.3781 us</td>
<td>125.0150 us</td>
<td>919.2376 us</td>
<td>1.132326 ms</td>
<td>2.967417 ms</td>
</tr>
<tr>
<th>($a+$b)*($c+$d)(zip)</th>
<td>155.3547 us</td>
<td>742.2439 us</td>
<td></td>
<td>155.3554 us</td>
<td></td>
<td></td>
<td>2.509351 ms</td>
</tr>
<tr>
<th>(x+$a)*(y+$b)</th>
<td>104.0011 us</td>
<td>678.5383 us</td>
<td>203.9229 us</td>
<td>103.9131 us</td>
<td>89.32382 us</td>
<td>89.98352 us</td>
<td>401.6301 us</td>
</tr>
<tr>
<th>(^x+$a)*(^y+$b)</th>
<td>118.5413 us</td>
<td>1.061493 ms</td>
<td>534.0542 us</td>
<td>150.2891 us</td>
<td>822.7933 us</td>
<td>930.1098 us</td>
<td>2.650773 ms</td>
</tr>
<tr>
<th>filter(neq0)(map0)</th>
<td>61.45542 us</td>
<td>78.80583 us</td>
<td>104.6501 us</td>
<td>61.35867 us</td>
<td>68.04449 us</td>
<td>61.67384 us</td>
<td>124.5576 us</td>
</tr>
<tr>
<th>filter(neq0)(^0)</th>
<td>17.42000 us</td>
<td>23.41616 us</td>
<td>173.0071 us</td>
<td>17.91704 us</td>
<td>23.33224 us</td>
<td>24.97328 us</td>
<td>59.48926 us</td>
</tr>
<tr>
<th>filter(eq0)(map0)</th>
<td>69.61809 us</td>
<td>174.2209 us</td>
<td>242.6149 us</td>
<td>69.63563 us</td>
<td>69.34671 us</td>
<td>69.89652 us</td>
<td>331.1557 us</td>
</tr>
<tr>
<th>filter(eq0)(^0)</th>
<td>45.96399 us</td>
<td>94.22360 us</td>
<td>308.5451 us</td>
<td>45.75672 us</td>
<td>52.15233 us</td>
<td>45.52413 us</td>
<td>201.7559 us</td>
</tr>
<tr>
<th>zip_filter</th>
<td>76.31172 us</td>
<td>466.6432 us</td>
<td>334.0208 us</td>
<td>76.24536 us</td>
<td>99.80086 us</td>
<td>100.4251 us</td>
<td>341.6012 us</td>
</tr>
<tr>
<th>filter_zip</th>
<td>81.01137 us</td>
<td>468.8768 us</td>
<td>239.6135 us</td>
<td>79.83034 us</td>
<td>80.92583 us</td>
<td>87.66268 us</td>
<td>255.4129 us</td>
</tr>
<tr>
<th>filter_evens</th>
<td>59.13895 us</td>
<td>182.2491 us</td>
<td></td>
<td>58.59793 us</td>
<td></td>
<td></td>
<td>337.2523 us</td>
</tr>
<tr>
<th>sum($a*$b)</th>
<td>115.0304 us</td>
<td>1.130831 ms</td>
<td>145.4783 us</td>
<td>115.0499 us</td>
<td>73.86368 us</td>
<td>74.39775 us</td>
<td>99.39568 us</td>
</tr>
<tr>
<th>sum($a*$b)(zip)</th>
<td>114.9057 us</td>
<td>1.420020 ms</td>
<td></td>
<td>114.8766 us</td>
<td></td>
<td></td>
<td>99.43429 us</td>
</tr>
<tr>
<th>sum[m..n]</th>
<td>53.50359 us</td>
<td>646.0969 us</td>
<td>235.4800 us</td>
<td>53.76784 us</td>
<td>129.2202 us</td>
<td>133.0979 us</td>
<td>134.6730 us</td>
</tr>
<tr>
<th>sumsq(map)</th>
<td>67.20919 us</td>
<td>653.9266 us</td>
<td>292.8844 us</td>
<td>67.21005 us</td>
<td>143.1072 us</td>
<td>142.2476 us</td>
<td>143.5639 us</td>
</tr>
<tr>
<th>sumsq(zip)</th>
<td>67.24133 us</td>
<td>1.928390 ms</td>
<td>311.0818 us</td>
<td>160.3321 us</td>
<td>369.7680 us</td>
<td>380.4847 us</td>
<td>914.5373 us</td>
</tr>
<tr>
<th>sum_evens</th>
<td>120.4421 us</td>
<td>401.7939 us</td>
<td></td>
<td>120.4255 us</td>
<td></td>
<td></td>
<td>150.9723 us</td>
</tr>
</table>
<h1>Benchmarks</h1>
<p>At the moment, <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> only benchmarks a bunch of very small and fairly random loop kernels as I was more concerned with getting the infrastructure in place. Here is an example (called <i>sum($a*$b)</i> in the table):</p>
<pre>dotp as bs = sum (zipWith (*) as bs)
</pre>
<p>When it is built, <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> compiles several different versions of this loop. For instance, for the <a href="http://hackage.haskell.org/package/storablevector">storablevector</a> backend it will generate these two functions:</p>
<pre>
dotp :: (Storable a, Num a) =&gt; StorableVector a -&gt; StorableVector a -&gt; a
dotp :: StorableVector Double -&gt; StorableVector Double -&gt; Double
</pre>
<p>The first one is overloaded on the element type whereas the second one is specialised to <tt>Double</tt>. <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> does this for all its backends and then runs all generated versions of the benchmark with the same test data. With more testing and more benchmarks, this will give us a rough estimate of the relative performance of the different array libraries. It also shows how much faster specialised loops are compared to their polymorphic versions.</p>
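<p>To make the distinction concrete, here is a minimal sketch of the two flavours written by hand for plain lists. The names <tt>dotpOverloaded</tt> and <tt>dotpDouble</tt> are made up for this illustration; NoSlow generates its versions with Template Haskell rather than by copying code like this.</p>

```haskell
module Main (main) where

-- Overloaded kernel: works for any numeric element type, so unless GHC
-- specialises it, the arithmetic goes through a Num dictionary at run time.
dotpOverloaded :: Num a => [a] -> [a] -> a
dotpOverloaded as bs = sum (zipWith (*) as bs)

-- Specialised kernel: the same body with the element type fixed to Double.
dotpDouble :: [Double] -> [Double] -> Double
dotpDouble as bs = sum (zipWith (*) as bs)

main :: IO ()
main = do
  print (dotpOverloaded [1, 2, 3] [4, 5, 6 :: Int])  -- prints 32
  print (dotpDouble [1, 2, 3] [4, 5, 6])             -- prints 32.0
```

<p>Both functions compute the same result; only the amount of type information available to the optimiser differs.</p>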
<p>Here is the output of a full run. <i>*Double</i> and <i>Double</i> are overloaded and specialised loops, respectively. At the moment, <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> only does Double benchmarks but adding more types is easy.</p>
<table border="1" style='font-size:80%;white-space:nowrap;'>
<caption>NoSlow results for GHC 6.13.20091124</caption>
<tr>
<th></th>
<th colspan="2">dph-prim</th>
<th colspan="2">list</th>
<th colspan="2">storablevector</th>
<th colspan="2">uvector</th>
<th colspan="6">vector</th>
</tr>
<tr>
<th></th>
<th colspan="2">seq</th>
<th colspan="2"></th>
<th colspan="2"></th>
<th colspan="2"></th>
<th colspan="2">Primitive</th>
<th colspan="2">Storable</th>
<th colspan="2">boxed</th>
</tr>
<tr>
<th></th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
</tr>
<tr>
<th>$a+1</th>
<td>388.9189 us</td>
<td>58.38504 us</td>
<td>334.0156 us</td>
<td>192.9597 us</td>
<td>310.7921 us</td>
<td>58.25747 us</td>
<td>410.9401 us</td>
<td>58.47876 us</td>
<td>831.8845 us</td>
<td>58.52366 us</td>
<td>467.6662 us</td>
<td>58.77233 us</td>
<td>563.7031 us</td>
<td>405.5711 us</td>
</tr>
<tr>
<th>$a+^1</th>
<td>412.6220 us</td>
<td>59.03597 us</td>
<td>361.8679 us</td>
<td>356.6624 us</td>
<td>808.4398 us</td>
<td>210.4899 us</td>
<td>554.7731 us</td>
<td>80.25804 us</td>
<td>842.5514 us</td>
<td>58.53107 us</td>
<td>469.5773 us</td>
<td>59.19546 us</td>
<td>570.0256 us</td>
<td>417.2587 us</td>
</tr>
<tr>
<th>$a+x</th>
<td>398.3495 us</td>
<td>55.62417 us</td>
<td>266.9966 us</td>
<td>236.2620 us</td>
<td>317.0817 us</td>
<td>55.63200 us</td>
<td>416.8037 us</td>
<td>55.62767 us</td>
<td>848.4946 us</td>
<td>55.33446 us</td>
<td>479.7607 us</td>
<td>55.96459 us</td>
<td>576.0136 us</td>
<td>477.2455 us</td>
</tr>
<tr>
<th>$a+^x</th>
<td>401.3565 us</td>
<td>56.91372 us</td>
<td>376.1232 us</td>
<td>445.1638 us</td>
<td>821.9274 us</td>
<td>237.9741 us</td>
<td>560.3928 us</td>
<td>75.81757 us</td>
<td>854.3754 us</td>
<td>55.79731 us</td>
<td>483.5405 us</td>
<td>56.45521 us</td>
<td>579.5658 us</td>
<td>491.3216 us</td>
</tr>
<tr>
<th>$a+x+x+x+x</th>
<td>1.099485 ms</td>
<td>96.75086 us</td>
<td>989.8561 us</td>
<td>280.3636 us</td>
<td>1.367973 ms</td>
<td>236.0562 us</td>
<td>1.108161 ms</td>
<td>96.76510 us</td>
<td>1.732897 ms</td>
<td>95.07297 us</td>
<td>1.402448 ms</td>
<td>97.15892 us</td>
<td>1.918533 ms</td>
<td>519.6680 us</td>
</tr>
<tr>
<th>$a+$b</th>
<td>675.8037 us</td>
<td>64.82325 us</td>
<td>266.1987 us</td>
<td>255.3842 us</td>
<td>695.4756 us</td>
<td>91.09551 us</td>
<td>721.3087 us</td>
<td>64.82592 us</td>
<td>1.238228 ms</td>
<td>65.21805 us</td>
<td>758.5373 us</td>
<td>65.10213 us</td>
<td>579.0221 us</td>
<td>500.7479 us</td>
</tr>
<tr>
<th>$a+$b(zip)</th>
<td>674.6217 us</td>
<td>64.48503 us</td>
<td>486.2640 us</td>
<td>347.4379 us</td>
<td></td>
<td></td>
<td>718.1258 us</td>
<td>64.49701 us</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>579.6009 us</td>
<td>499.5277 us</td>
</tr>
<tr>
<th>x*$a+$b</th>
<td>843.1906 us</td>
<td>75.63250 us</td>
<td>576.2166 us</td>
<td>452.0626 us</td>
<td>1.086456 ms</td>
<td>154.3398 us</td>
<td>889.1795 us</td>
<td>75.68617 us</td>
<td>1.494543 ms</td>
<td>110.5486 us</td>
<td>877.4670 us</td>
<td>155.2830 us</td>
<td>1.226473 ms</td>
<td>649.9281 us</td>
</tr>
<tr>
<th>($a+$b)*($c+$d)</th>
<td>1.510817 ms</td>
<td>125.0923 us</td>
<td>1.003353 ms</td>
<td>1.080122 ms</td>
<td>1.958700 ms</td>
<td>288.3781 us</td>
<td>1.582407 ms</td>
<td>125.0150 us</td>
<td>3.109755 ms</td>
<td>919.2376 us</td>
<td>1.915603 ms</td>
<td>1.132326 ms</td>
<td>3.199406 ms</td>
<td>2.967417 ms</td>
</tr>
<tr>
<th>($a+$b)*($c+$d)(zip)</th>
<td>1.585162 ms</td>
<td>155.3547 us</td>
<td>1.345901 ms</td>
<td>742.2439 us</td>
<td></td>
<td></td>
<td>1.845875 ms</td>
<td>155.3554 us</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>2.749534 ms</td>
<td>2.509351 ms</td>
</tr>
<tr>
<th>(x+$a)*(y+$b)</th>
<td>1.139386 ms</td>
<td>104.0011 us</td>
<td>969.7649 us</td>
<td>678.5383 us</td>
<td>1.451681 ms</td>
<td>203.9229 us</td>
<td>1.174562 ms</td>
<td>103.9131 us</td>
<td>1.950864 ms</td>
<td>89.32382 us</td>
<td>1.404879 ms</td>
<td>89.98352 us</td>
<td>2.030718 ms</td>
<td>401.6301 us</td>
</tr>
<tr>
<th>(^x+$a)*(^y+$b)</th>
<td>1.095473 ms</td>
<td>118.5413 us</td>
<td>1.154678 ms</td>
<td>1.061493 ms</td>
<td>2.512015 ms</td>
<td>534.0542 us</td>
<td>1.530284 ms</td>
<td>150.2891 us</td>
<td>2.452523 ms</td>
<td>822.7933 us</td>
<td>1.825248 ms</td>
<td>930.1098 us</td>
<td>3.399711 ms</td>
<td>2.650773 ms</td>
</tr>
<tr>
<th>filter(neq0)(map0)</th>
<td>357.0597 us</td>
<td>61.45542 us</td>
<td>320.2454 us</td>
<td>78.80583 us</td>
<td>581.4489 us</td>
<td>104.6501 us</td>
<td>364.2109 us</td>
<td>61.35867 us</td>
<td>609.2637 us</td>
<td>68.04449 us</td>
<td>531.9321 us</td>
<td>61.67384 us</td>
<td>370.0963 us</td>
<td>124.5576 us</td>
</tr>
<tr>
<th>filter(neq0)(^0)</th>
<td>19.80998 us</td>
<td>17.42000 us</td>
<td>6.715188 us</td>
<td>23.41616 us</td>
<td>404.8197 us</td>
<td>173.0071 us</td>
<td>20.19333 us</td>
<td>17.91704 us</td>
<td>24.38774 us</td>
<td>23.33224 us</td>
<td>24.77845 us</td>
<td>24.97328 us</td>
<td>59.51524 us</td>
<td>59.48926 us</td>
</tr>
<tr>
<th>filter(eq0)(map0)</th>
<td>527.7883 us</td>
<td>69.61809 us</td>
<td>499.8775 us</td>
<td>174.2209 us</td>
<td>839.2031 us</td>
<td>242.6149 us</td>
<td>542.4786 us</td>
<td>69.63563 us</td>
<td>1.015737 ms</td>
<td>69.34671 us</td>
<td>622.5425 us</td>
<td>69.89652 us</td>
<td>619.5380 us</td>
<td>331.1557 us</td>
</tr>
<tr>
<th>filter(eq0)(^0)</th>
<td>153.7917 us</td>
<td>45.96399 us</td>
<td>87.08318 us</td>
<td>94.22360 us</td>
<td>648.8350 us</td>
<td>308.5451 us</td>
<td>162.6549 us</td>
<td>45.75672 us</td>
<td>417.1879 us</td>
<td>52.15233 us</td>
<td>126.1947 us</td>
<td>45.52413 us</td>
<td>214.1162 us</td>
<td>201.7559 us</td>
</tr>
<tr>
<th>zip_filter</th>
<td>696.4477 us</td>
<td>76.31172 us</td>
<td>631.0826 us</td>
<td>466.6432 us</td>
<td>1.158844 ms</td>
<td>334.0208 us</td>
<td>735.2156 us</td>
<td>76.24536 us</td>
<td>1.042058 ms</td>
<td>99.80086 us</td>
<td>635.7235 us</td>
<td>100.4251 us</td>
<td>588.1413 us</td>
<td>341.6012 us</td>
</tr>
<tr>
<th>filter_zip</th>
<td>741.1476 us</td>
<td>81.01137 us</td>
<td>467.9363 us</td>
<td>468.8768 us</td>
<td>1.073220 ms</td>
<td>239.6135 us</td>
<td>789.2609 us</td>
<td>79.83034 us</td>
<td>1.249209 ms</td>
<td>80.92583 us</td>
<td>835.3062 us</td>
<td>87.66268 us</td>
<td>507.6646 us</td>
<td>255.4129 us</td>
</tr>
<tr>
<th>filter_evens</th>
<td>315.2792 us</td>
<td>59.13895 us</td>
<td>185.6894 us</td>
<td>182.2491 us</td>
<td></td>
<td></td>
<td>342.7927 us</td>
<td>58.59793 us</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>321.4253 us</td>
<td>337.2523 us</td>
</tr>
<tr>
<th>sum($a*$b)</th>
<td>889.3029 us</td>
<td>115.0304 us</td>
<td>1.113731 ms</td>
<td>1.130831 ms</td>
<td>837.7596 us</td>
<td>145.4783 us</td>
<td>630.7635 us</td>
<td>115.0499 us</td>
<td>1.046405 ms</td>
<td>73.86368 us</td>
<td>956.1445 us</td>
<td>74.39775 us</td>
<td>310.2107 us</td>
<td>99.39568 us</td>
</tr>
<tr>
<th>sum($a*$b)(zip)</th>
<td>893.2471 us</td>
<td>114.9057 us</td>
<td>1.890718 ms</td>
<td>1.420020 ms</td>
<td></td>
<td></td>
<td>624.3066 us</td>
<td>114.8766 us</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>371.0969 us</td>
<td>99.43429 us</td>
</tr>
<tr>
<th>sum[m..n]</th>
<td>480.6165 us</td>
<td>53.50359 us</td>
<td>1.006370 ms</td>
<td>646.0969 us</td>
<td>599.5453 us</td>
<td>235.4800 us</td>
<td>257.9740 us</td>
<td>53.76784 us</td>
<td>520.8189 us</td>
<td>129.2202 us</td>
<td>455.0702 us</td>
<td>133.0979 us</td>
<td>450.9119 us</td>
<td>134.6730 us</td>
</tr>
<tr>
<th>sumsq(map)</th>
<td>601.0236 us</td>
<td>67.20919 us</td>
<td>1.389623 ms</td>
<td>653.9266 us</td>
<td>916.6944 us</td>
<td>292.8844 us</td>
<td>361.3013 us</td>
<td>67.21005 us</td>
<td>633.0462 us</td>
<td>143.1072 us</td>
<td>632.7607 us</td>
<td>142.2476 us</td>
<td>626.6808 us</td>
<td>143.5639 us</td>
</tr>
<tr>
<th>sumsq(zip)</th>
<td>581.3862 us</td>
<td>67.24133 us</td>
<td>2.487876 ms</td>
<td>1.928390 ms</td>
<td>1.391611 ms</td>
<td>311.0818 us</td>
<td>931.6956 us</td>
<td>160.3321 us</td>
<td>2.725828 ms</td>
<td>369.7680 us</td>
<td>1.892799 ms</td>
<td>380.4847 us</td>
<td>1.770848 ms</td>
<td>914.5373 us</td>
</tr>
<tr>
<th>sum_evens</th>
<td>522.6244 us</td>
<td>120.4421 us</td>
<td>377.5469 us</td>
<td>401.7939 us</td>
<td></td>
<td></td>
<td>283.7662 us</td>
<td>120.4255 us</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>293.9641 us</td>
<td>150.9723 us</td>
</tr>
</table>
<p>
<a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> can also be used to see how well different GHC versions optimise array loops. Here is a comparison of the current HEAD to GHC 6.10.4 (2 means 6.13 is 2x slower, 0.5 means it is 2x faster).</p>
<table border="1" style='font-size:80%;white-space:nowrap;'>
<caption>NoSlow results for GHC 6.13.20091124 vs GHC 6.10.4</caption>
<tr>
<th></th>
<th colspan="2">dph-prim</th>
<th colspan="2">list</th>
<th colspan="2">storablevector</th>
<th colspan="2">uvector</th>
<th colspan="6">vector</th>
</tr>
<tr>
<th></th>
<th colspan="2">seq</th>
<th colspan="2"></th>
<th colspan="2"></th>
<th colspan="2"></th>
<th colspan="2">Primitive</th>
<th colspan="2">Storable</th>
<th colspan="2">boxed</th>
</tr>
<tr>
<th></th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
<th>*Double</th>
<th>Double</th>
</tr>
<tr>
<th>$a+1</th>
<td>0.169</td>
<td>0.998</td>
<td>1.821</td>
<td>1.023</td>
<td>0.879</td>
<td>1.000</td>
<td>0.183</td>
<td>1.000</td>
<td>0.109</td>
<td>0.019</td>
<td>0.069</td>
<td>1.000</td>
<td>0.994</td>
<td>0.985</td>
</tr>
<tr>
<th>$a+^1</th>
<td>0.178</td>
<td>1.001</td>
<td>1.027</td>
<td>1.029</td>
<td>0.068</td>
<td>0.114</td>
<td>0.130</td>
<td>1.001</td>
<td>0.058</td>
<td>0.014</td>
<td>0.031</td>
<td>0.999</td>
<td>1.011</td>
<td>1.004</td>
</tr>
<tr>
<th>$a+x</th>
<td>0.172</td>
<td>1.004</td>
<td>1.138</td>
<td>0.989</td>
<td>0.970</td>
<td>1.006</td>
<td>0.185</td>
<td>1.005</td>
<td>0.121</td>
<td>0.018</td>
<td>0.072</td>
<td>1.002</td>
<td>1.024</td>
<td>0.965</td>
</tr>
<tr>
<th>$a+^x</th>
<td>0.173</td>
<td>0.982</td>
<td>1.034</td>
<td>1.023</td>
<td>0.068</td>
<td>0.127</td>
<td>0.132</td>
<td>0.992</td>
<td>0.060</td>
<td>0.013</td>
<td>0.032</td>
<td>1.007</td>
<td>1.012</td>
<td>0.979</td>
</tr>
<tr>
<th>$a+x+x+x+x</th>
<td>0.400</td>
<td>1.019</td>
<td>1.082</td>
<td>1.007</td>
<td>1.033</td>
<td>0.274</td>
<td>0.386</td>
<td>1.019</td>
<td>0.059</td>
<td>0.008</td>
<td>0.049</td>
<td>0.089</td>
<td>1.034</td>
<td>0.981</td>
</tr>
<tr>
<th>$a+$b</th>
<td>0.214</td>
<td>0.966</td>
<td>1.014</td>
<td>1.010</td>
<td>0.067</td>
<td>1.226</td>
<td>0.223</td>
<td>0.966</td>
<td>0.121</td>
<td>0.014</td>
<td>0.072</td>
<td>1.000</td>
<td>0.974</td>
<td>0.984</td>
</tr>
<tr>
<th>$a+$b(zip)</th>
<td>0.213</td>
<td>0.965</td>
<td>1.740</td>
<td>1.358</td>
<td></td>
<td></td>
<td>0.223</td>
<td>0.963</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0.984</td>
<td>1.002</td>
</tr>
<tr>
<th>x*$a+$b</th>
<td>0.251</td>
<td>0.997</td>
<td>1.000</td>
<td>0.982</td>
<td>0.101</td>
<td>1.065</td>
<td>0.263</td>
<td>0.997</td>
<td>0.085</td>
<td>0.020</td>
<td>0.051</td>
<td>1.292</td>
<td>1.007</td>
<td>1.034</td>
</tr>
<tr>
<th>($a+$b)*($c+$d)</th>
<td>0.287</td>
<td>0.199</td>
<td>1.168</td>
<td>1.483</td>
<td>0.064</td>
<td>0.271</td>
<td>0.293</td>
<td>0.199</td>
<td>0.101</td>
<td>0.061</td>
<td>0.060</td>
<td>1.058</td>
<td>0.992</td>
<td>1.299</td>
</tr>
<tr>
<th>($a+$b)*($c+$d)(zip)</th>
<td>0.302</td>
<td>1.262</td>
<td>1.301</td>
<td>0.906</td>
<td></td>
<td></td>
<td>0.350</td>
<td>1.257</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0.949</td>
<td>1.249</td>
</tr>
<tr>
<th>(x+$a)*(y+$b)</th>
<td>0.298</td>
<td>1.119</td>
<td>1.019</td>
<td>0.929</td>
<td>0.138</td>
<td>0.372</td>
<td>0.270</td>
<td>1.119</td>
<td>0.080</td>
<td>0.008</td>
<td>0.059</td>
<td>0.136</td>
<td>0.991</td>
<td>1.001</td>
</tr>
<tr>
<th>(^x+$a)*(^y+$b)</th>
<td>0.323</td>
<td>0.166</td>
<td>0.911</td>
<td>1.057</td>
<td>0.074</td>
<td>0.114</td>
<td>0.207</td>
<td>0.199</td>
<td>0.061</td>
<td>0.039</td>
<td>0.043</td>
<td>0.799</td>
<td>1.123</td>
<td>1.200</td>
</tr>
<tr>
<th>filter(neq0)(map0)</th>
<td>0.374</td>
<td>0.999</td>
<td>1.275</td>
<td>0.990</td>
<td>0.828</td>
<td>0.519</td>
<td>0.364</td>
<td>1.000</td>
<td>0.067</td>
<td>0.047</td>
<td>0.064</td>
<td>0.999</td>
<td>0.986</td>
<td>0.982</td>
</tr>
<tr>
<th>filter(neq0)(^0)</th>
<td>0.191</td>
<td>0.949</td>
<td>0.062</td>
<td>1.000</td>
<td>0.189</td>
<td>0.090</td>
<td>0.180</td>
<td>1.016</td>
<td>0.004</td>
<td>0.036</td>
<td>0.004</td>
<td>1.014</td>
<td>0.423</td>
<td>1.029</td>
</tr>
<tr>
<th>filter(eq0)(map0)</th>
<td>0.220</td>
<td>1.000</td>
<td>1.163</td>
<td>1.049</td>
<td>0.334</td>
<td>0.805</td>
<td>0.231</td>
<td>1.005</td>
<td>0.090</td>
<td>0.019</td>
<td>0.057</td>
<td>1.003</td>
<td>0.981</td>
<td>1.014</td>
</tr>
<tr>
<th>filter(eq0)(^0)</th>
<td>0.131</td>
<td>1.010</td>
<td>0.435</td>
<td>1.177</td>
<td>0.164</td>
<td>0.154</td>
<td>0.137</td>
<td>1.008</td>
<td>0.050</td>
<td>0.019</td>
<td>0.014</td>
<td>1.003</td>
<td>0.738</td>
<td>1.060</td>
</tr>
<tr>
<th>zip_filter</th>
<td>0.327</td>
<td>0.991</td>
<td>1.476</td>
<td>0.997</td>
<td>0.093</td>
<td>0.447</td>
<td>0.339</td>
<td>0.995</td>
<td>0.068</td>
<td>0.014</td>
<td>0.041</td>
<td>0.201</td>
<td>1.277</td>
<td>0.815</td>
</tr>
<tr>
<th>filter_zip</th>
<td>0.283</td>
<td>0.990</td>
<td>1.024</td>
<td>1.475</td>
<td>0.077</td>
<td>0.785</td>
<td>0.297</td>
<td>0.982</td>
<td>0.083</td>
<td>0.019</td>
<td>0.054</td>
<td>1.000</td>
<td>1.071</td>
<td>0.987</td>
</tr>
<tr>
<th>filter_evens</th>
<td>0.180</td>
<td>0.984</td>
<td>0.985</td>
<td>0.998</td>
<td></td>
<td></td>
<td>0.186</td>
<td>0.548</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0.109</td>
<td>0.115</td>
</tr>
<tr>
<th>sum($a*$b)</th>
<td>0.167</td>
<td>1.475</td>
<td>0.982</td>
<td>0.978</td>
<td>0.074</td>
<td>1.115</td>
<td>0.141</td>
<td>1.476</td>
<td>0.090</td>
<td>0.941</td>
<td>0.079</td>
<td>0.955</td>
<td>0.120</td>
<td>0.642</td>
</tr>
<tr>
<th>sum($a*$b)(zip)</th>
<td>0.167</td>
<td>1.475</td>
<td>1.430</td>
<td>1.344</td>
<td></td>
<td></td>
<td>0.139</td>
<td>1.474</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0.143</td>
<td>0.998</td>
</tr>
<tr>
<th>sum[m..n]</th>
<td>0.137</td>
<td>0.516</td>
<td>1.042</td>
<td>0.994</td>
<td>0.273</td>
<td>1.014</td>
<td>0.104</td>
<td>0.371</td>
<td>0.063</td>
<td>0.043</td>
<td>0.062</td>
<td>0.065</td>
<td>0.090</td>
<td>0.048</td>
</tr>
<tr>
<th>sumsq(map)</th>
<td>0.167</td>
<td>0.566</td>
<td>1.012</td>
<td>0.985</td>
<td>0.364</td>
<td>1.012</td>
<td>0.140</td>
<td>0.404</td>
<td>0.079</td>
<td>0.047</td>
<td>0.082</td>
<td>0.069</td>
<td>0.120</td>
<td>0.051</td>
</tr>
<tr>
<th>sumsq(zip)</th>
<td>0.088</td>
<td>0.557</td>
<td>1.009</td>
<td>0.968</td>
<td>0.115</td>
<td>1.002</td>
<td>0.165</td>
<td>1.017</td>
<td>0.153</td>
<td>0.061</td>
<td>0.107</td>
<td>0.180</td>
<td>0.272</td>
<td>0.309</td>
</tr>
<tr>
<th>sum_evens</th>
<td>0.190</td>
<td>2.114</td>
<td>0.938</td>
<td>1.041</td>
<td></td>
<td></td>
<td>0.115</td>
<td>1.379</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0.078</td>
<td>0.053</td>
</tr>
</table>
<h1>Usage</h1>
<p>The current interface is rather minimalistic. After building <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a>, you get two binaries: <tt>noslow</tt> and <tt>noslow-table</tt>. The first one runs the actual benchmarks and produces a log file:</p>
<pre>
noslow -u log
</pre>
<p>The log can then be fed to <tt>noslow-table</tt> to generate HTML tables:</p>
<pre>
noslow-table log &gt; table.html
</pre>
<p>We can also restrict the output to a particular data type:</p>
<pre>
noslow-table log --type=Double &gt; table.html
</pre>
<p>or compare two logs:</p>
<pre>
noslow-table --diff log1 log2 &gt; table.html
</pre>
<p>In this case, the table will contain ratios of corresponding values from the two logs, just like in the table above.</p>
<h1>Compiling</h1>
<p><a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> should work out of the box with GHC 6.10. I haven&#8217;t tried it with 6.12 because I didn&#8217;t have time to install the release candidate and the required packages. Getting it to build with the HEAD (which is what I&#8217;m interested in) is a rather annoying process, though, because <a href="http://hackage.haskell.org/package/uvector">uvector</a> is missing a DPH patch which would make it work with the new IO system. Fortunately, I could simply transplant some code from the DPH library (the module is <tt>dph-base/Data/Array/Parallel/Arr/BUArr.hs</tt>) and everything worked.</p>
<p>Also, <a href="http://hackage.haskell.org/package/storablevector">storablevector</a> depends on QuickCheck 1 which doesn&#8217;t seem to compile with the HEAD. I just removed all references to QuickCheck from the code but it might be easier to simply not build the <a href="http://hackage.haskell.org/package/storablevector">storablevector</a> backend (there is a flag for that).</p>
<h1>How it works</h1>
<p>Benchmarking itself is rather straightforward &#8211; all the exciting bits are in <a href="http://hackage.haskell.org/package/criterion">criterion</a>. The heart of <a href="http://hackage.haskell.org/package/NoSlow">NoSlow</a> is the specialiser &#8211; a <a href="http://www.haskell.org/haskellwiki/Template_Haskell">Template Haskell</a> function that takes a bunch of abstract loop kernels and specialises them for a particular backend and data type. Here is how it works. The kernels themselves are implemented in a TH quote:</p>
<pre>
kernels = [d| ...
    dotp :: (Num a, I.Vector v a) =&gt; Ty (v a) -&gt; v a -&gt; v a -&gt; a
    dotp _ as bs = named "sum($a*$b)" $ I.sum (I.zipWith (*) as bs)
    ... |]
</pre>
<p>They refer to functions from module <tt>I</tt> (short for <tt>NoSlow.Backend.Interface</tt>) which are just stubs and do not provide any real functionality. The specialiser traverses the ASTs of the kernels (which it has access to since they are quoted) and replaces references to those stubs with calls to functions from the backend that it is specialising for. For instance, it replaces <tt>I.sum</tt> by <tt>NoSlow.Backend.List.sum</tt> (which is just a reexport of the Prelude function) when specialising for lists. It also adjusts signatures and looks for tags like <tt>named</tt> in the example above which defines the kernel&#8217;s name in the HTML output.</p>
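<p>The renaming step can be sketched in a few lines against the <tt>template-haskell</tt> package that ships with GHC. This is a simplified model, not the actual NoSlow specialiser: <tt>toListBackend</tt>, <tt>substVars</tt> and <tt>kernelBody</tt> are hypothetical names, the traversal only handles a handful of expression forms, and the expression is built by hand instead of being taken from a quote.</p>

```haskell
module Main (main) where

import Language.Haskell.TH

-- Hypothetical helper: redirect names qualified with the stub module "I"
-- to the list backend; every other name is left alone.
toListBackend :: Name -> Name
toListBackend n = case nameModule n of
  Just "I" -> mkName ("NoSlow.Backend.List." ++ nameBase n)
  _        -> n

-- Rename variable references in a small subset of expression forms.
-- Expression forms not listed here are returned unchanged.
substVars :: (Name -> Name) -> Exp -> Exp
substVars f (VarE n)        = VarE (f n)
substVars f (AppE g x)      = AppE (substVars f g) (substVars f x)
substVars f (InfixE l op r) =
  InfixE (fmap (substVars f) l) (substVars f op) (fmap (substVars f) r)
substVars f (LamE ps body)  = LamE ps (substVars f body)
substVars _ e               = e

-- The body of the dotp kernel, I.sum (I.zipWith (*) as bs), built by hand.
kernelBody :: Exp
kernelBody =
  AppE (VarE (mkName "I.sum"))
       (AppE (AppE (AppE (VarE (mkName "I.zipWith")) (VarE (mkName "*")))
                   (VarE (mkName "as")))
             (VarE (mkName "bs")))

-- Pretty-print the kernel body after redirecting the stubs to the list backend.
main :: IO ()
main = putStrLn (pprint (substVars toListBackend kernelBody))
```

<p>Running it prints the kernel body with the <tt>I.</tt> references pointing at the list backend, while <tt>as</tt>, <tt>bs</tt> and the multiplication are left untouched.</p>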
<p>This scheme has one big advantage: it allows backends to mark some kernels as unsupported. An example is <a href="http://hackage.haskell.org/package/storablevector">storablevector</a> which does not support arrays of pairs. The corresponding backend defines <tt>zip</tt> like this:</p>
<pre>
zip :: Unsupported
zip = Unsupported
</pre>
<p>When replacing <tt>I.zip</tt> with <tt>NoSlow.Backend.StorableVector.zip</tt>, the specialiser reifies the latter, notices that its type is <tt>Unsupported</tt> and omits the kernel that contains the call to <tt>zip</tt> altogether.</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2009/11/27/noslow/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
		<item>
		<title>Tricking GHC into evaluating recursive functions at compile time</title>
		<link>http://unlines.wordpress.com/2009/11/05/tricking-ghc-into-evaluating-recursive-functions-at-compile-time/</link>
		<comments>http://unlines.wordpress.com/2009/11/05/tricking-ghc-into-evaluating-recursive-functions-at-compile-time/#comments</comments>
		<pubDate>Thu, 05 Nov 2009 03:12:19 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=153</guid>
		<description><![CDATA[Here is a trick I came up with for a project of mine. Suppose you have a GADT like this very simple one: data T a where TInt :: Int -&#62; T Int TPair :: T a -&#62; T b -&#62; T (a,b) and a function which does something with it: sumT :: T a [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a trick I came up with for a project of mine. Suppose you have a GADT like this very simple one:</p>
<pre>
data T a where
  TInt  :: Int -&gt; T Int
  TPair :: T a -&gt; T b -&gt; T (a,b)
</pre>
<p>and a function which does something with it:</p>
<pre>
sumT :: T a -&gt; Int
sumT (TInt n) = n
sumT (TPair l r) = sumT l + sumT r
</pre>
<p>Now, let&#8217;s use the two:</p>
<pre>
term = TPair (TPair (TInt 1) (TInt 2)) (TInt 3)
foo = sumT term
</pre>
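<p>For anyone who wants to play along, the snippets so far can be assembled into one module roughly like this. The module header, the type signatures on <tt>term</tt> and <tt>foo</tt>, and <tt>main</tt> are additions for the sake of a complete file. Compiling with <tt>ghc -O2 -ddump-simpl</tt> shows the Core that GHC produces for <tt>foo</tt>.</p>

```haskell
{-# LANGUAGE GADTs #-}
module Main (main) where

-- The GADT from the text: integer leaves and pairs of terms.
data T a where
  TInt  :: Int -> T Int
  TPair :: T a -> T b -> T (a, b)

-- Sum all integers stored in a term.
sumT :: T a -> Int
sumT (TInt n)    = n
sumT (TPair l r) = sumT l + sumT r

term :: T ((Int, Int), Int)
term = TPair (TPair (TInt 1) (TInt 2)) (TInt 3)

foo :: Int
foo = sumT term

main :: IO ()
main = print foo  -- prints 6
```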
<p>Since <tt>foo</tt> is constant, we would expect GHC to evaluate it at compile time and just bind it to 6 in the compiled code, right?<span id="more-153"></span></p>
<p>Wrong! For this to happen, GHC would have to inline <tt>sumT</tt>. But <tt>sumT</tt> is a recursive function and GHC never inlines those because it might get into an infinite loop otherwise. This means that it won&#8217;t optimise <tt>foo</tt> at all which was absolutely unacceptable in my program. I spent about two days fiddling with inline pragmas, rewrite rules and other unpleasant things until I found a satisfactory solution.</p>
<p>My first attempt was to only inline <tt>sumT</tt> if it is applied to a constructor. We could try adding a couple of rewrite rules.</p>
<pre>
"sumT/TInt"  forall n. sumT (TInt n) = n
"sumT/TPair" forall l r. sumT (TPair l r) = sumT l + sumT r
</pre>
<p>Alas, this doesn&#8217;t work most of the time. Basically, trying to match on non-trivial constructors in rewrite rules is never a good idea. We could introduce &#8220;virtual&#8221; constructors, use them everywhere instead of the real ones and match on them.</p>
<pre>
tInt :: Int -&gt; T Int
{-# NOINLINE CONLIKE tInt #-}
tInt = TInt

tPair :: T a -&gt; T b -&gt; T (a,b)
{-# NOINLINE CONLIKE tPair #-}
tPair = TPair

"sumT/tInt" forall n. sumT (tInt n) = n
"sumT/tPair" forall l r. sumT (tPair l r) = sumT l + sumT r
</pre>
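<p>As a complete file, the virtual-constructor variant might look like the sketch below. The rules are wrapped in a <tt>RULES</tt> pragma, which the snippets above leave out; <tt>term</tt> and <tt>main</tt> are additions for illustration. Whether the rules actually fire depends on the GHC version and optimisation level, but the program prints the same result either way.</p>

```haskell
{-# LANGUAGE GADTs #-}
module Main (main) where

data T a where
  TInt  :: Int -> T Int
  TPair :: T a -> T b -> T (a, b)

sumT :: T a -> Int
sumT (TInt n)    = n
sumT (TPair l r) = sumT l + sumT r

-- "Virtual" constructors: never inlined, so the rules below can match on
-- them, but marked CONLIKE so GHC still treats them as cheap to duplicate.
tInt :: Int -> T Int
{-# NOINLINE CONLIKE tInt #-}
tInt = TInt

tPair :: T a -> T b -> T (a, b)
{-# NOINLINE CONLIKE tPair #-}
tPair = TPair

{-# RULES
"sumT/tInt"  forall n.   sumT (tInt n)    = n
"sumT/tPair" forall l r. sumT (tPair l r) = sumT l + sumT r
  #-}

term :: T ((Int, Int), Int)
term = tPair (tPair (tInt 1) (tInt 2)) (tInt 3)

main :: IO ()
main = print (sumT term)  -- prints 6
```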
<p>This works much better but, unfortunately, still fails in my program. There, I make extensive use of type families so the Core generated by GHC has casts all over the place. Casts make rule matching highly unreliable because rules don&#8217;t ignore them (a particularly ugly wart that I keep running into). So what to do?</p>
<p>The solution I came up with requires adding a unit component to every recursive constructor.</p>
<pre>
data T a where
  TInt  :: Int -&gt; T Int
  TPair :: () -&gt; T a -&gt; T b -&gt; T (a,b)
</pre>
<p>Where we previously wrote <tt>TPair</tt>, we will now write <tt>TPair ()</tt>. In fact, let&#8217;s provide a convenience function for that:</p>
<pre>
tPair :: T a -&gt; T b -&gt; T (a,b)
tPair = TPair ()
</pre>
<p>Now, we define a non-recursive version of <tt>sumT</tt> which is parametrised with a function it is supposed to apply to the pair components.</p>
<pre>
sumT_cont :: (forall a. () -&gt; T a -&gt; Int) -&gt; T a -&gt; Int
{-# INLINE sumT_cont #-}
sumT_cont cont (TInt  n) = n
sumT_cont cont (TPair u l r) = cont u l + cont u r
</pre>
<p>Note that since <tt>sumT_cont</tt> isn&#8217;t recursive it can be freely inlined. Note also that we pass the unit value from the constructor to <tt>cont</tt>. This is absolutely essential.</p>
<p>The actual recursive sum is defined via <tt>sumT_cont</tt>. Of course, it has to be parametrised with <tt>()</tt> (which it ignores).</p>
<pre>
sumT' :: () -&gt; T a -&gt; Int
sumT' _ = sumT_cont sumT'

sumT :: T a -&gt; Int
{-# INLINE sumT #-}
sumT = sumT' ()
</pre>
<p>The final missing piece that makes the whole thing work is this simple rewrite rule:</p>
<pre>
{-# RULES "sumT'" sumT' () = sumT_cont sumT' #-}
</pre>
<p>It &#8220;inlines&#8221; <tt>sumT'</tt> but <i>only</i> if it is applied to <tt>()</tt>. Why is this useful? Let&#8217;s see what happens if we apply <tt>sumT</tt> to a term which GHC knows nothing about:</p>
<pre>
   sumT x
= {inline sumT}
   sumT' () x
= {apply rule "sumT'"}
   sumT_cont sumT' x
= {inline sumT_cont}
    case x of TInt n -&gt; n
              TPair u l r -&gt; sumT' u l + sumT' u r
</pre>
<p>Rewriting <tt>sumT'</tt> to <tt>sumT_cont sumT'</tt> again would be a disaster as it would put us into an infinite rewriting loop. This is precisely the reason why GHC won't inline recursive functions. But our rule doesn't match here because <tt>u</tt> is not guaranteed to be <tt>()</tt>!</p>
<p>So what happens if we apply <tt>sumT</tt> to a term that is at least partially static?</p>
<pre>
   sumT (tPair (tInt 1) y)
= {inline sumT and tPair}
   sumT' () (TPair () (TInt 1) y)
= {apply rule "sumT'"}
   sumT_cont sumT' (TPair () (TInt 1) y)
= {inline sumT_cont}
    case TPair () (TInt 1) y of TInt n -&gt; n
                                TPair u l r -&gt; sumT' u l + sumT' u r
= {eliminate case}
   sumT' () (TInt 1) + sumT' () y
= {apply rule "sumT'" twice}
    sumT_cont sumT' (TInt 1) + sumT_cont sumT' y
= {inline sumT_cont, eliminate case}
    1 + case y of TInt n -&gt; n
                  TPair u l r -&gt; sumT' u l + sumT' u r
</pre>
<p>This looks good! In effect, GHC executed <tt>sumT</tt> for the statically known portion of the term at compile time and deferred the rest to run time. This worked because when it eliminated the case on <tt>TPair</tt> it bound <tt>u</tt> in the case alternative to <tt>()</tt>. This allowed it to apply the <tt>"sumT'"</tt> rule again and thus to get rid of the <tt>TInt</tt> constructor in the left component. The right component is unknown, though, so rewriting stops there. In general, after "inlining" (via the rewrite rule) <tt>sumT'</tt> once, GHC will only apply the rule again if it eliminates the case, thus binding <tt>u</tt> to <tt>()</tt>. This, in turn, is only possible if the head of the term is a known constructor so GHC will continue rewriting and inlining until it consumes all known constructors but will not get into an infinite loop. For <tt>foo</tt> from my first example, which is fully constant, it will perform the entire computation at compile time and reduce it to 6.</p>
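<p>Putting the pieces together, the whole trick fits into one small module. This is a sketch assembled from the fragments above; the module header, the signature on <tt>foo</tt> and the <tt>main</tt> wrapper are additions for illustration. Whether a given GHC version reduces <tt>foo</tt> all the way to a literal is something to confirm in the <tt>-ddump-simpl</tt> output rather than take for granted.</p>
<pre>
{-# LANGUAGE GADTs, RankNTypes #-}
module Main (main) where

-- Inspect the optimised Core with: ghc -O2 -ddump-simpl Main.hs

data T a where
  TInt  :: Int -&gt; T Int
  TPair :: () -&gt; T a -&gt; T b -&gt; T (a,b)   -- unit component on the recursive constructor

tPair :: T a -&gt; T b -&gt; T (a,b)
tPair = TPair ()

-- Non-recursive worker: recursion goes through cont, so it can be inlined freely.
sumT_cont :: (forall a. () -&gt; T a -&gt; Int) -&gt; T a -&gt; Int
{-# INLINE sumT_cont #-}
sumT_cont _    (TInt n)      = n
sumT_cont cont (TPair u l r) = cont u l + cont u r

-- The actual recursive function; it ignores its unit argument.
sumT' :: () -&gt; T a -&gt; Int
sumT' _ = sumT_cont sumT'

-- "Inline" sumT' only when it is applied to a literal ().
{-# RULES "sumT'" sumT' () = sumT_cont sumT' #-}

sumT :: T a -&gt; Int
{-# INLINE sumT #-}
sumT = sumT' ()

foo :: Int
foo = sumT (tPair (tPair (TInt 1) (TInt 2)) (TInt 3))

main :: IO ()
main = print foo   -- prints 6
</pre>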
<p>A word of warning: it <i>is</i> possible to get GHC into an infinite loop with this approach by constructing infinite but statically known terms. For instance, we could apply the same technique to this type.</p>
<pre>
data U = UInt Int | UPair U U
</pre>
<p>But now, this term gets us into trouble:</p>
<pre>
x = UPair (UInt 1) x
</pre>
<p>This technique works best with GADTs like <tt>T</tt> that do not admit infinite terms.</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2009/11/05/tricking-ghc-into-evaluating-recursive-functions-at-compile-time/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
		<item>
		<title>Profiling with stream fusion</title>
		<link>http://unlines.wordpress.com/2009/10/30/profiling-with-stream-fusion/</link>
		<comments>http://unlines.wordpress.com/2009/10/30/profiling-with-stream-fusion/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 03:13:10 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=136</guid>
		<description><![CDATA[As we move on to bigger examples in DPH, identifying performance problems just by staring at the Core output becomes somewhat difficult. We&#8217;ve finally reached a point where we actually have to profile DPH programs and identify slow loops. But how? Cost centre profiling isn&#8217;t supported by the vectoriser and in any case, it works [...]]]></description>
			<content:encoded><![CDATA[<p>As we move on to bigger examples in DPH, identifying performance problems just by staring at the Core output becomes somewhat difficult. We&#8217;ve finally reached a point where we actually have to profile DPH programs and identify slow loops. But how? <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres">Cost centre profiling</a> isn&#8217;t supported by the vectoriser and in any case, it works on the Haskell source whereas we are interested in loops generated by the vectoriser. <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/ticky-ticky.html">Ticky-ticky profiling</a> happens a bit too late, when all those loops have been transformed into something mostly unrecognisable. It also seems to have problems with <tt>-threaded</tt>. Looks like we have to implement it ourselves&#8230;<span id="more-136"></span></p>
<p>With a bit of thought, this turns out to be surprisingly easy. Since DPH uses stream fusion, (almost) all those loops operate on streams. Now, the basic idea is to annotate all streams with a string which tells us how the stream has been produced.</p>
<pre>
data Stream a = forall s. Stream ... String

replicateS :: Int -&gt; a -&gt; Stream a
replicateS n x = Stream ... "replicateS"

mapS :: (a -&gt; b) -&gt; Stream a -&gt; Stream b
mapS f (Stream ... c) = Stream ... ("mapS &lt;- " ++ c)
</pre>
<p>and so on. Now, when a real loop consumes a stream it simply logs its running time together with the stream&#8217;s producer:</p>
<pre>
unstreamU :: Stream a -&gt; UArr a
unstreamU (Stream ... c) = runST (traceLoopST ("unstreamU &lt;- " ++ c) $ fill)
  where
    fill =
</pre>
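<p>To make the idea concrete, here is a toy version with the elided parts filled in. It is only a sketch under loud assumptions: the <tt>Step</tt> type has no <tt>Skip</tt>, the consumer <tt>unstreamList</tt> is a hypothetical stand-in that builds a list instead of a <tt>UArr</tt>, and <tt>Debug.Trace.trace</tt> replaces <tt>traceLoopST</tt>, so it logs the producer chain but does not measure running time.</p>
<pre>
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

import Debug.Trace (trace)

-- One step of a stream: either finished, or an element plus the next state.
data Step s a = Done | Yield a s

-- Stepper function, initial state, and a string describing how the stream was produced.
data Stream a = forall s. Stream (s -&gt; Step s a) s String

replicateS :: Int -&gt; a -&gt; Stream a
replicateS n x = Stream step n "replicateS"
  where
    step i | i &lt;= 0    = Done
           | otherwise = Yield x (i - 1)

mapS :: (a -&gt; b) -&gt; Stream a -&gt; Stream b
mapS f (Stream step s c) = Stream step' s ("mapS &lt;- " ++ c)
  where
    step' st = case step st of
      Done        -&gt; Done
      Yield x st' -&gt; Yield (f x) st'

-- Hypothetical consumer: logs the producer chain when the loop is started.
unstreamList :: Stream a -&gt; [a]
unstreamList (Stream step s c) = trace ("unstreamList &lt;- " ++ c) (go s)
  where
    go st = case step st of
      Done        -&gt; []
      Yield x st' -&gt; x : go st'

main :: IO ()
main = print (unstreamList (mapS (+ 1) (replicateS 3 (0 :: Int))))
-- stderr: unstreamList &lt;- mapS &lt;- replicateS
-- stdout: [1,1,1]
</pre>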
<p>Since streams are purely compile-time structures and no streams should be left in the program if the simplifier is doing its job properly, all those strings simply go away if we compile our library without tracing (i.e., make <tt>traceLoopST</tt> a no-op). With tracing on, however, our rather fancy implementation actually produces <a href="http://en.wikipedia.org/wiki/DTrace">DTrace</a> events at the beginning and at the end of a loop. This is a very flexible mechanism. In particular, it allows us to produce profiles like this (pardon my formatting):</p>
<pre>
  ...
  unstreamMU &lt;- zipWithS &lt;- (zipWithS &lt;- (zipWithS &lt;- (replicateEachS &lt;- zipWithS &lt;- (streamU, streamU), streamU), zipWithS &lt;- (replicateEachS &lt;- zipWithS &lt;- (streamU, streamU), streamU)), zipWithS &lt;- (zipWithS &lt;- (replicateEachS &lt;- zipWithS &lt;- (streamU, st         12115903
  unstreamMU &lt;- mapS &lt;- mapS &lt;- streamU                      21787693
  foldS &lt;- streamU                                           27096786
  unstreamMU &lt;- zipWithS &lt;- (streamU, mapS &lt;- fold1SS &lt;- (streamU, zipWithS &lt;- (enumFromToEachS &lt;- zipWithS &lt;- (replicateS, mapS &lt;- streamU), streamU)))         60756439
  unstreamMU &lt;- toStream                                    817568512
</pre>
<p>This shows the raw running time for each kind of stream loop in the program. Of course, it doesn&#8217;t distinguish between different instantiations of the same loop structure (for instance, <tt>foldS &lt;- streamU</tt> occurs rather frequently) but it&#8217;s still very helpful and doesn&#8217;t require any compiler support whatsoever. Ignoring <tt>toStream</tt> which is only called during test data generation, the loop with <tt>fold1SS</tt> takes by far the most time. Indeed, it turns out that <tt>fold1SS</tt> is the problem; optimising its implementation speeds up the program by almost a factor of 2.</p>
<p>Implementing all this took less than one day. I guess this shows yet again: don&#8217;t mess with the compiler if you can mess with the library instead. Also, use Haskell.</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2009/10/30/profiling-with-stream-fusion/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
		<item>
		<title>Talk on &#8220;Loop Fusion in Haskell&#8221;</title>
		<link>http://unlines.wordpress.com/2009/10/22/talk-on-loop-fusion-in-haskell/</link>
		<comments>http://unlines.wordpress.com/2009/10/22/talk-on-loop-fusion-in-haskell/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 12:50:18 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=132</guid>
		<description><![CDATA[I gave a talk about loop fusion in Haskell today at FP-Syd, the Sydney Functional Programming group. It covered stream fusion and fusion for distributed types which are two of the optimisations that make Data Parallel Haskell fast.]]></description>
			<content:encoded><![CDATA[<p>I gave a <a href="http://www.cse.unsw.edu.au/~rl/talks/fp-syd-fusion.pdf">talk about loop fusion</a> in Haskell today at FP-Syd, the Sydney Functional Programming group. It covered stream fusion and fusion for distributed types which are two of the optimisations that make <a href="http://www.haskell.org/haskellwiki/GHC/Data_Parallel_Haskell">Data Parallel Haskell</a> fast.</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2009/10/22/talk-on-loop-fusion-in-haskell/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
		<item>
		<title>Talk on &#8220;Generics in Data Parallel Haskell&#8221;</title>
		<link>http://unlines.wordpress.com/2009/10/03/talk-on-generics-in-data-parallel-haskell/</link>
		<comments>http://unlines.wordpress.com/2009/10/03/talk-on-generics-in-data-parallel-haskell/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 02:08:16 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=79</guid>
		<description><![CDATA[Here are the slides for my talk on the use of generics in the implementation of Data Parallel Haskell at the SAPLING meeting yesterday. This is where Instant Generics come from.]]></description>
			<content:encoded><![CDATA[<p>Here are the <a href="http://www.cse.unsw.edu.au/~rl/talks/sapling-oct09.pdf">slides</a> for my talk on the use of generics in the implementation of <a href="http://www.haskell.org/haskellwiki/GHC/Data_Parallel_Haskell">Data Parallel Haskell</a> at the <a href="http://plrg.science.mq.edu.au/projects/show/sapling">SAPLING</a> meeting yesterday. This is where <a href="http://www.cse.unsw.edu.au/~chak/project/generics">Instant Generics</a> come from.</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2009/10/03/talk-on-generics-in-data-parallel-haskell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
		<item>
		<title>Squinting at fusion</title>
		<link>http://unlines.wordpress.com/2009/09/29/squinting-at-fusion/</link>
		<comments>http://unlines.wordpress.com/2009/09/29/squinting-at-fusion/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 06:09:07 +0000</pubDate>
		<dc:creator>Roman Leshchinskiy</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://unlines.wordpress.com/?p=5</guid>
		<description><![CDATA[This being my first blog post and all, I&#8217;ll try to maximise boredom and minimise readability by writing as few lines of text as possible. Here we go&#8230; As we know, recursive data types are fixpoints of non-recursive ones. So, for instance, the standard list data type: data [a] = [] &#124; a : [a] [...]]]></description>
			<content:encoded><![CDATA[<p>This being my first blog post and all, I&#8217;ll try to maximise boredom and minimise readability by writing as few lines of text as possible. Here we go&#8230;<span id="more-5"></span></p>
<p>As we know, recursive data types are fixpoints of non-recursive ones. So, for instance, the standard list data type:</p>
<pre>data [a] = [] | a : [a]</pre>
<p>is just the fixpoint of this guy:</p>
<pre>
data PreList a s = Nil | Cons a s
</pre>
<p>with these simple injection/projection functions:</p>
<pre>
inject :: PreList a [a] -&gt; [a]
inject Nil         = []
inject (Cons x xs) = x : xs

project :: [a] -&gt; PreList a [a]
project []     = Nil
project (x:xs) = Cons x xs
</pre>
<p>Of course, <tt>PreList</tt> is also a functor:</p>
<pre>
instance Functor (PreList a) where
  fmap f Nil        = Nil
  fmap f (Cons x s) = Cons x (f s)
</pre>
<p>Now, our goal is to mutilate the standard <a href="http://www.haskell.org/haskellwiki/Correctness_of_short_cut_fusion">short cut fusion</a> rules by using <tt>PreList</tt> as much as possible. Let&#8217;s do <tt>destroy/unfoldr</tt> first:</p>
<pre>
destroy :: (forall s. (s -&gt; PreList a s) -&gt; s -&gt; t) -&gt; [a] -&gt; t
destroy g = g project

unfoldr :: (s -&gt; PreList a s) -&gt; s -&gt; [a]
unfoldr f = inject . fmap (unfoldr f) . f
</pre>
<p>The fusion rule is</p>
<pre>
destroy g (unfoldr f s) = g f s
</pre>
<p>Of course, <tt>unfoldr</tt> is just the list anamorphism but what&#8217;s so special about <tt>destroy</tt>? If we squint hard enough at its type we might realise that</p>
<pre>
forall t. (forall s. (s -&gt; PreList a s) -&gt; s -&gt; t) -&gt; t
</pre>
<p>is, in fact, isomorphic to</p>
<pre>
exists s. (s -&gt; PreList a s, s)
</pre>
<p>This being Haskell, we have to introduce a separate data type for the existential:</p>
<pre>
data Unfolding a = forall s. Unfolding (s -&gt; PreList a s) s
</pre>
<p>This makes the signatures of our two functions a lot simpler:</p>
<pre>
destroy :: [a] -&gt; Unfolding a
destroy xs = Unfolding project xs

unfoldr :: Unfolding a -&gt; [a]
unfoldr (Unfolding f s) = ana s
  where
     ana = inject . fmap ana . f
</pre>
<p>And the fusion rule is a bit nicer, too:</p>
<pre>
destroy (unfoldr s) = s
</pre>
<p>In fact, what we have here is almost but not quite <a href="http://www.cse.unsw.edu.au/~rl/publications/stream-fusion.html">stream fusion</a>: <tt>destroy</tt> is equivalent to <tt>stream</tt>, <tt>unfoldr</tt> to <tt>unstream</tt> and <tt>Unfolding</tt> to <tt>Stream</tt>. The only difference is that <tt>PreList</tt> (which corresponds to the <tt>Step</tt> data type in the paper) is missing the <tt>Skip</tt> constructor which, unfortunately, is crucial for making the whole thing work.</p>
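<p>The definitions so far fit into one small runnable module, sketched here. The names <tt>pack</tt>, <tt>unpack</tt> and <tt>countdown</tt> are hypothetical additions: the first two are one way to witness the isomorphism between the rank-2 type and the existential, and the last is just a sample unfolding to play with.</p>
<pre>
{-# LANGUAGE ExistentialQuantification, RankNTypes #-}
module Main (main) where

data PreList a s = Nil | Cons a s

instance Functor (PreList a) where
  fmap _ Nil        = Nil
  fmap f (Cons x s) = Cons x (f s)

inject :: PreList a [a] -&gt; [a]
inject Nil         = []
inject (Cons x xs) = x : xs

project :: [a] -&gt; PreList a [a]
project []     = Nil
project (x:xs) = Cons x xs

data Unfolding a = forall s. Unfolding (s -&gt; PreList a s) s

destroy :: [a] -&gt; Unfolding a
destroy xs = Unfolding project xs

unfoldr :: Unfolding a -&gt; [a]
unfoldr (Unfolding f s) = ana s
  where
    ana = inject . fmap ana . f

-- Hypothetical witnesses of the isomorphism between the two types.
pack :: (forall t. (forall s. (s -&gt; PreList a s) -&gt; s -&gt; t) -&gt; t) -&gt; Unfolding a
pack k = k Unfolding

unpack :: Unfolding a -&gt; (forall s. (s -&gt; PreList a s) -&gt; s -&gt; t) -&gt; t
unpack (Unfolding f s) k = k f s

-- Hypothetical sample unfolding: counts down from n to 1 (for n &gt;= 0).
countdown :: Int -&gt; Unfolding Int
countdown n = Unfolding step n
  where
    step 0 = Nil
    step i = Cons i (i - 1)

main :: IO ()
main = do
  print (unfoldr (countdown 3))                  -- [3,2,1]
  print (unfoldr (destroy [1, 2, 3 :: Int]))     -- [1,2,3]
  print (unfoldr (pack (unpack (countdown 3))))  -- [3,2,1]
</pre>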
<p>This part was clear to me back when we were doing stream fusion but the next bit I didn&#8217;t understand until recently. The question is: can we do something similar with <tt>foldr/build</tt>? Again, let&#8217;s get rid of as many funny types as possible and use <tt>PreList</tt> instead:</p>
<pre>
foldr :: (PreList a s -&gt; s) -&gt; [a] -&gt; s
foldr f = f . fmap (foldr f) . project

build :: (forall s. (PreList a s -&gt; s) -&gt; s) -&gt; [a]
build g = g inject
</pre>
<p>If you are wondering where these signatures come from, here is a little hint:</p>
<pre>
(a -&gt; s -&gt; s) -&gt; s -&gt; t   ~   ((a,s) -&gt; s) -&gt; (() -&gt; s) -&gt; t    ~   ((a,s)+() -&gt; s) -&gt; t   ~   (PreList a s -&gt; s) -&gt; t
</pre>
<p>Now, by squinting at the types we might just see that it makes sense to introduce an abstraction:</p>
<pre>
data Folding a = Folding (forall s. (PreList a s -&gt; s) -&gt; s)
</pre>
<p>and use it:</p>
<pre>
foldr :: [a] -&gt; Folding a
foldr xs = Folding (\f -&gt; let cata = f . fmap cata . project in cata xs)

build :: Folding a -&gt; [a]
build (Folding g) = g inject
</pre>
<p>Voilà! We now have a shiny new <tt>foldr/build</tt> fusion rule:</p>
<pre>foldr (build s) = s</pre>
<p>It&#8217;s probably worth pointing out that <tt>Folding</tt> does, in fact, support many useful list operations. Here is an example:</p>
<pre>
instance Functor Folding where
  fmap f (Folding g) = Folding (\h -&gt; g (h . emap))
    where
      emap Nil        = Nil
      emap (Cons x s) = Cons (f x) s
</pre>
<p>So is there a point to all this? Well, I find it quite interesting that <tt>foldr/build</tt> fusion can be rewritten in this way. I&#8217;m even more intrigued by what we get if we squint at the list hylomorphism:</p>
<pre>
refold :: Unfolding a -&gt; Folding a
refold (Unfolding g s) = Folding (\f -&gt; let hylo = f . fmap hylo . g in hylo s)
</pre>
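<p>For experimenting with <tt>Folding</tt> and <tt>refold</tt>, the fragments above can be collected into a runnable module along the following lines. This is a sketch: <tt>runFolding</tt>, <tt>sumAlg</tt> and <tt>countdown</tt> are hypothetical helpers added for the demonstration, and the Prelude&#8217;s <tt>foldr</tt> is hidden so that the name can be reused.</p>
<pre>
{-# LANGUAGE ExistentialQuantification, RankNTypes #-}
module Main (main) where

import Prelude hiding (foldr)

data PreList a s = Nil | Cons a s

instance Functor (PreList a) where
  fmap _ Nil        = Nil
  fmap f (Cons x s) = Cons x (f s)

inject :: PreList a [a] -&gt; [a]
inject Nil         = []
inject (Cons x xs) = x : xs

project :: [a] -&gt; PreList a [a]
project []     = Nil
project (x:xs) = Cons x xs

data Unfolding a = forall s. Unfolding (s -&gt; PreList a s) s

data Folding a = Folding (forall s. (PreList a s -&gt; s) -&gt; s)

foldr :: [a] -&gt; Folding a
foldr xs = Folding (\f -&gt; let cata = f . fmap cata . project in cata xs)

build :: Folding a -&gt; [a]
build (Folding g) = g inject

instance Functor Folding where
  fmap f (Folding g) = Folding (\h -&gt; g (h . emap))
    where
      emap Nil        = Nil
      emap (Cons x s) = Cons (f x) s

refold :: Unfolding a -&gt; Folding a
refold (Unfolding g s) = Folding (\f -&gt; let hylo = f . fmap hylo . g in hylo s)

-- Hypothetical helper: consume a Folding with an arbitrary algebra.
runFolding :: Folding a -&gt; (PreList a s -&gt; s) -&gt; s
runFolding (Folding g) alg = g alg

-- Hypothetical algebra: sum the elements.
sumAlg :: PreList Int Int -&gt; Int
sumAlg Nil        = 0
sumAlg (Cons x s) = x + s

-- Hypothetical sample unfolding: counts down from n to 1 (for n &gt;= 0).
countdown :: Int -&gt; Unfolding Int
countdown n = Unfolding step n
  where
    step 0 = Nil
    step i = Cons i (i - 1)

main :: IO ()
main = do
  print (build (fmap (* 2) (foldr [1, 2, 3 :: Int])))  -- [2,4,6]
  print (runFolding (refold (countdown 3)) sumAlg)     -- 6
</pre>
<p>Note that the second line sums 3, 2 and 1 without an intermediate list ever being mentioned in the source: <tt>refold</tt> feeds the unfolding&#8217;s steps straight into the algebra.</p>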
<p>Could we have both stream fusion and <tt>foldr/build</tt> in one framework? Would that be useful? Is there a way of working on fusion without irreparably damaging my eyes?</p>
]]></content:encoded>
			<wfw:commentRss>http://unlines.wordpress.com/2009/09/29/squinting-at-fusion/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rl</media:title>
		</media:content>
	</item>
	</channel>
</rss>
