Recently, I decided to clean up and release a small library which I hacked together several months ago and then all but forgot about. I find it quite amusing; perhaps you will, too.
The library implements a framework for computing with small, fixed-size vectors such as complex numbers or coordinates. My goal was to make it as generic and as efficient as possible. In particular, it should be easy to define common functions such as dot product or magnitude generically, for vectors of arbitrary arity, and to add new vector types and operations. Equally importantly, there shouldn’t be any run-time overhead – all operations should be as fast as if they had been written by hand.
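The post doesn’t show the library’s interface here, but the idea can be sketched with a hypothetical type class; the class, method, and instance names below are my own illustration, not the library’s actual API:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A hypothetical class for small, fixed-size vectors.  The real
-- library's interface may well look different; this only sketches
-- how arity-generic operations can be expressed.
class Vector v where
  type Elem v
  -- Combine two vectors element-wise.
  zipWithV :: (Elem v -> Elem v -> Elem v) -> v -> v -> v
  -- Fold over the elements.
  foldV :: (a -> Elem v -> a) -> a -> v -> a

-- Dot product and squared magnitude, each written once and working
-- for vectors of any arity that has a Vector instance.
dotV :: (Vector v, Num (Elem v)) => v -> v -> Elem v
dotV u v = foldV (+) 0 (zipWithV (*) u v)

magnitudeSq :: (Vector v, Num (Elem v)) => v -> Elem v
magnitudeSq v = dotV v v

-- One concrete vector type: strict 3-D vectors of Doubles.
data V3 = V3 !Double !Double !Double

instance Vector V3 where
  type Elem V3 = Double
  zipWithV f (V3 a b c) (V3 x y z) = V3 (f a x) (f b y) (f c z)
  foldV f z (V3 a b c) = f (f (f z a) b) c
```

With aggressive inlining, a call such as `dotV (V3 1 2 3) (V3 4 5 6)` can unfold to the straight-line expression `1*4 + 2*5 + 3*6`, which is the kind of zero-overhead result the post is after.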
Over the last couple of days, I have implemented a small benchmark suite which tries to measure the performance of various Haskell array libraries, with particular emphasis on finding out how well they are able to fuse things. It is now on Hackage under the very creative and imaginative name NoSlow (Haskell seems to have gained a tradition of naming benchmark suites nosomething). What it does is compile and run a set of micro-benchmarks using these libraries: Read more…
Here is a trick I came up with for a project of mine. Suppose you have a GADT like this very simple one:
```haskell
data T a where
  TInt  :: Int -> T Int
  TPair :: T a -> T b -> T (a,b)
```
and a function which does something with it:
```haskell
sumT :: T a -> Int
sumT (TInt n)    = n
sumT (TPair l r) = sumT l + sumT r
```
Now, let’s use the two:
```haskell
term = TPair (TPair (TInt 1) (TInt 2)) (TInt 3)
foo  = sumT term
```
Since foo is constant, we would expect GHC to evaluate it at compile time and just bind it to 6 in the compiled code, right? Read more…
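For readers who want to try this out, the snippets above assemble into a self-contained module (the `GADTs` extension is required); the type signatures on `term` and `foo` are added here for clarity:

```haskell
{-# LANGUAGE GADTs #-}

data T a where
  TInt  :: Int -> T Int
  TPair :: T a -> T b -> T (a,b)

sumT :: T a -> Int
sumT (TInt n)    = n
sumT (TPair l r) = sumT l + sumT r

term :: T ((Int, Int), Int)
term = TPair (TPair (TInt 1) (TInt 2)) (TInt 3)

foo :: Int
foo = sumT term   -- (1 + 2) + 3 = 6
```

One way to see what GHC actually produces for `foo` is to inspect the Core output, for instance by compiling with `ghc -O -ddump-simpl`.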
As we move on to bigger examples in DPH, identifying performance problems just by staring at the Core output becomes somewhat difficult. We’ve finally reached a point where we actually have to profile DPH programs and identify slow loops. But how? Cost centre profiling isn’t supported by the vectoriser and in any case, it works on the Haskell source whereas we are interested in loops generated by the vectoriser. Ticky-ticky profiling happens a bit too late, when all those loops have been transformed into something mostly unrecognisable. It also seems to have problems with -threaded. Looks like we have to implement it ourselves… Read more…