Profiling with stream fusion

October 30, 2009

As we move on to bigger examples in DPH, identifying performance problems just by staring at the Core output becomes somewhat difficult. We’ve finally reached a point where we actually have to profile DPH programs and identify slow loops. But how? Cost centre profiling isn’t supported by the vectoriser and in any case, it works on the Haskell source whereas we are interested in loops generated by the vectoriser. Ticky-ticky profiling happens a bit too late, when all those loops have been transformed into something mostly unrecognisable. It also seems to have problems with -threaded. Looks like we have to implement it ourselves…

With a bit of thought, this turns out to be surprisingly easy. Since DPH uses stream fusion, (almost) all of those loops operate on streams. The basic idea is to annotate every stream with a string which records how the stream has been produced.

data Stream a = forall s. Stream ... String

replicateS :: Int -> a -> Stream a
replicateS n x = Stream ... "replicateS"

mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream ... c) = Stream ... ("mapS <- " ++ c)

and so on. Now, when a real loop consumes a stream it simply logs its running time together with the stream’s producer:

unstreamU :: Stream a -> UArr a
unstreamU (Stream ... c) = runST (traceLoopST ("unstreamU <- " ++ c) fill)
  where
    fill = ...

Since streams are purely compile-time structures and no streams should be left in the program if the simplifier is doing its job properly, all those strings simply go away if we compile our library without tracing (i.e., make traceLoopST a noop). With tracing on, however, our rather fancy implementation actually produces dtrace events at the beginning and at the end of a loop. This is a very flexible mechanism. In particular, it allows us to produce profiles like this (pardon my formatting):
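The scheme can be modelled in a few lines. The following is a self-contained toy version of the idea, not the real DPH API: streams carry a provenance label, combinators extend it, and the consuming loop reports it when it runs (here via `Debug.Trace` instead of dtrace events).

```haskell
import Debug.Trace (trace)

-- Toy stream: payload plus a label describing its producer chain.
-- (The real DPH Stream is an existential over the step state; this
-- sketch uses a plain list to keep the example runnable.)
data Stream a = Stream [a] String

replicateS :: Int -> a -> Stream a
replicateS n x = Stream (replicate n x) "replicateS"

mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream xs c) = Stream (map f xs) ("mapS <- " ++ c)

-- Stand-in for the real consumer: in DPH the fill loop is wrapped in
-- traceLoopST, which emits begin/end events (or is a no-op with
-- tracing disabled); here we just print the accumulated label.
unstream :: Stream a -> [a]
unstream (Stream xs c) = trace ("unstream <- " ++ c) xs

main :: IO ()
main = print (unstream (mapS (+1) (replicateS 3 (0 :: Int))))
```

Running this prints the provenance label `unstream <- mapS <- replicateS` on stderr before the result, which is exactly the shape of the entries in the profile below.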

  unstreamMU <- zipWithS <- (zipWithS <- (zipWithS <- (replicateEachS <- zipWithS <- (streamU, streamU), streamU), zipWithS <- (replicateEachS <- zipWithS <- (streamU, streamU), streamU)), zipWithS <- (zipWithS <- (replicateEachS <- zipWithS <- (streamU, st         12115903
  unstreamMU <- mapS <- mapS <- streamU                      21787693
  foldS <- streamU                                           27096786
  unstreamMU <- zipWithS <- (streamU, mapS <- fold1SS <- (streamU, zipWithS <- (enumFromToEachS <- zipWithS <- (replicateS, mapS <- streamU), streamU)))         60756439
  unstreamMU <- toStream                                    817568512

This shows the raw running time for each kind of stream loop in the program. Of course, it doesn’t distinguish between different instantiations of the same loop structure (for instance, foldS <- streamU occurs rather frequently) but it’s still very helpful and doesn’t require any compiler support whatsoever. Ignoring toStream which is only called during test data generation, the loop with fold1SS takes by far the most time. Indeed, it turns out that fold1SS is the problem; optimising its implementation speeds up the program almost by a factor of 2.
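The post-processing that turns raw begin/end events into a profile like the one above is straightforward; this sketch (the event format and numbers are made up for illustration) sums the recorded time per loop shape and sorts ascending, matching the ordering shown:

```haskell
import qualified Data.Map.Strict as M
import Data.List (sortOn)

-- Given (label, elapsed-time) pairs recovered from the trace events,
-- total the time per loop shape. Different instantiations of the same
-- shape (e.g. every "foldS <- streamU" loop) collapse into one entry.
profile :: [(String, Integer)] -> [(String, Integer)]
profile evs = sortOn snd (M.toList (M.fromListWith (+) evs))

main :: IO ()
main = mapM_ print (profile
  [ ("foldS <- streamU",       100)
  , ("foldS <- streamU",       250)
  , ("unstreamMU <- toStream", 900) ])
```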

Implementing all this took less than one day. I guess this shows yet again: don’t mess with the compiler if you can mess with the library instead. Also, use Haskell.

2 Comments
  1. nickzivkovic permalink
    November 15, 2009 10:11

    /This/ is awesome. You deserve a medal, for creating these DTrace providers. You should definitely consider doing more of this kind of stuff. Where can I get the source of DPH/GHC that has these capabilities?

    BTW, are you developing this on a Mac, a Solaris, or FreeBSD? (or some other system with dtrace goodness)?
