I gave a talk about loop fusion in Haskell today at FP-Syd, the Sydney Functional Programming group. It covered stream fusion and fusion for distributed types which are two of the optimisations that make Data Parallel Haskell fast.
Very nice explanation, and cool stuff. However, streams are very list-like, and since Guy Steele’s talk at ICFP and good experiences with alternatives, I’m growing a bit allergic to list-like structures.
Did you consider alternatives? For example:
data Step s a = Yield a | Skip | Append s s
@Sjoerd Visscher: The reason for the remaining bit of list-like appearance to the streams is that you want to compile the operations down to plain iterative loops, that is, only involving linear recursion, because it has an efficient implementation. For all the other problems they might have when abused, lists are the data structure version of loops: a loop either doesn’t occur or it has a first iteration followed by another loop, just as a list is either empty or consists of an element followed by another list.
I’m wondering if this isn’t arguing in circles: aren’t iterative loops are just as much another representation of the same issue of doing everything with lists?
I think it depends on the way you are generating the data. If you can generate it iteratively then, yes, a loop is an efficient way of implementing that. But if generating the data has a tree shape, you need tricks to get that in a list-like stream, and these tricks probably kill your efficiency.
With the above Step data-type, you can still generate a list-like stream: an Append with the first seed generating a Yield, and the second seed generating the next iteration. It should then still compile to a loop.
But when you need more than a loop, you can do that, and my guess is that it will generate more efficient code than when you have to use tricks to get it to work with an iterative stream.
Originally, stream fusion was developed to solve a very specific problem: how to optimise purely sequential loops on arrays. Note that even when compiling parallel code for, say, multicores you still get sequential loops for the individual cores after you’ve chopped up the work (see, for instance, the distributed types approach from the second part of the talk).
As to trees, it’s not at all clear if a stream-fusion like approach works for them. The problem is that we only get efficient code if (a) stream transformers are not recursive and (b) the type of the seed is not recursive. For list-like streams, the Skip constructor makes this possible. I don’t know how to do something like this for trees but my guess is that it would require some kind of manual flattening. In that case, it is (or will be) easier to use DPH which does the flattening automatically (and, of course, relies on stream fusion rather heavily).
Fill in your details below or click an icon to log in:
You are commenting using your WordPress.com account. ( Log Out / Change )
You are commenting using your Twitter account. ( Log Out / Change )
You are commenting using your Facebook account. ( Log Out / Change )
You are commenting using your Google+ account. ( Log Out / Change )
Connecting to %s
Notify me of follow-up comments via email.
Notify me of new posts via email.
Enter your email address to subscribe to this blog and receive notifications of new posts by email.
Join 2 other followers
Create a free website or blog at WordPress.com.
The Vigilance Theme.
Get every new post delivered to your Inbox.