Squinting at fusion
This being my first blog post and all, I’ll try to maximise boredom and minimise readability by writing as few lines of text as possible. Here we go…
As we know, recursive data types are fixpoints of non-recursive ones. So, for instance, the standard list data type:
data [a] = [] | a : [a]
is just the fixpoint of this guy:
data PreList a s = Nil | Cons a s
with these simple injection/projection functions:
inject :: PreList a [a] -> [a] inject Nil = [] inject (Cons x xs) = x : xs project :: [a] -> PreList a [a] project [] = Nil project (x:xs) = Cons x xs
Of course, PreList is also a functor:
instance Functor (PreList a) where fmap f Nil = Nil fmap f (Cons x s) = Cons x (f s)
Now, our goal is to mutilate the standard short cut fusion rules by using PreList as much as possible. Let’s do destroy/unfoldr first:
destroy :: (forall s. (s -> PreList a s) -> s -> t) -> [a] -> t destroy g = g project unfoldr :: (s -> PreList a s) -> s -> [a] unfoldr f = inject . fmap (unfoldr f) . f
The fusion rule is
destroy g (unfoldr f s) = g f s
Of course, unfoldr is just the list anamorphism but what’s so special about destroy? If we squint hard enough at its type we might realise that
forall t. (forall s. (s -> PreList a s) -> s -> t) -> t
is, in fact, isomorphic to
exists s. (s -> PreList a s, s)
This being Haskell, we have to introduce a separate data type for the existential:
data Unfolding a = forall s. Unfolding (s -> PreList a s) s
This makes the signatures of our two functions a lot simpler:
destroy :: [a] -> Unfolding a destroy xs = Unfolding project xs unfoldr :: Unfolding a -> [a] unfoldr (Unfolding f s) = ana s where ana = inject . fmap ana . f
And the fusion rule is a bit nicer, too:
destroy (unfoldr s) = s
In fact, what we have here is almost but not quite stream fusion: destroy is equivalent to stream, unfoldr to unstream and Unfolding to Stream. The only difference is that PreList (which corresponds to the Step data type in the paper) is missing the Skip constructor which, unfortunately, is crucial for making the whole thing work.
This part was clear to me back when we were doing stream fusion but the next bit I didn’t understand until recently. The question is: can we do something similar with foldr/build? Again, let’s get rid of as many funny types as possible and use PreList instead:
foldr :: (PreList a s -> s) -> [a] -> s foldr f = f . fmap (foldr f) . project build :: (forall s. (PreList a s -> s) -> s) -> [a] build g = g inject
If you are wondering where these signatures come from, here is a little hint:
(a -> s -> s) -> s -> t ~ ((a,s) -> s) -> (() -> s) -> t ~ ((a,s)+() -> s) -> t ~ (PreList a s -> s) -> t
Now, by squinting at the types we might just see that it makes sense to introduce an abstraction:
data Folding a = Folding (forall s. (PreList a s -> s) -> s)
and use it:
foldr :: [a] -> Folding a foldr xs = Folding (\f -> let cata = f . fmap cata . project in cata xs) build :: Folding a -> [a] build (Folding g) = g inject
VoilĂ ! We now have a shiny new foldr/build fusion rule:
foldr (build s) = s
It’s probably worth pointing out that Folding does, in fact, support many useful list operations. Here is an example:
instance Functor Folding where fmap f (Folding g) = Folding (\h -> g (h . emap)) where emap Nil = Nil emap (Cons x s) = Cons (f x) s
So is there a point to all this? Well, I find it quite interesting that foldr/build fusion can be rewritten in this way. I’m even more intrigued by what we get if we squint at the list hylomorphism:
refold :: Unfolding a -> Folding a refold (Unfolding g s) = Folding (\f -> let hylo = f . fmap hylo . g in hylo s)
Could we have both stream fusion and foldr/build in one framework? Would that be useful? Is there a way of working on fusion without irreparably damaging my eyes?
Could we have both stream fusion and foldr/build in one framework?
I don’t see any particular reason why not. I mean, both destroy/unfold and foldr/build are performing the same game, the only difference is in their duality. The real trick is the problem of interoperability, i.e. choosing to use Folding or Unfolding representations and when to switch. Most fusion frameworks try to avoid the conversion between them, and so they pick one or the other as the canonical representation. I think the correct way forward would be to have the fused structure take hylomorphic form. Both anamorphisms and catamorphisms are simplifications of hylomorphisms, and the root of fusion is turning them into their underlying hylomorphisms. So far, I don’t know of any system which takes the hylo-first approach though.
Of course, both varieties of fusion can be applied to any recursive type, not just lists. (And I’ve written up a type class for doing generic ana/build fusion for arbitrary mu-recursive types.) Lists are just special because they’re linear chains and so certain operations can be fused for lists which can’t be fused in general (i.e. in a manner agnostic to the particular shape of the recursive type). The prime example here is fusing tail concatenation (which GHC calls “augment”). Augmentation only works for lists because a) there is only a single tail due to non-branchiness, and b) there is only one base case for how that tail can terminate. If we relax either constraint then the single augment function turns into many functions depending on where and what we augment in the structure (or alternatively, the type of the augment function can no longer be generic in the fusion form of the open-recursive functor).