Unboxed strict tuples

Motivation

Having the ability to communicate to that a returned or passed value is already evaluated is quite valuable.

Currently there are essentially 3 ways to pass and return values, which can be neatly split along two axis.

This leaves us without the ability to express one obvious combination: Unlifted unboxed values.


                   *1             *1
             Lifted       Unlifted*2
             Fields       Fields

          +------------+------------------+
          |            |                  |
          |  data T a  |     data T       |
Boxed     |    = T a   |       = T !a     |
          |            |                  |
          |  ( a )     |     Force a      |
          |            |                  |
          +-------------------------------+
          |            |  This proposal:  |
          |            |                  |
          |  (# a #)   |    (#! a !#)     |
          |            |                  |
          |            |    alternative   |
Unboxed   |            |                  |
          |            |  data unboxed T  |
          |            |    = T !a        |
          |            |                  |
          +------------+------------------+

     *1) For fields whose base types are lifted.
     *2) As it stands only WHNF, but this is an implementation detail
         likely to change

Avoiding space leaks

Strict fields are an easy way to reduce the risk for accidental space leaks. This is already possible for regular data types and commonly used.

However when the goal is to return multiple values without the overhead of passing them in a constructor we lose this ability as unboxed tuples do not support automatic strictness.

Instead users need to manually force the results. Opening up a venue for introducing space leaks.

One example I ran into recently was performance issues caused by a Lazy State Monad. Caused by accumulating thunks inside the State. However it was implemented with an unboxed tuple so the only way to make sure it was not accumulating thunks was to find all places updating the state and forcing their results.

Passing arguments in WHNF

Additionally strict fields allow us to shift the responsibility of evaluating arguments to the call site.
This is currently NOT possible without risking additional overhead from a constructor being allocated.

This is especially desirable when dealing with recursive functions. For example take this function from #16820 (closed)

data Numbers = NumbersCons Int# !Numbers | NumbersNil

totalInner :: Int# -> Numbers -> Int#
totalInner acc (#! NumbersCons n ns !#) = totalInner (acc +# n) ns
totalInner acc (#! NumbersNil #!) = acc

In the recursive case we call totalInner with the content of a strict field so there is no need for evaluation. However other callsites might call totalInner with unevaluated values.

This means we are force to check if the argument is evaluated in each iteration. Using this proposal we could redefine the function like this:

totalInner :: Int# -> (#! Numbers #!) -> Int#
totalInner acc (NumbersCons n ns) = totalInner (acc +# n) (#! ns !#)
totalInner acc NumbersNil = acc

This allows us to eliminate the tag check for the second argument in all cases inside the function. Instead only paying that cost at external call sites. In the presence of recursion usually a good tradeoff.

However this requires us to make sure strict fields are always tagged. Something not currently the case.

Proposal

Add strict unboxed tuples.
These are the equivalent of a unboxed tuple with strict fields.

So a user might write:

foo x = ... -- Defines r1 to rn
    -> (#! r1', r2', rn' !#)

Which has the same semantics as:

foo x = ... -- Defines r1 to rn
    case r1 of
        r1' -> case r2 of
            r2' -> ...
                rn' -> (# r1', r2', rn' #)

Mixing lazy and strict return values.

This would be achieved by nesting the two types of unboxed tuples.

So in order to return one strict and two lazy values we would write:

`(#! strict , (# lazy1, lazy2 #) !#)

What differentiates this from unlifted data types.

There is a proposal solving most of the same problems: Proposal for unlifted data types (Proposal 1)

Really the only major difference is that unlifted data types require naming our return type.

Both can express the same idea, for example a strict state Monad might be better of using unlifted data types:


-- Strict Tuples
newtype State s a = State { runState' :: s -> (#! (# a #), s !#) }

-- Unlifted Data Type
data unlifted UStateResult a s = UStateResult a !s

newtype State s a = State { runState' :: s -> UStateResult a s }

Mostly because we will match on UStateResult in a few places and the syntax is nicer in this case.

However even without unlifted data types, we can simulate them on top of strict tuples.

data unlifted UniqResult result = UniqResult result !UniqSupply

-- Is equal to

type UniqResult result = (#! (# result #), UniqSupply !#)

pattern UniqResult :: a -> b -> (#! (# a #), b !#)
pattern UniqResult x y = (#! (# x #), y !#)

However the inverse is not true. There is no way to get unnamed tuples without compiler support. Even in the presence of unlifted data types.

This means to solve the Problem in #16820 (closed) we would have to redefine the function like this instead:

data Numbers = NumbersCons Int# !Numbers | NumbersNil

data UNumbers = UNumbers !Numbers

totalInner :: Int# -> UNumbers -> Int#
totalInner acc (UNumbers (NumbersCons n ns )) = totalInner (acc +# n) (UNumbers ns)
totalInner acc (UNumbers NumbersNil ) = acc

I think unboxed strict tuples are the better solution in cases like this.

GHC-Proposal

This is just so this doesn't fall by the wayside. Any adoption of this would require a proper ghc proposal (and it's acceptance).

Edited Aug 02, 2019 by Andreas Klebinger

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information