Christiaan Baaij (6a7ed992) at 19 Apr 17:45
Add the OPAQUE pragma
Christiaan Baaij (72232014) at 19 Apr 17:43
Add Haddock support for the OPAQUE pragma
Question for eventual reviewers: do I need to update this list of pragmas https://gitlab.haskell.org/ghc/nofib/-/blob/master/gc/happy/TestInput.y#L242-251 in the nofib suite with the new OPAQUE pragma?
Implements the OPAQUE
pragma described in https://github.com/ghc-proposals/ghc-proposals/pull/415 which was brought on by discussions in #19553
This is work in progress:
#NNNN
syntax when
appropriate)Progress:
OPAQUE
supported by lexer and parserOPAQUE
binders are not specialized: tags along with existing logic for NOINLINE
binders by setting the activation to NeverActive
for OPAQUE
binders.OPAQUE
binders are not Cast-W/W transformedOPAQUE
binders are not regular-W/W transformedOPAQUE
binders know that while it may use its arguments (head-)strict, they will not be unboxedChristiaan Baaij (846ab56e) at 18 Apr 13:58
Add the OPAQUE pragma
I've created a first version of the proposal at: https://github.com/ghc-proposals/ghc-proposals/pull/415
Yes, OPAGUE
is totally what we want. Is that something that will need to go through the ghc-proposals process?
I'm also willing to implement a -fno-pre-worker-wrapper
flag that disables the cast W/W transformation in the early stage simplifier. That way, regular -O0
users would not be affected, and only people "who really know what they're doing" can disable the cast W/W.
Clash compiles everything with -O1 -fno-worker-wrapper -fexpose-all-unfoldings
. We also recommend users of Clash to add those settings to their ghc-options
field of their .cabal
file when they decide to turn their projects into a reusable library and compile with regular GHC. That's because Clash also supports users adding their own "primitives" using GHCs ANN
pragma's, which similarly depends on GHC not inlining those functions ever.
That won’t work well either. The templates for the Clash “primitives” allow one to query the bit size of the result type, where in the case of
times# :: Signed m -> Signed n -> Signed (n+m)
that would be the result of n+m
, while for:
times# :: Signed m -> Signed n -> Integer
Clash always returns 64
.
Clash' base/prelude library has multiple functions where the Clash compiler should not translate their Haskell definitions to HDL, but should use a hard-coded translation. An example of such a function is multiplication for Clash' arbitrary-sized signed number type (see https://github.com/clash-lang/clash-compiler/blob/ab6dd31dc61fc96bb058e57dc117db6f5626d3a9/clash-prelude/src/Clash/Sized/Internal/Signed.hs):
module Clash.Sized.Internal.Signed where
newtype Signed (n :: Nat) = S { unsafeToInteger :: Integer}
{-# NOINLINE times# #-}
times# :: Signed m -> Signed n -> Signed (m + n)
times# (S a) (S b) = S (a * b)
The way that Clash knows it should pick a hard-coded translation for that times#
function is because it reads files such as https://github.com/clash-lang/clash-compiler/blob/ab6dd31dc61fc96bb058e57dc117db6f5626d3a9/clash-lib/prims/vhdl/Clash_Sized_Internal_Signed.primitives which contain entries such as:
{ "BlackBox" :
{ "name" : "Clash.Sized.Internal.Signed.times#"
, "kind" : "Expression"
, "type" : "times# :: Signed m -> Signed n -> Signed (m + n)"
, "template" : "~IF~AND[~SIZE[~TYP[0]],~SIZE[~TYP[1]]]~THEN~ARG[0] * ~ARG[1]~ELSEsigned'(~SIZE[~TYPO]-1 downto 0 => '0')~FI"
}
}
which tells it whenever it sees an applications of Clash.Sized.Internal.Signed.times#
, it should use the string that's given by the template
entry, as opposed to looking at the definition of Clash.Sized.Internal.Signed.times#
and translating that.
All of this critically depends on GHC preserving applications of that Clash.Sized.Internal.Signed.times#
function, and not inlining the functions. What's happening with the cast W/W is that we basically end up with:
times# :: Signed m -> Signed n -> Signed (m + n)
times# = times#1 |> coercion
times#1 :: Signed m -> Signed n -> Integer
times#1 (S a) (S b) = a * b
Where Clash has no hard-coded translation for times#1
(nor should it, as that is super fragile), and will fall back to trying to translate the defintion of times#1
to HDL (which in this case means trying to translate Integer
multiplication which Clash always erroneously translates to 64-bit multiplication).
The cast W/W didn't affect Clash before 6d49d5be because the wrapper (times#
in the above case) still had a NOINLINE, and Clash would only ever see applications of the wrapper.
All of this is now framed in the Clash use case, but I imagine that any GHC API user that needs/wants to recognize applications of certain functions (and doesn't want those functions inlined) will be similarly affected by the cast W/W after 6d49d5be.
I realize that cast W/W was always enabled regardless of optimization level, and that -fno-worker-wrapper
currently only affects/disables "regular" W/W during -O1 and -O2 runs. i.e. -O0
implies -fno-worker-wrapper
which means that if cast W/W would be disabled by -fno-worker-wrapper
it would also be disabled by -O0
. I don't know how "bad" that would be.
Finally, we currently have a work-around in place for Clash in GHC 9, we use a data type wrapper around Integer
instead of a newtype wrapper:
data Signed (n :: Nat) = S { unsafeToInteger :: !Integer}
But that's far from ideal from a "running clash descriptions as regular haskell programs" point of view as it degrades performance by introducing an additional indirection.
The cast worker/wrapper transformation does not respect the -fno-worker-wrapper
flag.
I discovered this when 6d49d5be started replacing the NOINLINE
annotation on the wrapper with NOUSERINLINE
.
This is a problem for the Clash Haskell-to-Hardware compiler because we mark/it recognizes certain functions as "primitives", and those functions should not be inlined by GHC.
Compiling the following with ghc -O0 -fno-worker-wrapper Test9.hs
{-# LANGUAGE TypeOperators, DataKinds, KindSignatures #-}
{-# OPTIONS_GHC -O0 -fforce-recomp -fno-worker-wrapper -dverbose-core2core -ddump-ds -ddump-simpl -ddump-to-file #-}
module Test9 where
import GHC.TypeLits
newtype Signed (n :: Nat) = S Integer
{-# NOINLINE times #-}
times :: Signed m -> Signed n -> Signed (m + n)
times (S a) (S b) = S (a * b)
shows this desugar output:
-- RHS size: {terms: 8, types: 7, coercions: 9, joins: 0/0}
times [InlPrag=NOINLINE]
:: forall (m :: Nat) (n :: Nat).
Signed m -> Signed n -> Signed (m + n)
[LclIdX,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [0 0] 40 0}]
times
= \ (@(m_aBd :: Nat))
(@(n_aBe :: Nat))
(ds_dBO :: Signed m_aBd)
(ds_dBP :: Signed n_aBe) ->
(* @Integer
GHC.Num.$fNumInteger
(ds_dBO
`cast` (Test9.N:Signed[0] <m_aBd>_P :: Signed m_aBd ~R# Integer))
(ds_dBP
`cast` (Test9.N:Signed[0] <n_aBe>_P :: Signed n_aBe ~R# Integer)))
`cast` (Sym (Test9.N:Signed[0] <m_aBd + n_aBe>_P)
:: Integer ~R# Signed (m_aBd + n_aBe))
but shows this for the simplifier output:
-- RHS size: {terms: 8, types: 7, coercions: 4, joins: 0/0}
times1_rBI
:: forall {m :: Nat} {n :: Nat}. Signed m -> Signed n -> Integer
[GblId, Arity=2, Unf=OtherCon []]
times1_rBI
= \ (@(m_aBe :: Nat))
(@(n_aBf :: Nat))
(ds_dBP :: Signed m_aBe)
(ds1_dBQ :: Signed n_aBf) ->
* @Integer
GHC.Num.$fNumInteger
(ds_dBP
`cast` (Test9.N:Signed[0] <m_aBe>_P :: Signed m_aBe ~R# Integer))
(ds1_dBQ
`cast` (Test9.N:Signed[0] <n_aBf>_P :: Signed n_aBf ~R# Integer))
-- RHS size: {terms: 1, types: 0, coercions: 17, joins: 0/0}
times [InlPrag=NOUSERINLINE[final]]
:: forall (m :: Nat) (n :: Nat).
Signed m -> Signed n -> Signed (m + n)
[GblId, Arity=2, Unf=OtherCon []]
times
= times1_rBI
`cast` (forall (m :: <Nat>_N) (n :: <Nat>_N).
<Signed m>_R
%<'Many>_N ->_R <Signed n>_R
%<'Many>_N ->_R Sym (Test9.N:Signed[0] <m + n>_P)
:: (forall {m :: Nat} {n :: Nat}. Signed m -> Signed n -> Integer)
~R# (forall {m :: Nat} {n :: Nat}.
Signed m -> Signed n -> Signed (m + n)))
I am expecting the cast W/W transformation not to occur and keep a single binding called times
having a NOINLINE
pragma.
Basically, I expected an additional condition in https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Core/Opt/Simplify.hs#L525-534 that checks whether W/W is disabled through -fno-worker-wrapper
.
Optional:
Linux 4.19.84-microsoft-standard #1 SMP Wed Nov 13 11:44:37 UTC 2019 x86_64 GNU/Linux
x86_64
Thanks!
Would it be possible to backport/merge this commit into ghc-9.0 in time for the final ghc-9.0.1 release? With the ghc-bignum
change, some functions on Integer and Natural are no longer marked a NOINLINE meaning Clash can no longer constant-fold those functions. i.e. even with !4788 (closed) we still have that the Num instance for Natural will uses naturalSubThrow
for subtraction, a function without a NOINLINE. naturalSubThrow
subsequently uses bigNatSub
, which ultimately uses bignat_sub
which then does an FFI call (on the default GMP backend).
One option would be to add more NOINLINE to ghc-bignum
, but I wouldn't want to impose any benchmarking to figure out the performance applications of that. However, packporting/merging this MR into the ghc-9.0 branch in time form the ghc-9.0.1 release seems fully benign. This would allow Clash to at least recognize the FFI call and emulate its behavior (Clash already emulates some I/O calls in its constant folding, mostly w.r.t. mutating mutable byte arrays used by integer-gmp!)
GHCi, version 9.0.0.20201227: https://www.haskell.org/ghc/ :? for help
ghci> import Data.Bits
ghci> import GHC.Natural
ghci> shiftL 0 65 == (0 :: Natural)
False
0 should be equal to 0, no matter how many bits you shift it to the left.
Optional:
This change broke my workflow, as in, pre ghc-bignum shiftL 0 (-1) :: Integer
wouldn't throw an exception and simply return 0
. I realize I was depending on implementation-specific undefined behavior, but a mention of this change in the release notes / migration guide for ghc-9.0.1 would be nice.
Running with:
ghc -O -fno-full-laziness -ddump-simpl -dverbose-core2core -dcore-lint -ddump-simpl-iterations -fforce-recomp Test.hs
I see in the the following output of the Simplifier iteration=2
before the Specialisation pass:
-- RHS size: {terms: 5, types: 6, coercions: 4, joins: 0/0}
topEntity :: Bool -> Bool
[LclIdX,
Arity=1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
topEntity
= \ (ena_a1jA :: Bool) ->
system
(ena_a1jA
`cast` (Sym (GHC.Classes.N:IP[0] <"enable">_N <Bool>_N)
:: Bool ~R# (?enable::Bool)),
GHC.Classes.$fEqBool)
Then in the Simplifier iteration=1
after the Specialisation pass I see:
-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}
topEntity :: Bool -> Bool
[LclIdX,
Arity=1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)}]
topEntity = \ (ena_a1jA :: Bool) -> $ssystem_s1rr
Which is where it goes wrong, since topEntity
no longer uses its ena
argument.
And then finally in Simplifier iteration=2
after the Specialisation pass I see the reported:
-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}
topEntity :: Bool -> Bool
[LclIdX,
Arity=1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)}]
topEntity = \ _ [Occ=Dead] -> someFunction
Also just ran into this[1], with a situation similar to the one @mgsloan was describing. Reported it as #17398 (closed) with another fairly minimal example.
[1] Specifically this commit: https://github.com/clash-lang/clash-compiler/commit/48ae1a53ba8466420d63a7ccf33454113b163d7a; and the corresponding failure: https://gitlab.com/clash-lang/clash-compiler/-/jobs/331639322