What are the RuntimeRep requirements for novel backends?
What
This issue was requested in the conversation of !7577 and serves as a summary of that thread. It should be the single place to discuss these issues. Please make no further comment on !7577 and feel free to edit/rename if I've missed something important.
Progenitor issue: #21078 (closed)
Status
** ON HOLD **
Please see !7577 status
The problem
The essential problem is: does the current design of RuntimeRep satisfy the needs of novel backends? The discussion becomes tricky because answering that question implies more questions:
- How is equality on RuntimeRep defined? The meaning of a given RuntimeRep is determined by the platform, but this means that for new backends we are trying to reason about platforms we do not yet support or know about.
- Given the previous point, how are we to make RuntimeRep platform dependent and extensible for future backends? See the suggestions section below.
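To make the platform dependence concrete, here is a toy model in plain Haskell. Every name in it (Platform, ToyRep, widthBits, sameWidthOn) is hypothetical and does not mirror GHC's actual RuntimeRep or any GHC API, and the widths chosen for the JavaScript case are assumptions made purely for illustration.

```haskell
-- Toy model only: none of these names exist in GHC.
data Platform = Native64 | Native32 | JavaScript
  deriving (Eq, Show)

data ToyRep = ToyIntRep | ToyInt32Rep | ToyInt64Rep | ToyAddrRep | ToyOpaqueRep
  deriving (Eq, Show)

-- Width in bits of a value of the given rep on the given platform.
-- Nothing means "this model assigns no fixed width here".
widthBits :: Platform -> ToyRep -> Maybe Int
widthBits _          ToyOpaqueRep = Nothing   -- not necessarily pointer-sized
widthBits _          ToyInt32Rep  = Just 32
widthBits _          ToyInt64Rep  = Just 64
widthBits Native64   ToyIntRep    = Just 64
widthBits Native64   ToyAddrRep   = Just 64
widthBits Native32   ToyIntRep    = Just 32
widthBits Native32   ToyAddrRep   = Just 32
widthBits JavaScript ToyIntRep    = Just 32   -- assumption for illustration
widthBits JavaScript ToyAddrRep   = Nothing   -- assumption: not a single machine word

-- Two reps "coincide" on a platform when both have a known, equal width.
sameWidthOn :: Platform -> ToyRep -> ToyRep -> Bool
sameWidthOn p a b =
  case (widthBits p a, widthBits p b) of
    (Just x, Just y) -> x == y
    _                -> False
```

In this model ToyIntRep and ToyInt64Rep coincide on Native64 but not on Native32, which is the kind of platform-relative answer the first question is asking about.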
Background
This problem arose in work on the new javascript backend (see #21078 (closed)).
- We (the IOG team) believed we needed a new prim type, which I (Jeff) called Opaque#, that served as a handle to arbitrary platform-specific data (JavaScript values). Crucially, Opaque#s are not necessarily pointer-sized. This feature is on the critical path for GHCJS because it allows us to marshal types to and from JavaScript. In the previous ghcjs, Opaque# is called JSVal. Thus with this prim type we should be able to basically copy marshalling code from the old ghcjs:
-- data JSVal = JSVal ByteArray# -- for reference, old implementation in old ghcjs
newtype JSVal = JSVal Opaque# -- new implementation with prim type

-- Pure marshalling from a Haskell value to a JavaScript value
class PToJSVal a where
  pToJSVal :: a -> JSVal

-- Pure marshalling from a JavaScript value to a Haskell value
class PFromJSVal a where
  pFromJSVal :: JSVal -> a
- So I (Jeff) implemented this new prim type. In the process of doing so it occurred to me that we needed a new RuntimeRep to support the type. But none of the current RuntimeRep cases support our use case:
data RuntimeRep = VecRep VecCount VecElem -- ^ a SIMD vector type
                | TupleRep [RuntimeRep]   -- ^ An unboxed tuple of the given reps
                | SumRep [RuntimeRep]     -- ^ An unboxed sum of the given reps
                | BoxedRep Levity         -- ^ boxed; represented by a pointer
                | IntRep                  -- ^ signed, word-sized value
                | Int8Rep                 -- ^ signed, 8-bit value
                | Int16Rep                -- ^ signed, 16-bit value
                | Int32Rep                -- ^ signed, 32-bit value
                | Int64Rep                -- ^ signed, 64-bit value
                | WordRep                 -- ^ unsigned, word-sized value
                | Word8Rep                -- ^ unsigned, 8-bit value
                | Word16Rep               -- ^ unsigned, 16-bit value
                | Word32Rep               -- ^ unsigned, 32-bit value
                | Word64Rep               -- ^ unsigned, 64-bit value
                | AddrRep                 -- ^ A pointer, but /not/ to a Haskell value
                | FloatRep                -- ^ a 32-bit floating point number
                | DoubleRep               -- ^ a 64-bit floating point number
Ideally we would use AddrRep, but because Opaque# is not guaranteed to be pointer-sized (a fact of JavaScript as a target platform) we cannot use it.
Goals
- A RuntimeRep that works for GHCJS and for Asterius, i.e., for the JavaScript backend and the WebAssembly backend.
Suggestions
Add a new runtime rep for each backend
data RuntimeRep = VecRep VecCount VecElem -- ^ a SIMD vector type
                | TupleRep [RuntimeRep]   -- ^ An unboxed tuple of the given reps
                | SumRep [RuntimeRep]     -- ^ An unboxed sum of the given reps
                | BoxedRep Levity         -- ^ boxed; represented by a pointer
                | ...
                | JSValRep                -- ^ Javascript values and objects
                | JVMValRep               -- ^ JVM values and objects
                | FooValRep               -- ^ Foo values and objects
Use CPP
data RuntimeRep = VecRep VecCount VecElem -- ^ a SIMD vector type
                | TupleRep [RuntimeRep]   -- ^ An unboxed tuple of the given reps
                | SumRep [RuntimeRep]     -- ^ An unboxed sum of the given reps
                | BoxedRep Levity         -- ^ boxed; represented by a pointer
                | ...
                | JSValRep                -- ^ Javascript values and objects
                | JVMValRep               -- ^ JVM values and objects
                | FooValRep               -- ^ Foo values and objects
#ifdef javascript_HOST_ARCH
                | JSRef
#elif jvm_HOST_ARCH
                | JVMRef
#elif beam_HOST_ARCH
                ...
#elif foo_HOST_ARCH
                ...
#endif
Use Pattern synonyms
@Ericson2314 writes:
The ability to do
pattern Opaque =
#if defined javascript_HOST_ARCH
  JsRef
#elif defined wasm_HOST_ARCH
  WasmRef
#elif defined jvm_HOST_ARCH
  JvmRef
#endif
does make me think @monoidal is right that erring on the side of more separate things is fine. We can always unify them later, but splitting apart is not so easy!
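For readers who want to try the idea outside GHC, the following is a minimal, self-contained sketch of a CPP-selected pattern synonym. It uses a toy ToyRep type rather than the real RuntimeRep, and the macros TOY_JS_TARGET and TOY_WASM_TARGET are hypothetical names invented for this example, not the *_HOST_ARCH macros from the quote. The fallback branch is an arbitrary choice so that the file compiles with no macro defined.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE PatternSynonyms #-}

-- Toy stand-in for the proposed per-backend constructors; not GHC's RuntimeRep.
data ToyRep = ToyBoxedRep | ToyAddrRep | ToyJsRef | ToyWasmRef | ToyJvmRef
  deriving (Eq, Show)

-- One name, Opaque, that resolves to a different constructor per target.
-- TOY_JS_TARGET and TOY_WASM_TARGET are hypothetical macros for this sketch.
pattern Opaque :: ToyRep
#if defined(TOY_JS_TARGET)
pattern Opaque = ToyJsRef
#elif defined(TOY_WASM_TARGET)
pattern Opaque = ToyWasmRef
#else
pattern Opaque = ToyJvmRef   -- arbitrary fallback so the sketch always compiles
#endif

-- Client code matches on Opaque and never names the backend constructor.
isOpaque :: ToyRep -> Bool
isOpaque Opaque = True
isOpaque _      = False
```

Code written against Opaque stays the same across targets, while the separate constructors remain available if they later need to diverge.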
Points in the previous discourse
In this section I'll try to run down the major points in the thread on !7577 to consolidate the conversation.
What is wrong with old implementation?
Summary
- It is a hack using ByteArray#
- It is unclear to me (Jeff) why exactly this is problematic. What issues does it cause exactly? Is there something we want to do but can't due to this implementation? Is it a slow implementation? Or just conceptually wrong?
Discourse
JSVal in GHCJS is currently represented as data JSVal = JSVal ByteArray#, where an arbitrary JavaScript value is stored at the position of the ByteArray# field.
But this is a bit of a hack. At the moment we have these primitive types with their JS representation:
- Word#/Word32#: JS number
- Int#/Int32#: JS number
- ByteArray#: JS object that wraps a typed array
- Addr#: A pair of a JS number (offset) and a typed array object
None of these types exactly matches the "any JavaScript value" that we proposed the Opaque# type for.
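The mapping listed above can be written down as a small lookup, which makes the gap easy to see. This is only a sketch: ToyPrim, JsShape, jsShape and coversAnyValue are hypothetical names for this illustration and do not correspond to anything in GHC or GHCJS.

```haskell
-- Hypothetical model of the JS representations listed above.
data JsShape
  = JsNumber                -- a plain JS number
  | JsTypedArrayWrapper     -- a JS object that wraps a typed array
  | JsOffsetAndTypedArray   -- a pair of a JS number (offset) and a typed array
  | JsAnyValue              -- "any JavaScript value", what Opaque# was proposed for
  deriving (Eq, Show)

-- Toy names for the primitive types mentioned in the list.
data ToyPrim = ToyWord | ToyWord32 | ToyInt | ToyInt32 | ToyByteArray | ToyAddr
  deriving (Eq, Show, Enum, Bounded)

jsShape :: ToyPrim -> JsShape
jsShape ToyWord      = JsNumber
jsShape ToyWord32    = JsNumber
jsShape ToyInt       = JsNumber
jsShape ToyInt32     = JsNumber
jsShape ToyByteArray = JsTypedArrayWrapper
jsShape ToyAddr      = JsOffsetAndTypedArray

-- Does any existing primitive type already have the "any value" shape?
coversAnyValue :: Bool
coversAnyValue = any ((== JsAnyValue) . jsShape) [minBound .. maxBound]
```

In this model coversAnyValue is False, which restates the point of the quote: no listed type has the "any JavaScript value" shape.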
In response @sgraf812 suggests less invasive changes than a prim type:
Why can't we have
type Opaque# :: TYPE JSRep
type ByteArray# :: TYPE JSRep
or newtype ByteArray# a = ByteArray# Opaque#
Would that work? ... Thinking about it, I naively claim
To the untyped JS backend, every RuntimeRep except the special AddrRep and BoxedRep could be treated the same.
(AddrRep needs pointer arithmetic, BoxedRep needs support from the RTS, I suppose. Hence they are excluded.)
That is, we could define the axioms type Word :: TYPE DoubleRep or type Double :: TYPE IntRep and still manage to compile valid JavaScript. Is that right? Why isn't it?
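The naive claim can be restated as a classification function. The sketch below only encodes the claim as quoted; whether it actually holds for the JS backend is exactly what the thread leaves open. All names (ToyRep, JsTreatment, jsTreatment, interchangeableInJs) are hypothetical.

```haskell
-- Toy stand-ins; these are not GHC's real RuntimeRep constructors.
data ToyRep
  = ToyBoxedRep | ToyAddrRep | ToyIntRep | ToyWordRep | ToyFloatRep | ToyDoubleRep
  deriving (Eq, Show)

-- How the claim says an untyped JS backend would have to treat each rep.
data JsTreatment
  = NeedsRtsSupport          -- BoxedRep, per the quote
  | NeedsPointerArithmetic   -- AddrRep, per the quote
  | PlainJsValue             -- everything else, "treated the same"
  deriving (Eq, Show)

jsTreatment :: ToyRep -> JsTreatment
jsTreatment ToyBoxedRep = NeedsRtsSupport
jsTreatment ToyAddrRep  = NeedsPointerArithmetic
jsTreatment _           = PlainJsValue

-- Under the claim, two reps could be swapped iff both are plain JS values.
interchangeableInJs :: ToyRep -> ToyRep -> Bool
interchangeableInJs a b =
  jsTreatment a == PlainJsValue && jsTreatment b == PlainJsValue
```

Under this encoding ToyWordRep and ToyDoubleRep are interchangeable, matching the type Word :: TYPE DoubleRep example, while anything involving ToyAddrRep or ToyBoxedRep is not.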
Questions
- Are JSVal or Opaque types GC'd by the JavaScript runtime? (I assume yes)
- What is wrong with the ByteArray# implementation exactly?
- @sgraf812's questions from above; generally, is a new prim type actually required? More specifically:
  - Could we get away with changing the representation of a type rather than adding a prim type?
  - Could we use type Word :: TYPE DoubleRep and still compile valid JS, since JS is essentially untyped anyway?
Needs on the wasm side
Summary:
- wasm does not need Opaque# as I have defined it.
- JSVal#s in the wasm backend live on the Haskell heap and thus BoxedRep Unlifted can be used.
- But this means the wasm backend needs special logic in GHC's GC.
To provide a bit more context from the wasm side: when we add JavaScript interop for wasm support, our JSVal# prim type is expected to be UnliftedRep, with a word-sized payload to represent a table index. The JSVal# closures are managed by the C garbage collector. We do need additional hooks in GC, so the live JSVal#s on the Haskell heap can be collected and reported to JS periodically though. So it seems the Opaque type design here cannot be used per-se for wasm.
Specifically because in wasm, JSVal#s exist on the Haskell heap and thus BoxedRep Unlifted works.
No, AddrRep has no special meaning in wasm. It's expected to be C memory address.
UnliftedRep is really BoxedRep Unlifted, which is the representation of an unlifted, boxed pointer to the Haskell heap, managed by GHC's GC. BoxedRep Unlifted points to the Haskell heap and AddrRep points to something in C land (or WebAsm/JS land) that the Haskell GC shouldn't need to follow.
Yes, and that's exactly what I want. All JSVal#s exist on the Haskell heap; they do need special handling to cooperate with JS though, and whatever special handling we add is supposed to be no-op on other native platforms.
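As a rough illustration of the "table index plus GC hook" idea described in the quotes, here is a pure toy model. It is not the wasm backend's design: JsTable, sweep, and the use of String as a stand-in for a JS value are all assumptions made for this sketch, and a real implementation would live in the RTS rather than in Haskell library code.

```haskell
import qualified Data.IntMap.Strict as IntMap
import qualified Data.IntSet as IntSet

-- Hypothetical JS-side table: index -> JS value (String is a stand-in).
-- In this model a JSVal# would carry only the Int index as its payload.
type JsTable = IntMap.IntMap String

-- After a collection, the GC hook knows which indices are still referenced
-- from live closures on the Haskell heap. Everything else can be dropped,
-- and the dropped indices are what would be reported to the JS side.
sweep :: IntSet.IntSet -> JsTable -> (JsTable, [Int])
sweep live table =
  let (kept, dead) = IntMap.partitionWithKey (\k _ -> IntSet.member k live) table
  in  (kept, IntMap.keys dead)
```

For example, sweeping a table with indices 0, 1 and 2 against a live set containing 0 and 2 keeps those two entries and reports index 1 as free. On a native platform the equivalent hook would have nothing to do, which matches the "no-op on other native platforms" remark.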
Questions
- What logic is required in the GC to collect and communicate collection of wasm JSVals?
- What exactly is the definition of JSVal in wasm? newtype JSVal# = JSVal# Addr# with rep BoxedRep Unlifted?
Other major points
- @bgamari notes that AddrRep is always ignored by GHC's GC.
- That fact about AddrRep leads us back full circle to the platform dependency of RuntimeRep, as noted by @TerrorJack:
I agree it's a very important property. JSRep, JVMRep, CLRRep or whatever foreign runtime rep may all have differences re how they interact with GC.