JS Backend split JExpr and friends
What
This issue tracks progress on improving the data types in the JS backend, and thereby the quality of its code.
Why, or didn't you just merge a JS Backend!?!?!
We want this for the JS backend. At the time of writing, the central data types of the JS backend are JExpr and JStmt; however, these have numerous problems:
- They serve two purposes: They are simultaneously used to write JS in Haskell (i.e., as an eDSL) and they are used as the last in-compiler representation before code generation.
- They are untyped. This wouldn't be a problem if we were only consuming the STG that GHC produces. But we are also writing an RTS and garbage collector in these untyped data types, which makes it very easy to create subtle type bugs that eventually make it into the JS payload.
- Optimization passes are difficult because the data type is not a good normal form. This means that a lot of code is more expensive than necessary, because we must constantly traverse the data types to get the information an optimization needs.
- They exhibit the classic observable sharing problem of deeply embedded DSLs. In short: these data types construct trees, but a node in the tree can have a back-edge (an additional edge to a parent or ancestor) because we allow references (JVar). This means that JExpr and JStmt can be circular! We deal with this circularity using UnsatSupply and pseudoSaturate, but this adds a constructor to each data type in the JS backend and runs a state monad numerous times.
- (Last one, I swear.) These data types confuse reference equality with value equality. Consider the case where I have two JS objects (JHash mempty :: JVal). JVal derives Eq, meaning that in Haskell (==) would return True because these are equivalent values (structurally equivalent, to be specific). But this is wrong! In the semantic domain these two objects are not equal because they have different memory addresses:
Welcome to Node.js v18.12.1.
Type ".help" for more information.
> var x = {};
undefined
> var y = {};
undefined
> x === y
false
Furthermore, this breaks referential transparency: the same input (in this case ()) yields a different output, because each evaluation creates a different object. So we need an eDSL that captures these issues and cleanly separates the messy bits from the bits we care about and want to write with.
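The equality problem described above can be reproduced in a few lines of Haskell. This is a minimal sketch, not GHC's actual definitions: ToyVal, Obj, ObjId, newObj and sameObject are hypothetical names invented for illustration. It contrasts a derived, structural Eq with an identity assigned at allocation time, which is closer to what JS's === does for objects.

```haskell
-- Toy stand-in for a structurally compared JS value (NOT GHC's real JVal).
data ToyVal
  = ToyInt Integer
  | ToyHash [(String, ToyVal)]   -- an object literal as a list of fields
  deriving (Eq, Show)

-- Hypothetical: an object tagged with the identity it got when allocated.
newtype ObjId = ObjId Int deriving (Eq, Show)
data Obj = Obj ObjId [(String, ToyVal)] deriving Show

-- Reference equality: compare identities only, ignoring the contents.
sameObject :: Obj -> Obj -> Bool
sameObject (Obj a _) (Obj b _) = a == b

-- Allocation threads a counter so every new object gets a fresh identity.
newObj :: [(String, ToyVal)] -> Int -> (Obj, Int)
newObj fields n = (Obj (ObjId n) fields, n + 1)

-- Derived Eq says two empty objects are equal ...
structurallyEqual :: Bool
structurallyEqual = ToyHash [] == ToyHash []

-- ... whereas two separately allocated empty objects are not the same object.
freshObjectsAreSame :: Bool
freshObjectsAreSame =
  let (x, n1) = newObj [] 0
      (y, _)  = newObj [] n1
  in sameObject x y
```

Here structurallyEqual evaluates to True while freshObjectsAreSame evaluates to False, mirroring the x === y result in the Node session. A real eDSL would presumably hide the counter inside a monad rather than thread it by hand.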
So what to do
Make a new eDSL based on sunroof and its paper (with proper attribution to the authors, of course), and adapt it to our needs and to GHC.
Why sunroof? It addresses the problems above and is a well-organized, clean DSL. It suits our specific needs, even if we don't adopt all of its features (such as threading).
Ok but how?
Here is the big picture and plan for this project:
- The goal is to do a piecemeal rewrite and avoid a massive rewrite with a huge MR.
- To fix (1) above we split JExpr and friends in two:
  - One is a typed eDSL based on sunroof. This is what we'll use to write JS programs in Haskell, such as the JS backend RTS and maybe some shims.
  - The other is an IR that sits right before code generation. This is what we'll target for JS-specific optimization passes.
- Next we implement the eDSL and an interpreter for the eDSL with the type interp :: eDSL -> IR
- Now we can work in parallel:
  - We treat JExpr and friends as the IR before code gen for now. This means that interp has type interp :: eDSL -> JExpr
  - Then we begin piecemeal rewrites of the code base and wrap the rewrites with a call to interp before code generation. Thus we do not touch the code generator and can continue with bug fixes and other work while this work is ongoing.
  - Once most of the backend has been rewritten in the eDSL we start to phase out JExpr and friends. We do this as follows:
    - Implement the new IR right before code gen.
    - Write the IR code generator.
    - Write the eDSL -> IR interpreter.
    - Remove JExpr and friends. The goal here is that this should be small enough for a single MR.
Note that we could implement the new IR earlier, write an interpreter from JExpr and friends to it, and swap out the code generator sooner. This would let us start work on the optimization passes on the new IR sooner. But I think it is best to move in a more careful and controlled manner so as to avoid introducing new bugs.
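The split in the plan above can be sketched as follows. This is only an illustration under stated assumptions: the Expr GADT and the IR type are hypothetical and far smaller than anything the backend would need, and IR merely stands in for whatever sits before code generation (JExpr during the transition, the new IR afterwards). The point is the shape of interp: a typed front end that is erased into an untyped representation.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical untyped IR, standing in for the representation before code gen.
data IR
  = IRInt Integer
  | IRBool Bool
  | IRAdd IR IR
  | IRIf IR IR IR
  deriving (Eq, Show)

-- Hypothetical typed eDSL: the type index rules out ill-typed JS programs.
data Expr a where
  IntE  :: Integer -> Expr Integer
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Integer -> Expr Integer -> Expr Integer
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

-- interp erases the types and yields the IR that code generation consumes.
interp :: Expr a -> IR
interp (IntE n)   = IRInt n
interp (BoolE b)  = IRBool b
interp (Add x y)  = IRAdd (interp x) (interp y)
interp (If c t e) = IRIf (interp c) (interp t) (interp e)

-- A well-typed eDSL program and its untyped image.
example :: IR
example = interp (If (BoolE True) (Add (IntE 1) (IntE 2)) (IntE 0))
```

With this encoding, a term such as Add (BoolE True) (IntE 1) is rejected by the Haskell type checker instead of surfacing later as a bug in the generated JS. Because only interp knows about the target, retargeting it from JExpr to a new IR would not require touching code written in the eDSL.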
The Data Flow of the JS backend
Current version
JS Backend -----\
                |
                V
STG --------> JExpr ---> Code Gen
Version while project is on-going
JS Backend --> eDSL interp --> JExpr
                                 |
                                 V
STG -------------------------> JExpr --> Code Gen
Final Version
JS Backend --> eDSL interp --> IR
                               |
                               V
STG -------------------------> IR --> Optimizations --> Code Gen
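The last two stages of the final data flow compose as ordinary functions. The sketch below assumes a hypothetical three-constructor IR, a single pass (constFold) and a string-producing codeGen. None of these names come from the backend; they only show where an optimization would sit between the IR and code generation.

```haskell
-- Hypothetical IR, just large enough to show the shape of a pass.
data IR
  = IRInt Integer
  | IRVar String
  | IRAdd IR IR
  deriving (Eq, Show)

-- One optimization: fold additions of two literals, working bottom-up.
constFold :: IR -> IR
constFold (IRAdd a b) =
  case (constFold a, constFold b) of
    (IRInt x, IRInt y) -> IRInt (x + y)
    (a', b')           -> IRAdd a' b'
constFold e = e

-- Code generation: render the IR as JS source text.
codeGen :: IR -> String
codeGen (IRInt n)   = show n
codeGen (IRVar v)   = v
codeGen (IRAdd a b) = "(" ++ codeGen a ++ " + " ++ codeGen b ++ ")"

-- IR --> Optimizations --> Code Gen
pipeline :: IR -> String
pipeline = codeGen . constFold
```

For instance, pipeline (IRAdd (IRVar "x") (IRAdd (IRInt 1) (IRInt 2))) yields "(x + 3)". Further passes would slot in as additional functions composed between constFold and codeGen.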
Working Tasks
- Split JMacro into (JS eDSL + JS syntax): !10142 (closed)
- Introduce simple JS optimizer: !10260 (closed)
- Split JS eDSL into (eDSL + JStg): !10722 (closed)
- Enhance eDSL (type-safety): !10000
- Add JStg optimizer: !11507 (closed)
- Add JSLinkable IR and pre-render phase: TODO
- Enhance JS optimizer: TODO