JS Backend split JExpr and friends

What

This is a tracking issue for improving the data types in the JS backend, and thus the quality of its code.

Why, or didn't you just merge a JS Backend!?!?!

We want this for the JS backend. At the time of writing, the central data types for the JS backend are JExpr and JStmt. However, these have numerous problems:

1. They serve two purposes: they are simultaneously used to write JS in Haskell (i.e., as an eDSL) and they are used as the last in-compiler representation before code generation.
2. They are untyped. This wouldn't be a problem if we were only consuming the STG that GHC produces. But we are also writing an RTS and garbage collector in these untyped data types, which makes it very easy to create subtle type bugs that eventually make it into the JS payload.
3. Optimization passes are difficult because the data type is not a good normal form. This means that a lot of code is more expensive than necessary because we must constantly traverse the data types to get the information an optimization needs.
4. They exhibit the classic observable sharing problem with deeply embedded DSLs. In not so many words: these data types construct trees, but a node in the tree can have a back-edge (an additional edge to a parent or ancestor) because we allow for references (JVar). This means that JExpr and JStmt can be circular! We deal with this circularity using UnsatSupply and pseudoSaturate. But this adds a constructor to each data type in the JS backend and runs a state monad numerous times.
5. (last one I swear). These data types confuse reference equality with value equality. Consider the case where I have two JS objects (JHash mempty :: JVal). JVal derives Eq, meaning that in Haskell (==) would return True because these values are equivalent (well, structurally equivalent, to be specific). But this is wrong! In the semantic domain these two objects are not equal because they have different memory addresses:

Welcome to Node.js v18.12.1.
Type ".help" for more information.
> var x = {};
undefined
> var y = {};
undefined
> x === y
false

Furthermore this breaks referential transparency: same input (in this case ()), different output because different objects. So we need an eDSL that captures these issues and cleanly separates the messy bits from the bits we care about and want to write with.
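To make the equality problem concrete, here is a minimal sketch in Haskell. The JVal below is a hypothetical, simplified stand-in and not the backend's real definition; ObjRef, Fresh and newObject are likewise invented names. The second half shows one possible way to recover object identity (handing out fresh identities from a small monad, roughly in the spirit of a monadic eDSL). It is an assumption for illustration, not the planned design.

```haskell
-- Hypothetical, simplified stand-in for the backend's JVal (not GHC's real type).
data JVal
  = JInt Integer
  | JHash [(String, JVal)]
  deriving (Eq, Show)

-- Two separately written empty JS objects.
objX, objY :: JVal
objX = JHash mempty
objY = JHash mempty

-- Derived Eq compares structure, so the two objects look equal in Haskell,
-- even though `x === y` is false in JS.
structurallyEqual :: Bool
structurallyEqual = objX == objY

-- One possible remedy (an assumption, not the planned design): allocate
-- objects inside a small monad that hands out fresh identities.
newtype ObjRef = ObjRef Int
  deriving (Eq, Show)

-- A hand-rolled state monad over a counter, to keep the sketch dependency-free.
newtype Fresh a = Fresh { runFresh :: Int -> (a, Int) }

instance Functor Fresh where
  fmap f (Fresh g) = Fresh $ \n -> let (a, n') = g n in (f a, n')

instance Applicative Fresh where
  pure a = Fresh $ \n -> (a, n)
  Fresh f <*> Fresh g = Fresh $ \n ->
    let (h, n')  = f n
        (a, n'') = g n'
    in (h a, n'')

instance Monad Fresh where
  Fresh g >>= k = Fresh $ \n -> let (a, n') = g n in runFresh (k a) n'

-- Each call yields a reference with a new identity.
newObject :: Fresh ObjRef
newObject = Fresh $ \n -> (ObjRef n, n + 1)

-- Allocating twice gives two references that compare as different.
twoObjects :: (ObjRef, ObjRef)
twoObjects = fst (runFresh go 0)
  where
    go = do
      x <- newObject
      y <- newObject
      pure (x, y)
```

Here structurallyEqual is True, which is exactly the mismatch with JS described above, while the two components of twoObjects differ because allocation is an effect rather than a pure value.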

So what to do

Make a new eDSL based on sunroof and its accompanying paper (with proper attribution to the authors, of course), and adapt it to our needs and to GHC.

Why sunroof? It deals nicely with the aforementioned problems and is a nicely organized and clean DSL. Basically it suits our specific needs quite nicely, even if we don't adopt all of its features (such as threading).

Ok but how?

Here is the big picture and plan for this project:

The goal is to do a piecemeal rewrite and avoid a massive rewrite with a huge MR.
To fix (1) above we split JExpr and friends in two:
1. One is a typed eDSL based on sunroof. This is what we'll use to write JS programs in HS, such as the JS backend RTS and maybe some shims.
2. The other is an IR that sits right before code generation. This is what we'll target for JS specific optimization passes.
Next we implement the eDSL and an interpreter for the eDSL with the type interp :: eDSL -> IR.
Now we can work in parallel:
1. We treat JExpr and friends as the IR before code gen for now. This means that interp has the type interp :: eDSL -> JExpr.
2. Then we begin piecemeal rewrites of the code base and wrap the rewrites with a call to interp before code generation. Thus we do not touch the code generator and can continue with bug fixes and any other work while this work is ongoing.
3. Once most of the backend has been rewritten in the eDSL, we start to phase out JExpr and friends by:
  1. Implementing the new IR right before code gen.
  2. Writing the IR code generator.
  3. Writing the eDSL -> IR interpreter.
  4. Removing JExpr and friends. The goal is for this step to be small enough for a single MR.
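As a rough illustration of the split described above, the sketch below pairs a tiny typed eDSL with an untyped target and an interp function between them. Everything here is hypothetical: the toy JExpr is only loosely in the spirit of the real type, and the Expr GADT and its constructors are invented for this example rather than taken from sunroof or from GHC.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical untyped target, loosely in the spirit of JExpr (not GHC's real type).
data JExpr
  = JNum Double
  | JBool Bool
  | JBinOp String JExpr JExpr
  | JIf JExpr JExpr JExpr
  deriving (Eq, Show)

-- Hypothetical typed eDSL: the type index records the JS-level type of each term.
data Expr a where
  NumE  :: Double -> Expr Double
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Double -> Expr Double -> Expr Double
  Lt    :: Expr Double -> Expr Double -> Expr Bool
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

-- The interpreter erases the types and produces the untyped representation.
interp :: Expr a -> JExpr
interp (NumE n)   = JNum n
interp (BoolE b)  = JBool b
interp (Add x y)  = JBinOp "+" (interp x) (interp y)
interp (Lt x y)   = JBinOp "<" (interp x) (interp y)
interp (If c t e) = JIf (interp c) (interp t) (interp e)

-- A well-typed program: 1 < 2 ? 3 : 4 + 5
example :: Expr Double
example = If (Lt (NumE 1) (NumE 2)) (NumE 3) (Add (NumE 4) (NumE 5))
```

An ill-typed term such as Add (BoolE True) (NumE 1) is rejected by the Haskell type checker, so that class of bug never reaches the untyped representation. Because only interp knows about the target type, swapping JExpr for a new IR later would in principle only touch interp and the code generator.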

Note that we could implement the new IR earlier, write an interpreter from JExpr and friends to it, and swap out the code generator sooner. This would let us start work on the optimization passes for the new IR sooner. But I think it is best to move in a more careful and controlled manner so as to avoid introducing new bugs.

The Data Flow of the JS backend

Current version

JS Backend -----\
                 |
                 V
STG --------> JExpr ---> Code Gen

Version while project is on-going

JS Backend --> eDSL interp --> JExpr
                                 |
                                 V   
STG -------------------------> JExpr --> Code Gen

Final Version

JS Backend --> eDSL interp --> IR  
                                |  
                                V     
STG -------------------------> IR --> Optimizations --> Code Gen

Working Tasks

Split JMacro into (JS eDSL + JS syntax): !10142 (closed)
Introduce simple JS optimizer: !10260 (closed)
Split JS eDSL into (eDSL + JStg): !10722 (closed)
Enhance eDSL (type-safety): !10000
Add JStg optimizer: !11507 (closed)
Add JSLinkable IR and pre-render phase: TODO
Enhance JS optimizer: TODO

Edited Feb 22, 2024 by Andrei Borzenkov
