Commit 4da9a946 authored by areid

[project @ 1997-12-23 17:52:39 by areid]

parent 7a3b6ac1
@@ -44,6 +44,7 @@
\begin{document}
\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
\newcommand{\Note}[1]{{{\bf Note:}\sl #1}}
\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
\newcommand{\bottom}{\ensuremath{\bot}} % the bottom symbol
@@ -85,8 +86,8 @@ size of update frames, and eliminates
\item The ``return in registers'' return convention has been dropped
because it was complicated and doesn't work well on register-poor
architectures. It has been partly replaced by unboxed tuples
(section~\ref{sect:unboxed-tuples}) which allow the programmer to
explicitly state where results should be returned in registers (or on
the stack) instead of on the heap.
@@ -151,7 +152,7 @@ only anticipate one, however.
\subsection{Glossary}
\ToDo{This terminology is not used consistently within the document.
If you find something which disagrees with this terminology, fix the
usage.}
\begin{itemize}
@@ -196,10 +197,10 @@ words and pointers are the same size.
% More terminology to mention.
% unboxed, unpointed
\subsection{Subtle Dependencies}
Some decisions have very subtle consequences which should be written
down in case we want to change our minds.
\begin{itemize}
@@ -208,12 +209,71 @@ it to the old generation. This is important because the GC avoids
performing heap overflow checks by assuming that the amount added to
the old generation is no bigger than the current new generation.
\item
If the garbage collector is allowed to shrink the stack of a thread,
we cannot omit the stack check in return continuations
(section~\ref{sect:heap-and-stack-checks}).
\item
When we return to the scheduler, the top object on the stack is a closure.
The scheduler restarts the thread by entering the closure.
Section~\ref{sect:hugs-return-convention} discusses how Hugs returns an
unboxed value to GHC and how GHC returns an unboxed value to Hugs.
\item
When we return to the scheduler, we need a few empty words on the stack
to store a closure to reenter. Section~\ref{sect:heap-and-stack-checks}
discusses who does the stack check and how much space they need.
\item
Heap objects never contain slop --- this is required if we want to
support mostly-copying garbage collection.
This is hard to arrange if we do \emph{lazy} blackholing
(section~\ref{sect:lazy-black-holing}) so we currently plan to
blackhole an object when we push the update frame.
\item
Info tables for constructors contain enough information to decide which
return convention they use. This allows Hugs to use a single piece of
entry code for all constructors and insulates Hugs from changes in the
choice of return convention.
\end{itemize}
\section{Source Language}
\subsection{Explicit Allocation}\label{sect:explicit-allocation}
As in the original STG machine, (almost) all heap allocation is caused
by executing a let(rec). Since we no longer support the return in
registers convention for data constructors, constructors now cause heap
allocation and so they should be let-bound.
For example, we now write
@
> cons = \ x xs -> let r = (:) x xs in r
@
instead of
@
> cons = \ x xs -> (:) x xs
@
\subsection{Unboxed tuples}\label{sect:unboxed-tuples}
\Note{We're not planning to implement this right away. There doesn't
seem to be any real difficulty adding it to the runtime system but
it'll take a lot of work adding it to the compiler. Since unboxed
tuples do not trigger allocation, the syntax could be modified to allow
unboxed tuples in expressions.}
Functions can take multiple arguments as easily as they can take one
argument: there's no cost for adding another argument. But functions
can only return one result: the cost of adding a second ``result'' is
@@ -222,7 +282,7 @@ The asymmetry is rather galling and can make certain programming
styles quite expensive. For example, consider a simple state transformer
monad:
@
> type S a = State -> (a,State)
> bindS m k s0 = case m s0 of { (a,s1) -> k a s1 }
> returnS a s = (a,s)
> getS s = (s,s)
@@ -259,9 +319,334 @@ thunks never have unboxed types --- so we'll never try to update an
unboxed constructor. The restriction to a single constructor is
largely to avoid garbage collection complications.
\subsection{STG Syntax}
\ToDo{Insert STG syntax with appropriate changes.}
%-----------------------------------------------------------------------------
\part{Evaluation Model}
\section{Overview}
This part is concerned with defining the external interfaces of the
major components of the system; the next part is concerned with their
inner workings.
The major components of the system are:
\begin{itemize}
\item The scheduler
\item The loader
\item The storage manager
\item The machine code evaluator (compiled code)
\item The bytecode evaluator (interpreted code)
\item The compilers
\end{itemize}
\section{The Compilers}
We need to describe interface files.
Here's an example - but I don't know the grammar - ADR.
@
_interface_ Main 1
_exports_
Main main ;
_declarations_
1 main _:_ IOBase.IO PrelBase.();;
@
\section{The Scheduler}
The Scheduler is the heart of the run-time system. A running program
consists of a single running thread, and a list of runnable and
blocked threads. The running thread returns to the scheduler when any
of the following conditions arises:
\begin{itemize}
\item A heap check fails, and a garbage collection is required
\item Compiled code needs to switch to interpreted code, and vice
versa.
\item The thread becomes blocked.
\item The thread is preempted.
\end{itemize}
A running system has a global state, consisting of
\begin{itemize}
\item @Hp@, the current heap pointer, which points to the next
available address in the Heap.
\item @HpLim@, the heap limit pointer, which points to the end of the
heap.
\item The Thread Preemption Flag, which is set whenever the currently
running thread should be preempted at the next opportunity.
\item A list of runnable threads.
\item A list of blocked threads.
\end{itemize}
Each thread is represented by a Thread State Object (TSO), which is
described in detail in Section \ref{sect:TSO}.
The following is pseudo-code for the inner loop of the scheduler
itself.
@
while (threads_exist) {
    // handle global problems: GC, parallelism, etc
    if (need_gc)          gc();
    if (external_message) service_message();
    // deal with other urgent stuff

    pick a runnable thread;
    do {
        // Enter the object on top of the stack:
        // if the top object is a BCO, we must enter it;
        // otherwise apply any heuristic we wish.
        if (thread->stack[thread->sp]->info.type == BCO) {
            status = runHugs(thread,&smInfo);
        } else {
            status = runGHC(thread,&smInfo);
        }
        switch (status) {  // handle local problems
            case (StackOverflow): enlargeStack;     break;
            case (Error e)      : error(thread,e);  break;
            case (ExitWith e)   : exit(e);          break;
            case (Yield)        : break;
        }
    } while (thread_runnable);
}
@
\subsection{Invoking the garbage collector}
\subsection{Putting the thread to sleep}
\subsection{Calling C from Haskell}
We distinguish between ``safe calls'', where the programmer guarantees
that the C function will not call a Haskell function or, in a
multithreaded system, block for a long period of time, and ``unsafe
calls'', where the programmer cannot make that guarantee.
Safe calls are performed without returning to the scheduler and are
discussed elsewhere (\ToDo{discuss elsewhere}).
Unsafe calls are performed by returning an array (outside the Haskell
heap) of arguments and a C function pointer to the scheduler. The
scheduler allocates a new thread from the operating system
(multithreaded system only), spawns a call to the function and
continues executing another thread. When the ccall completes, the
thread informs the scheduler and the scheduler adds the thread to the
runnable threads list.
\ToDo{Describe this in more detail.}
\subsection{Calling Haskell from C}
When C calls a Haskell closure, it sends a message to the scheduler
thread. On receiving the message, the scheduler creates a new Haskell
thread, pushes the arguments to the C function onto the thread's stack
(with tags for unboxed arguments), pushes the Haskell closure, and adds
the thread to the runnable list so that it can be entered in the
normal way.
When the closure returns, the scheduler sends back a message which
awakens the (C) thread.
\ToDo{Do we need to worry about the garbage collector deallocating the
thread if it gets blocked?}
\subsection{Switching Worlds}
\label{sect:switching-worlds}
\ToDo{This has all changed: we always leave a closure on top of the
stack if we mean to continue executing it. The scheduler examines the
top of the stack and tries to guess which world we want to be in. If
it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@
closure, it certainly enters GHC and if it finds a standard closure,
it is free to choose either one but it's probably best to enter GHC
for everything except @BCO@s and perhaps @AP@s.}
Because this is a combined compiled/interpreted system, the
interpreter will sometimes encounter compiled code, and vice-versa.
All world-switches go via the scheduler, ensuring that the world is in
a known state ready to enter either compiled code or the interpreter.
When a thread is run from the scheduler, the @whatNext@ field in the
TSO (Section \ref{sect:TSO}) is checked to find out how to execute the
thread.
\begin{itemize}
\item If @whatNext@ is set to @ReturnGHC@, we load up the required
registers from the TSO and jump to the address at the top of the user
stack.
\item If @whatNext@ is set to @EnterGHC@, we load up the required
registers from the TSO and enter the closure pointed to by the top
word of the stack.
\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on
the stack, using the interpreter.
\end{itemize}
There are four cases we need to consider:
\begin{enumerate}
\item A GHC thread enters a Hugs-built closure.
\item A GHC thread returns to a Hugs-compiled return address.
\item A Hugs thread enters a GHC-built closure.
\item A Hugs thread returns to a GHC-compiled return address.
\end{enumerate}
GHC-compiled modules cannot call functions in a Hugs-compiled module
directly, because the compiler has no information about arities in the
external module. Therefore it must assume any top-level objects are
CAFs, and enter their closures.
\ToDo{Hugs-built constructors?}
We now examine the various cases one by one and describe how the
switch happens in each situation.
\subsection{A GHC thread enters a Hugs-built closure}
\label{sect:ghc-to-hugs-closure}
There are three possibilities: GHC has entered a @PAP@, it has
entered an @AP@, or it has entered the BCO directly (for a top-level
function closure). @AP@s and @PAP@s are ``standard closures'' and
so do not require us to enter the bytecode interpreter.
The entry code for a BCO does the following:
\begin{itemize}
\item Push the address of the object entered on the stack.
\item Save the current state of the thread in its TSO.
\item Return to the scheduler, setting @whatNext@ to @EnterHugs@.
\end{itemize}
BCOs for thunks and functions have the same entry conventions as
slow entry points: they expect to find their arguments on the stack
with unboxed arguments preceded by appropriate tags.
\subsection{A GHC thread returns to a Hugs-compiled return address}
\label{sect:ghc-to-hugs-return}
Hugs return addresses are laid out as in Figure
\ref{fig:hugs-return-stack}. If GHC is returning, it will return to
the address at the top of the stack, namely @HUGS_RET@. The code at
@HUGS_RET@ performs the following:
\begin{itemize}
\item pushes \Arg{1} (the return value) on the stack,
\item saves the thread state in the TSO, and
\item returns to the scheduler with @whatNext@ set to @EnterHugs@.
\end{itemize}
\noindent When Hugs runs, it will enter the return value, which will
return using the correct Hugs convention (Section
\ref{sect:hugs-return-convention}) to the return address underneath it
on the stack.
\subsection{A Hugs thread enters a GHC-compiled closure}
\label{sect:hugs-to-ghc-closure}
Hugs can recognise a GHC-built closure as not being one of the
following types of object:
\begin{itemize}
\item A @BCO@,
\item An @AP@,
\item A @PAP@,
\item An indirection, or
\item A constructor.
\end{itemize}
When Hugs is called on to enter a GHC closure, it executes the
following sequence of instructions:
\begin{itemize}
\item Push the address of the closure on the stack.
\item Save the current state of the thread in the TSO.
\item Return to the scheduler, with the @whatNext@ field set to
@EnterGHC@.
\end{itemize}
\subsection{A Hugs thread returns to a GHC-compiled return address}
\label{sect:hugs-to-ghc-return}
When Hugs encounters a return address on the stack that is not
@HUGS_RET@, it knows that a world-switch is required. At this point
the stack contains a pointer to the return value, followed by the GHC
return address. The following sequence is then performed:
\begin{itemize}
\item save the state of the thread in the TSO.
\item return to the scheduler, setting @whatNext@ to @EnterGHC@.
\end{itemize}
The first thing that GHC will do is enter the object on the top of the
stack, which is a pointer to the return value. This value will then
return itself to the return address using the GHC return convention.
\section{The Loader}
\ToDo{Is it ok to load code when threads are running?}
In a batch mode system, we can statically link all the modules
together. In an interactive system we need a loader which will
explicitly load and unload individual modules (or, perhaps, blocks of
mutually dependent modules) and resolve references between modules.
While many operating systems provide support for dynamic loading and
will automatically resolve cross-module references for us, we generally
cannot rely on being able to load mutually dependent modules.
A portable solution is to perform some of the linking ourselves. Each module
should provide three global symbols:
\begin{itemize}
\item
An initialisation routine. (Might also be used for finalisation.)
\item
A table of symbols it exports.
Entries in this table consist of the symbol name and the address of the
name's value.
\item
A table of symbols it imports.
Entries in this table consist of the symbol name and a list of references
to that symbol.
\end{itemize}
On loading a group of modules, the loader adds the contents of the
export lists to a symbol table and then fills in all the references in the
import lists.
References in import lists are of two types:
\begin{description}
\item[ References in machine code ]
The most efficient approach is to patch the machine code directly, but
this will be a lot of work, very painful to port and rather fragile.
Alternatively, the loader could store the value of each symbol in the
import table for each module and the compiled code can access all
external objects through the import table. This requires that the
import table be writable but does not require that the machine code or
info tables be writable.
\item[ References in data structures (SRTs and static data constructors) ]
Either we patch the SRTs and constructors directly or we somehow use
indirections through the symbol table. Patching the SRTs requires
that we make them writable and prevents us from making effective use
of virtual memories that use copy-on-write policies. Using an
indirection is possible but tricky.
Note: We could avoid patching machine code if all references to
external symbols went through the SRT --- then we just have one
thing to patch. But the SRT always contains a pointer to the closure
rather than the fast entry point (say), so we'd take a big performance
hit for doing this.
\end{description}
\section{Compiled Execution}
This section describes the framework in which compiled code evaluates
@@ -301,6 +686,7 @@ registers --- depending on whether they all have the same kind or they
have different kinds.
\subsubsection{Entering closures}
\label{sect:entering-closures}
To evaluate a closure we jump to the entry code for the closure
passing a pointer to the closure in \Arg{1} so that the entry code can
@@ -616,14 +1002,14 @@ vectored and direct-return datatypes. This is done by arranging that
the update code looks like this:
@
| ^ |
| return vector |
|---------------|
| fixed-size |
| info table |
|---------------| <- update code pointer
| update code |
| v |
@
Each entry in the return vector (which is large enough to cover the
@@ -668,7 +1054,7 @@ alternative. Here, for example, is pseudo-code for the expression
tag = \Arg{1}->entry->tag;
if (isWHNF(tag)) {
Sp--; \\ insert space for return address
goto ret;
}
push(ret);
goto \Arg{1}->entry;
@@ -683,8 +1069,8 @@ and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 })@:
\Arg{1} = <pointer to x>;
tag = \Arg{1}->entry->tag;
if (isWHNF(tag)) {
Sp--; \\ insert space for return address
goto retvec[tag];
}
push(retinfo);
goto \Arg{1}->entry;
@@ -723,8 +1109,7 @@ entering the garbage collector.
\subsection{Heap and Stack Checks}
\label{sect:heap-and-stack-checks}
The storage manager detects that it needs to garbage collect the old
generation when the evaluator requests a garbage collection without
@@ -733,8 +1118,32 @@ is therefore important that the GC routines {\em not} move the heap
pointer unless the heap check fails. This is different from what
happens in the current STG implementation.
Assuming that the stack can never shrink, we perform a stack check
when we enter a closure but not when we return to a return
continuation. This doesn't work for heap checks because we cannot
predict what will happen to the heap if we call a function.
If we wish to allow the stack to shrink, we need to perform a stack
check whenever we enter a return continuation. Most of these checks
could be eliminated if the storage manager guaranteed that a stack
would always have 1000 words (say) of space after it was shrunk. Then
we could omit the stack check in any return continuation that needs
fewer than 1000 words.
When an argument satisfaction check fails, we need to push the closure
(in R1) onto the stack - so we need to perform a stack check. The
problem is that the argument satisfaction check occurs \emph{before}
the stack check. The solution is that the caller of a slow entry
point or closure will guarantee that there is at least one word free
on the stack for the callee to use.
Similarly, if a heap or stack check fails, we need to push the arguments
and closure onto the stack. If we just came from the slow entry point,
there's certainly enough space and it is the responsibility of anyone
using the fast entry point to guarantee that there is enough space.
\ToDo{Be more precise about how much space is required - document it
in the calling convention section.}
\subsection{Handling interrupts/signals}
@@ -754,7 +1163,7 @@ Hugs interprets code by converting it to byte-code and applying a
byte-code interpreter to it. Wherever possible, we try to ensure that
the byte-code is all that is required to interpret a section of code.
This means not dynamically generating info tables, and hence we can
only have a small number of possible heap objects each with a statically
compiled info table. Similarly for stack objects: in fact we only
have one Hugs stack object, in which all information is tagged for the
garbage collector.
@@ -765,6 +1174,87 @@ alternative is to force a context-switch each time compiled code
enters a Hugs-built constructor, which would be prohibitively
expensive.
We achieve this simplicity by forgoing some of the optimisations used
by compiled code:
\begin{itemize}
\item
Whereas compiled code has five different ways of entering a closure
(section~\ref{sect:entering-closures}), interpreted code has only one.
The entry point for interpreted code behaves like slow entry points for
compiled code.
\item
We use just one info table for {\em all\/} direct returns.
This introduces two problems:
\begin{enumerate}
\item How does the interpreter know what code to execute?
Instead of pushing just a return address, we push a return BCO and a
trivial return address which just enters the return BCO.
(In a purely interpreted system, we could avoid pushing the trivial
return address.)
\item How can the garbage collector follow pointers within the
activation record?
We could push a third word --- a bitmask describing the location of any
pointers within the record --- but, since we're already tagging unboxed
function arguments on the stack, we use the same mechanism for unboxed
values within the activation record.
\ToDo{Do we have to stub out dead variables in the activation frame?}
\end{enumerate}
\item
We trivially support vectored returns by pushing a return vector whose
entries are all the same.
\item
We avoid the need to build SRTs by putting bytecode objects on the
heap and restricting BCOs to a single basic block.
\end{itemize}
\subsubsection{Hugs Info Tables}
Hugs requires the following info tables and closures:
\begin{description}
\item [@HUGS_RET@].
Contains both a vectored return table and a direct entry point. All
entry points are the same: they rearrange the stack to match the Hugs
return convention (section~\ref{sect:hugs-return-convention}) and return
to the scheduler. When the scheduler restarts the thread, it will
find a BCO on top of the stack and will enter the Hugs interpreter.
\item [@UPD_RET@].
\item [Constructors].
The entry code for a constructor jumps to a generic entry point in the
runtime system which decides whether to do a vectored or unvectored
return depending on the shape of the constructor/type. This implies that
info tables must have enough info to make that decision.
\item [@AP@ and @PAP@].
\item [Indirections].
\item [Selectors].
Hugs doesn't generate them itself but it ought to recognise them.
\item [Complex primops].
\end{description}
\subsection{Hugs Heap Objects}
\label{sect:hugs-heap-objects}
@@ -776,10 +1266,10 @@ detail in Section \ref{sect:BCO}; in this section we will describe
their semantics.
Since byte-code lives on the heap, it can be garbage collected just
like any other heap-resident data. Hugs arranges that any BCOs
referred to by the Hugs symbol tables are treated as live objects by
the garbage collector. When a module is unloaded, the pointers to its
BCOs are removed from the symbol table, and the code will be garbage
collected some time later.
A BCO represents a basic block of code - all entry points are at the
@@ -789,6 +1279,9 @@ closure; a BCO can be entered just like any other closure. Hugs
performs lambda-lifting during compilation to byte-code, and each
top-level combinator becomes a BCO in the heap.
\ToDo{The phrase "all entry points..." suggests that BCOs have multiple
entry points. If so, we need to say a lot more about it...}