Commit ff14742c authored by areid

[project @ 1998-01-08 14:40:22 by areid]

Added an overview section, commented out some of the more bogus parts of the document (but some bits still remain)
parent b3163aef
% TODO:
%
% o I (ADR) think it would be worth making the connection with CPS explicit.
% Now that we have explicit activation records (on the stack), we can
% explain the whole system in terms of CPS and tail calls --- with the
% one requirement that we carefully distinguish stack-allocated objects
% from heap-allocated objects.
% \documentstyle[preprint]{acmconf}
\documentclass[11pt]{article}
\oddsidemargin 0.1 in % Note that \oddsidemargin = \evensidemargin
\marginparsep 0 in
\sloppy
%\usepackage{epsfig}
\newcommand{\note}[1]{{\em Note: #1}}
% DIMENSION OF TEXT:
architectures.  It has been partly replaced by unboxed tuples which
explicitly state where results should be returned in registers (or on
the stack) instead of on the heap.
\item
Lazy black-holing has been replaced by eager black-holing. The
problem with lazy black-holing is that it leaves slop in the heap
which conflicts with the use of a mostly-copying collector.
\end{itemize}
\subsection{Wish list}
mutually-exclusive choices.
\begin{description}
\item[@SEQUENTIAL@] No concurrency or parallelism support.
This configuration might not support interrupt recovery.
\note{There's probably not much point in supporting this option. If
we've gone to the effort of supporting concurrency, we don't gain
much by being able to turn it off.}
\item[@CONCURRENT@] Support for concurrency but not for parallelism.
\item[@CONCURRENT@+@GRANSIM@] Concurrency support and simulated parallelism.
\item[@CONCURRENT@+@PARALLEL@] Concurrency support and real parallelism.
pointed types are the only things which can be lazily evaluated.  In
the STG machine, this means that they are the only things that can be
{\em entered} or {\em updated} and it requires that they be boxed.
\item An {\em unpointed} type is one that does not contain $\bot$.
Variables with unpointed types are never delayed --- they are always
evaluated when they are constructed. In the STG machine, this means
that they cannot be {\em entered} or {\em updated}. Unpointed objects
words and pointers are the same size.
\subsection{Subtle Dependencies}
Some decisions have very subtle consequences which should be written
down in case we want to change our minds.
\begin{itemize}
discusses who does the stack check and how much space they need.
Heap objects never contain slop --- this is required if we want to
support mostly-copying garbage collection.
This is a big problem when updating since the updatee is usually
bigger than an indirection object. The fix is to overwrite the end of
the updatee with ``slop objects'' (described in
section~\ref{sect:slop-objects}).
This is hard to arrange if we do \emph{lazy} blackholing
(section~\ref{sect:lazy-black-holing}) so we currently plan to
blackhole an object when we push the update frame.
\item
Info tables for constructors contain enough information to decide which
instead of
\subsection{Unboxed tuples}\label{sect:unboxed-tuples}
\Note{We're not planning to implement this right away. There doesn't
seem to be any real difficulty adding it to the runtime system but
it'll take a lot of work adding it to the compiler. Since unboxed
tuples do not trigger allocation, the syntax could be modified to allow
unboxed tuples in expressions.}
Functions can take multiple arguments as easily as they can take one
argument: there's no cost for adding another argument. But functions
can only return one result: the cost of adding a second ``result'' is
or multiple stack slots.  At first sight, this seems a little strange
but it's no different from passing double precision floats in two
registers.
Notes:
\begin{itemize}
\item
Unboxed tuples can only have one constructor and
thunks never have unboxed types --- so we'll never try to update an
unboxed constructor. The restriction to a single constructor is
largely to avoid garbage collection complications.
\item
The core syntax does not allow variables to be bound to
unboxed tuples (ie in default case alternatives or as function arguments)
and does not allow unboxed tuples to be fields of other constructors.
However, there's no harm in allowing it in the source syntax as a
convenient, but easily removed, syntactic sugar.
\item
The compiler generates a closure of the form
@
> c = \ x y z -> C x y z
@
for every constructor (whether boxed or unboxed).
This closure is normally used during desugaring to ensure that
constructors are saturated and to apply any strictness annotations.
They are also used when returning unboxed constructors to the machine
code evaluator from the bytecode evaluator and when a heap check fails
in a return continuation for an unboxed-tuple scrutinee.
\end{itemize}
\subsection{STG Syntax}
\ToDo{Insert STG syntax with appropriate changes.}
%-----------------------------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\part{System Overview}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This part is concerned with defining the external interfaces of the
major components of the system; the next part is concerned with their
inner workings.
The major components of the system are:
\begin{itemize}
\item The scheduler
\item The evaluators
\item The loader
\item The compilers
\end{itemize}
\section{The Compilers}
Need to describe interface files.
Here's an example --- but I don't know the grammar --- ADR.
@
_interface_ Main 1
_exports_
Main main ;
_declarations_
1 main _:_ IOBase.IO PrelBase.();;
@
\ToDo{Insert diagram showing all components underneath the scheduler
and communicating only with the scheduler}
\section{The Scheduler}

The Scheduler is the heart of the run-time system.  A running program
consists of a single running thread, and a list of runnable and
blocked threads.  All threads consist of a stack and a few words of
status information.  Except for the running thread, all threads have a
closure on top of their stack; the scheduler restarts a thread by
entering an evaluator which performs some reduction and returns.
\subsection{The scheduler's main loop}

The scheduler consists of a loop which chooses a runnable thread and
invokes one of the evaluators, which performs some reduction and
returns.

The scheduler also takes care of system-wide issues such as heap
overflow or communication with other processors (in the parallel
system) and thread-specific problems such as stack overflow.

Each thread is represented by a Thread State Object (TSO), which is
described in detail in Section \ref{sect:TSO}.

A running system has a global state, consisting of
\begin{itemize}
\item @Hp@, the current heap pointer, which points to the next
available address in the Heap.
\item @HpLim@, the heap limit pointer, which points to the end of the
heap.
\item The Thread Preemption Flag, which is set whenever the currently
running thread should be preempted at the next opportunity.
\item A list of runnable threads.
\item A list of blocked threads.
\end{itemize}

The following is pseudo-code for the inner loop of the scheduler
itself.
@
while (threads_exist) {
  // handle global problems: GC, parallelism, etc
  if (need_gc)          gc();
  if (external_message) service_message();
  // deal with other urgent stuff

  pick a runnable thread;
  do {
    // enter object on top of stack
    // if the top object is a BCO, we must enter it
    // otherwise apply any heuristic we wish.
    if (thread->stack[thread->sp]->info.type == BCO) {
      status = runHugs(thread,&smInfo);
    } else {
      status = runGHC(thread,&smInfo);
    }
    switch (status) {  // handle local problems
      case (StackOverflow): enlargeStack; break;
      case (Error e)      : error(thread,e); break;
      case (ExitWith e)   : exit(e); break;
      case (Yield)        : break;
    }
  } while (thread_runnable);
}
@

\subsection{Creating a thread}

Threads are created:
\begin{itemize}
\item
When the scheduler is first invoked.
\item
When a message is received from another processor (I think). (Parallel
system only.)
\item
When a C program calls some Haskell code.
\ToDo{Describe this in more detail.}
\end{itemize}

\subsection{Invoking the garbage collector}

\subsection{Putting the thread to sleep}
\subsection{Calling Haskell from C}

When C calls a Haskell closure, it sends a message to the scheduler
thread.  On receiving the message, the scheduler creates a new Haskell
thread, pushes the arguments to the C function onto the thread's stack
(with tags for unboxed arguments), pushes the Haskell closure and adds
the thread to the runnable list so that it can be entered in the
normal way.

When the closure returns, the scheduler sends back a message which
awakens the (C) thread.

\ToDo{Do we need to worry about the garbage collector deallocating the
thread if it gets blocked?}

\subsection{Restarting a thread}

The evaluators can reduce almost all types of closure except that only
the machine code evaluator can reduce GHC-compiled closures and only
the bytecode evaluator can reduce Hugs-compiled closures.
Consequently, the scheduler may use either evaluator to restart a
thread unless the top closure is a @BCO@ or contains machine code.

However, if the top of the stack contains a constructor, the scheduler
should use the machine code evaluator to restart the thread.  This
allows the bytecode evaluator to return a constructor to a machine
code return address by pushing the constructor on top of the stack and
returning to the scheduler.  If the return address under the
constructor is @HUGS_RET@, the entry code for @HUGS_RET@ will
rearrange the stack so that the return @BCO@ is on top of the stack
and return to the scheduler which will then call the bytecode
evaluator.  There is little point in trying to shorten this slightly
indirect route since it will happen very rarely if at all.
\subsection{Returning from a thread}

Except when it terminates, a thread always returns to the scheduler
with a closure on the top of its stack.

\ToDo{This has all changed: we always leave a closure on top of the
stack if we mean to continue executing it.  The scheduler examines the
top of the stack and tries to guess which world we want to be in.  If
it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@
closure, it certainly enters GHC and if it finds a standard closure,
it is free to choose either one but it's probably best to enter GHC
for everything except @BCO@s and perhaps @AP@s.}
\subsection{Preempting a thread}

Strictly speaking, threads cannot be preempted --- the scheduler
merely sets a preemption request flag which the thread must arrange to
test on a regular basis.  When an evaluator finds that the preemption
request flag is set, it pushes an appropriate closure onto the stack
and returns to the scheduler.

In the bytecode interpreter, the flag is tested whenever we enter a
closure.  If the preemption flag is set, it leaves the closure on top
of the stack and returns to the scheduler.

In the machine code evaluator, the flag is only tested when a heap or
stack check fails.  This is less expensive than testing the flag on
entering every closure but runs the risk that a thread will enter an
infinite loop which does not allocate any space.  If the flag is set,
the evaluator returns to the scheduler exactly as if a heap check had
failed.
\subsection{``Safe'' and ``unsafe'' C calls}

There are two ways of calling C:

\begin{description}
\item[``Safe'' C calls]
are used if the programmer is certain that the C function will not
do anything dangerous such as calling a Haskell function or an
operating system call which blocks the thread for a long period of
time.\footnote{Warning: this use of ``safe'' and ``unsafe'' is the
exact opposite of the usage for functions like @unsafePerformIO@.}
Safe C calls are faster but must be hand-checked by the programmer.

Safe C calls are performed by pushing the arguments onto the C stack
and jumping to the C function's entry point.  On exit, the result of
the function is in a register which is returned to the Haskell code as
an unboxed value.

\item[``Unsafe'' C calls] are used if the programmer suspects that the
thread may do something dangerous like blocking or calling a Haskell
function.  Unsafe C calls are relatively slow but are less problematic.

Unsafe C calls are performed by pushing the arguments onto the Haskell
stack, pushing a return continuation and returning a \emph{C function
descriptor} to the scheduler.  The scheduler suspends the Haskell thread,
spawns a new operating system thread which pops the arguments off the
Haskell stack onto the C stack, calls the C function, pushes the
function result onto the Haskell stack and informs the scheduler that
the C function has completed and the Haskell thread is now runnable.
\end{description}

The bytecode evaluator will probably treat all C calls as being unsafe.

\ToDo{It might be good for the programmer to indicate how the program
is unsafe.  For example, if we distinguish between C functions which
might call Haskell functions and those which might block, we could
perform a safe call for blocking functions in a single-threaded system
or, perhaps, in a multi-threaded system which only happens to have a
single thread at the moment.}
\section{The Evaluators}

All the scheduler needs to know about evaluation is how to manipulate
threads and how to find the closure on top of the stack.  The
evaluators need to agree on the representations of certain objects and
on how to return to the scheduler.

\subsection{Returning to the Scheduler}
\label{sect:switching-worlds}

The evaluators return to the scheduler under three circumstances:

\begin{itemize}
\item
When they enter a closure built by the other evaluator.  That is, when
the bytecode interpreter enters a closure compiled by GHC or when the
machine code evaluator enters a BCO.

\item
When they return to a return continuation built by the other
evaluator.  That is, when the machine code evaluator returns to a
continuation built by Hugs or when the bytecode evaluator returns to a
continuation built by GHC.

\item
When a heap or stack check fails or when the preemption flag is set.
\end{itemize}

In all cases, they return to the scheduler with a closure on top of
the stack.  The mechanism used to trigger the world switch and the
choice of closure left on top of the stack varies according to which
world is being left and what is being returned.

\subsubsection{Leaving the bytecode evaluator}
\label{sect:hugs-to-ghc-switch}

\paragraph{Entering a machine code closure}

When it enters a closure, the bytecode evaluator performs a switch
based on the type of closure (@AP@, @PAP@, @Ind@, etc).  On entering a
machine code closure, it returns to the scheduler with the closure on
top of the stack.
\paragraph{Returning a constructor}

When it enters a constructor, the bytecode evaluator tests the return
continuation on top of the stack.  If it is a machine code
continuation, it returns to the scheduler with the constructor on top
of the stack.

\note{This is why the scheduler must enter the machine code evaluator
if it finds a constructor on top of the stack.}

\paragraph{Returning an unboxed value}

\note{Hugs doesn't support unboxed values in source programs but they
are used for a few complex primops.}

When it returns an unboxed value, the bytecode evaluator tests the
return continuation on top of the stack.  If it is a machine code
continuation, it returns to the scheduler with the unboxed value and a
special closure on top of the stack.  When the closure is entered (by
the machine code evaluator), it returns the unboxed value on top of
the stack to the return continuation under it.

The runtime system (or GHC?) provides one of these closures for each
unboxed type.  Hugs cannot generate them itself since the entry code
is really very tricky.

\paragraph{Heap/Stack overflow and preemption}

The bytecode evaluator tests for heap/stack overflow and preemption
when entering a BCO and simply returns with the BCO on top of the
stack.

\subsubsection{Leaving the machine code evaluator}
\label{sect:ghc-to-hugs-switch}

\paragraph{Entering a BCO}

The entry code for a BCO pushes the BCO onto the stack and returns to
the scheduler.

\paragraph{Returning a constructor}

We avoid the need to test return addresses in the machine code
evaluator by pushing a special return address on top of a pointer to
the bytecode return continuation.  Figure~\ref{fig:hugs-return-stack}
shows the state of the stack just before evaluating the scrutinee.

\begin{figure}[ht]
\begin{center}
@
| stack    |
+----------+
| bco      |--> BCO
+----------+
| HUGS_RET |
+----------+
@
%\input{hugs_return1.pstex_t}
\end{center}
\caption{Stack layout for evaluating a scrutinee}
\label{fig:hugs-return-stack}
\end{figure}

This return address rearranges the stack so that the bco pointer is
above the constructor on the stack (as shown in
Figure~\ref{fig:hugs-boxed-return}) and returns to the scheduler.

\section{The Loader}

\ToDo{Is it ok to load code when threads are running?}

In a batch mode system, we can statically link all the modules
together.  In an interactive system we need a loader which will
explicitly load and unload individual modules (or, perhaps, blocks of
mutually dependent modules) and resolve references between modules.

While many operating systems provide support for dynamic loading and
will automatically resolve cross-module references for us, we generally
cannot rely on being able to load mutually dependent modules.

A portable solution is to perform some of the linking ourselves.  Each
module should provide three global symbols:
\begin{itemize}
\item
An initialisation routine.  (Might also be used for finalisation.)
\item
A table of symbols it exports.
Entries in this table consist of the symbol name and the address of
the name's value.
\item
A table of symbols it imports.
Entries in this table consist of the symbol name and a list of
references to that symbol.
\end{itemize}

On loading a group of modules, the loader adds the contents of the
export lists to a symbol table and then fills in all the references in
the import lists.

References in import lists are of two types:
\begin{description}
\item[References in machine code]

The most efficient approach is to patch the machine code directly, but
this will be a lot of work, very painful to port and rather fragile.

Alternatively, the loader could store the value of each symbol in the
import table for each module and the compiled code can access all
external objects through the import table.  This requires that the
import table be writable but does not require that the machine code or
info tables be writable.

\item[References in data structures (SRTs and static data constructors)]

Either we patch the SRTs and constructors directly or we somehow use
indirections through the symbol table.  Patching the SRTs requires
that we make them writable and prevents us from making effective use
of virtual memories that use copy-on-write policies.  Using an
indirection is possible but tricky.

Note: we could avoid patching machine code if all references to
external objects went through the SRT --- then we just have one
thing to patch.  But the SRT always contains a pointer to the closure
rather than the fast entry point (say), so we'd take a big performance
hit for doing this.
\end{description}

\section{Compiled Execution}

This section describes the framework in which compiled code evaluates
expressions.  Only at certain points will compiled code need to be
able to talk to the interpreted world; these are discussed in Section
\ref{sect:switching-worlds}.

\subsection{Calling conventions}

\subsubsection{The call/return registers}
\begin{figure}[ht]
\begin{center}
@
| stack |
+----------+
| con |--> Constructor