Commit c5a97ea0 authored by Simon Marlow

Document SMP support

parent 123e4528
......@@ -1092,45 +1092,6 @@
</informaltable>
</sect2>
<sect2>
<title>Parallelism options</title>
<para><xref linkend="sec-using-parallel"/></para>
<informaltable>
<tgroup cols="4" align="left" colsep="1" rowsep="1">
<thead>
<row>
<entry>Flag</entry>
<entry>Description</entry>
<entry>Static/Dynamic</entry>
<entry>Reverse</entry>
</row>
</thead>
<tbody>
<row>
<entry><option>-gransim</option></entry>
<entry>Enable GRANSIM</entry>
<entry>static</entry>
<entry>-</entry>
</row>
<row>
<entry><option>-parallel</option></entry>
<entry>Enable Parallel Haskell</entry>
<entry>static</entry>
<entry>-</entry>
</row>
<row>
<entry><option>-smp</option></entry>
<entry>Enable SMP support</entry>
<entry>static</entry>
<entry>-</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</sect2>
<sect2>
<title>C pre-processor options</title>
......
<?xml version="1.0" encoding="iso-8859-1"?>
<sect1 id="concurrent-and-parallel">
<title>Concurrent and Parallel Haskell</title>
<para>
<indexterm><primary>Concurrent Haskell</primary></indexterm>
<indexterm><primary>Parallel Haskell</primary></indexterm>
Concurrent and Parallel Haskell are Glasgow extensions to Haskell
which let you structure your program as a group of independent
`threads'.
</para>
<para>
Concurrent and Parallel Haskell have very different purposes.
</para>
<para>
Concurrent Haskell is for applications which have an inherent
structure of interacting, concurrent tasks (i.e. `threads'). Threads
in such programs may be <emphasis>required</emphasis>. For example, if a concurrent thread has been spawned to handle a mouse click, it isn't
optional&mdash;the user wants something done!
</para>
<para>
A Concurrent Haskell program implies multiple `threads' running within
a single Unix process on a single processor.
</para>
<para>
You will find at least one paper about Concurrent Haskell hanging off
of <ulink url="http://research.microsoft.com/~simonpj/">Simon Peyton
Jones's Web page</ulink>.
</para>
<para>
Parallel Haskell is about <emphasis>speed</emphasis>&mdash;spawning
threads onto multiple processors so that your program will run faster.
The `threads' are always <emphasis>advisory</emphasis>&mdash;if the
runtime system thinks it can get the job done more quickly by
sequential execution, then fine.
</para>
<para>
A Parallel Haskell program implies multiple processes running on
multiple processors, under a PVM (Parallel Virtual Machine) framework.
An MPI interface is under development but is not yet fully functional.
</para>
<para>
Parallel Haskell is still relatively new; it is more about &ldquo;research
fun&rdquo; than about &ldquo;speed.&rdquo; That will change.
</para>
<para>
Check the <ulink url="http://www.cee.hw.ac.uk/~dsg/gph/">GPH Page</ulink>
for more information on &ldquo;GPH&rdquo; (Haskell98 with extensions for
parallel execution), the latest version of &ldquo;GUM&rdquo; (the runtime
system to enable parallel execution) and papers on research issues. A
list of publications about GPH and about GUM is also available from Simon's
Web Page.
</para>
<para>
Some details about Parallel Haskell follow. For more information
about Concurrent Haskell, see the module
<literal>Control.Concurrent</literal> in the library documentation.
</para>
<sect2>
<title>Features specific to Parallel Haskell
<indexterm><primary>Parallel Haskell&mdash;features</primary></indexterm></title>
<sect3>
<title>The <literal>Parallel</literal> interface (recommended)
<indexterm><primary>Parallel interface</primary></indexterm></title>
<para>
GHC provides two functions for controlling parallel execution, through
the <literal>Parallel</literal> interface:
</para>
<sect1 id="lang-parallel">
<title>Parallel Haskell</title>
<indexterm><primary>parallelism</primary>
</indexterm>
<para>There are two implementations of Parallel Haskell: SMP parallelism
<indexterm><primary>SMP</primary></indexterm>
which is built into GHC (see <xref linkend="sec-using-smp" />) and
supports running Parallel Haskell programs on a single multiprocessor
machine, and
Glasgow Parallel Haskell<indexterm><primary>Glasgow Parallel Haskell</primary></indexterm>
(GPH), which supports running Parallel Haskell
programs both on clusters of machines and on single multiprocessors. GPH is
developed and distributed
separately from GHC (see <ulink url="http://www.cee.hw.ac.uk/~dsg/gph/">The
GPH Page</ulink>).</para>
<para>Ordinary single-threaded Haskell programs will not benefit from
enabling SMP parallelism alone. You must expose parallelism to the
compiler in one of the following two ways.</para>
<sect2>
<title>Running Concurrent Haskell programs in parallel</title>
<para>The first possibility is to use concurrent threads to structure your
program, and make sure
that you spread computation amongst the threads. The runtime will
schedule the running Haskell threads among the available OS
threads, running as many in parallel as you specified with the
<option>-N</option> RTS option.</para>
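<para>As an illustrative sketch (not from the original text): the
following program divides a computation between two Haskell threads,
communicating the results back through <literal>MVar</literal>s.
Compiled with <option>-threaded</option> and run with
<option>+RTS -N2</option> (see <xref linkend="sec-using-smp" />), the
two threads may execute on two processors simultaneously.</para>
<programlisting>
import Control.Concurrent
import Control.Concurrent.MVar

main :: IO ()
main = do
  m1 &#60;- newEmptyMVar
  m2 &#60;- newEmptyMVar
  -- each thread forces its result before storing it in the MVar,
  -- so the real work is spread across the two threads
  forkIO (putMVar m1 $! sum [1 .. 5000000 :: Integer])
  forkIO (putMVar m2 $! sum [5000001 .. 10000000 :: Integer])
  s1 &#60;- takeMVar m1
  s2 &#60;- takeMVar m2
  print (s1 + s2)
</programlisting>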
</sect2>
<sect2>
<title>Annotating pure code for parallelism</title>
<para>The simplest mechanism for extracting parallelism from pure code is
to use the <literal>par</literal> combinator, which is closely related to (and often used
with) <literal>seq</literal>. Both of these are available from <ulink
url="../libraries/base/Control-Parallel.html"><literal>Control.Parallel</literal></ulink>:</para>
<programlisting>
infixr 0 `par`
infixr 1 `seq`

par :: a -&#62; b -&#62; b
seq :: a -&#62; b -&#62; b
</programlisting>
<para>The expression <literal>(x `par` y)</literal>
<emphasis>sparks</emphasis> the evaluation of <literal>x</literal>
(to weak head normal form) and returns <literal>y</literal>. Sparks are
queued for execution in FIFO order, but are not executed immediately. If
the runtime detects that there is an idle CPU, then it may convert a
spark into a real thread, and run the new thread on the idle CPU. In
this way the available parallelism is spread amongst the real
CPUs.</para>
<para>
The expression <literal>(x `seq` y)</literal> evaluates <literal>x</literal> to weak head normal
form and then returns <literal>y</literal>. The <function>seq</function> function can be used to
impose a desired execution order on the evaluation of an expression, and
nested applications of <function>seq</function> can force evaluation beyond WHNF.
</para>
<para>For example, consider the following parallel version of our old
nemesis, <function>nfib</function>:</para>
<programlisting>
import Control.Parallel

nfib :: Int -&#62; Int
nfib n | n &#60;= 1    = 1
       | otherwise = par n1 (seq n2 (n1 + n2 + 1))
  where n1 = nfib (n-1)
        n2 = nfib (n-2)
</programlisting>
<para>
For values of <varname>n</varname> greater than 1, we use <function>par</function> to spark a thread
to evaluate <literal>nfib (n-1)</literal>, and then we use <function>seq</function> to force the
parent thread to evaluate <literal>nfib (n-2)</literal> before going on to add
together these two subexpressions. In this divide-and-conquer
approach, we only spark a new thread for one branch of the computation
(leaving the parent to evaluate the other branch). Also, we must use
<function>seq</function> to ensure that the parent will evaluate <varname>n2</varname> <emphasis>before</emphasis>
<varname>n1</varname> in the expression <literal>(n1 + n2 + 1)</literal>. It is not sufficient to
reorder the expression as <literal>(n2 + n1 + 1)</literal>, because the compiler may
not generate code to evaluate the addends from left to right.
</para>
</sect3>
<sect3>
<title>Underlying functions and primitives
<indexterm><primary>parallelism primitives</primary></indexterm>
<indexterm><primary>primitives for parallelism</primary></indexterm></title>
<para>
The functions <function>par</function> and <function>seq</function> are wired into GHC, and unfold
into uses of the <function>par&num;</function> and <function>seq&num;</function> primitives, respectively. If
you'd like to see this with your very own eyes, just run GHC with the
<option>-ddump-simpl</option> option. (Anything for a good time&hellip;)
</para>
</sect3>
<sect3>
<title>Scheduling policy for concurrent threads
<indexterm><primary>Scheduling&mdash;concurrent</primary></indexterm>
<indexterm><primary>Concurrent scheduling</primary></indexterm></title>
<para>
Runnable threads are scheduled in round-robin fashion. Context
switches are signalled by the generation of new sparks or by the
expiry of a virtual timer (the timer interval is configurable with the
<option>-C[&lt;num&gt;]</option><indexterm><primary>-C&lt;num&gt; RTS option (concurrent,
parallel)</primary></indexterm> RTS option). However, a context switch doesn't
really happen until the current heap block is full. You can't get any
faster context switching than this.
</para>
<para>
When a context switch occurs, pending sparks which have not already
been reduced to weak head normal form are turned into new threads.
However, there is a limit to the number of active threads (runnable or
blocked) which are allowed at any given time. This limit can be
adjusted with the <option>-t&lt;num&gt;</option><indexterm><primary>-t &lt;num&gt; RTS option (concurrent, parallel)</primary></indexterm>
RTS option (the default is 32). Once the
thread limit is reached, any remaining sparks are deferred until some
of the currently active threads are completed.
</para>
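<para>For instance (an illustrative command line, not from the original
text), to shorten the context-switch timer and raise the thread limit
to 64:</para>
<screen>
<prompt>&dollar;</prompt> ./a.out +RTS -C0.01 -t64 -RTS
</screen>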
</sect3>
<sect3>
<title>Scheduling policy for parallel threads
<indexterm><primary>Scheduling&mdash;parallel</primary></indexterm>
<indexterm><primary>Parallel scheduling</primary></indexterm></title>
<para>
In GUM we use an unfair scheduler, which means that a thread continues to
perform graph reduction until it blocks on a closure under evaluation, on a
remote closure, or until the thread finishes.
</para>
</sect3>
</sect2>
<para>When using <literal>par</literal>, the general rule of thumb is that
the sparked computation should be required at a later time, but not too
soon. Also, the sparked computation should not be too small, otherwise
the cost of forking it in parallel will be too large relative to the
amount of parallelism gained. Getting these factors right is tricky in
practice.</para>
<para>More sophisticated combinators for expressing parallelism are
available from the <ulink
url="../libraries/base/Control-Parallel-Strategies.html"><literal>Control.Parallel.Strategies</literal></ulink> module.
This module builds functionality around <literal>par</literal>,
expressing more elaborate patterns of parallel computation, such as
parallel <literal>map</literal>.</para>
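<para>As a brief sketch (assuming the <literal>Strategies</literal> API
shipped with GHC at the time, in particular <function>parList</function>,
<function>rnf</function> and <function>using</function>): the following
evaluates the elements of a list in parallel, forcing each element to
normal form.</para>
<programlisting>
import Control.Parallel.Strategies

-- square every element, sparking the evaluation of each element
-- in parallel and forcing the results to normal form
parSquares :: [Int] -&#62; [Int]
parSquares xs = map (^2) xs `using` parList rnf
</programlisting>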
</sect2>
</sect1>
......
......@@ -839,26 +839,43 @@ $ cat foo.hspp</screen>
<indexterm><primary><option>-threaded</option></primary></indexterm>
</term>
<listitem>
<para>Link the program with the "threaded" version of the
runtime system. The threaded runtime system is so-called
because it manages multiple OS threads, as opposed to the
default runtime system which is purely
single-threaded.</para>
<para>Note that you do <emphasis>not</emphasis> need
<option>-threaded</option> in order to use concurrency; the
single-threaded runtime supports concurrency between Haskell
threads just fine.</para>
<para>The threaded runtime system provides the following
benefits:</para>
<itemizedlist>
<listitem>
<para>Parallelism<indexterm><primary>parallelism</primary></indexterm> on a multiprocessor<indexterm><primary>multiprocessor</primary></indexterm><indexterm><primary>SMP</primary></indexterm> or multicore<indexterm><primary>multicore</primary></indexterm>
machine. See <xref linkend="sec-using-smp" />.</para>
</listitem>
<listitem>
<para>The ability to make a foreign call that does not
block all other Haskell threads.</para>
</listitem>
<listitem>
<para>The ability to invoke foreign exported Haskell
functions from multiple OS threads.</para>
</listitem>
</itemizedlist>
<para>With <option>-threaded</option>, calls to foreign
functions are made using the same OS thread that created the
Haskell thread (if it was created by a call to a foreign
exported Haskell function), or an arbitrary OS thread
otherwise (if the Haskell thread was created by
<literal>forkIO</literal>).</para>
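<para>The following is a minimal sketch (assuming a POSIX
<function>sleep</function> is available) of a <literal>safe</literal>
foreign call that, under <option>-threaded</option>, blocks only its own
OS thread while other Haskell threads keep running:</para>
<programlisting>
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent
import Foreign.C.Types

foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -&#62; IO CUInt

main :: IO ()
main = do
  done &#60;- newEmptyMVar
  -- the foreign call blocks an OS thread, not the whole runtime
  forkIO (c_sleep 2 &#62;&#62; putMVar done ())
  putStrLn "still responsive while the foreign call sleeps"
  takeMVar done
</programlisting>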
<para>More details on the use of "bound threads" in the
threaded runtime can be found in the <ulink
url="../libraries/base/Control.Concurrent.html"><literal>Control.Concurrent</literal></ulink> module.</para>
</listitem>
</varlistentry>
</variablelist>
......
......@@ -398,11 +398,12 @@
</sect2>
<sect2>
<title>RTS options for profiling and parallelism</title>

<para>The RTS options related to profiling are described in <xref
linkend="rts-options-heap-prof"/>, those for concurrency in
<xref linkend="sec-using-concurrent" />, and those for parallelism in
<xref linkend="parallel-options"/>.</para>
</sect2>
<sect2 id="rts-options-debugging">
......
......@@ -1533,353 +1533,86 @@ f "2" = 2
</variablelist>
</sect1>
<sect1 id="sec-using-parallel">
<title>Using parallel Haskell</title>
<para>
<indexterm><primary>Parallel Haskell</primary><secondary>using</secondary></indexterm>
&lsqb;NOTE: GHC does not support Parallel Haskell by default; you need to
obtain a special version of GHC from the <ulink
url="http://www.cee.hw.ac.uk/~dsg/gph/">GPH</ulink> site. Also,
you won't be able to execute parallel Haskell programs unless PVM3
(Parallel Virtual Machine, version 3) is installed at your site.&rsqb;
</para>
<para>
To compile a Haskell program for parallel execution under PVM, use the
<option>-parallel</option> option,<indexterm><primary>-parallel
option</primary></indexterm> both when compiling <emphasis>and
linking</emphasis>. You will probably want to <literal>import
Control.Parallel</literal> into your Haskell modules.
</para>
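<para>For example (a hypothetical module name; the flags are as
described above):</para>
<screen>
<prompt>&dollar;</prompt> ghc -parallel -o parfib parfib.hs
</screen>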
<para>
To run your parallel program, once PVM is going, just invoke it
&ldquo;as normal&rdquo;. The main extra RTS option is
<option>-qp&lt;n&gt;</option>, to say how many PVM
&ldquo;processors&rdquo; your program is to run on. (For more details of
all relevant RTS options, please see <xref
linkend="parallel-rts-opts"/>.)
</para>
<para>
In truth, running parallel Haskell programs and getting information
out of them (e.g., parallelism profiles) is a battle with the vagaries of
PVM, detailed in the following sections.
</para>
<sect2 id="pvm-dummies">
<title>Dummy's guide to using PVM</title>
<para>
<indexterm><primary>PVM, how to use</primary></indexterm>
<indexterm><primary>parallel Haskell&mdash;PVM use</primary></indexterm>
Before you can run a parallel program under PVM, you must set the
required environment variables (PVM's idea, not ours); something like,
probably in your <filename>.cshrc</filename> or equivalent:
<programlisting>
setenv PVM_ROOT /wherever/you/put/it
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PVM_DPATH $PVM_ROOT/lib/pvmd
</programlisting>
</para>
<para>
Creating and/or controlling your &ldquo;parallel machine&rdquo; is a purely-PVM
business; nothing specific to parallel Haskell. The following paragraphs
describe how to configure your parallel machine interactively.
</para>
<para>
If you use parallel Haskell regularly on the same machine configuration it
is a good idea to maintain a file with all machine names and to make the
environment variable PVM_HOST_FILE point to this file. Then you can avoid
the interactive operations described below by just saying
</para>
<programlisting>
pvm $PVM_HOST_FILE
</programlisting>
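<para>A host file is simply a list of machine names, one per line
(hypothetical names shown):</para>
<programlisting>
machine1.example.org
machine2.example.org
machine3.example.org
</programlisting>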
<para>
You use the <command>pvm</command><indexterm><primary>pvm command</primary></indexterm> command to start PVM on your
machine. You can then do various things to control/monitor your
&ldquo;parallel machine;&rdquo; the most useful being:
</para>
<para>
<informaltable>
<tgroup cols="2">
<colspec align="left"/>
<tbody>
<row>
<entry><keycombo><keycap>Control</keycap><keycap>D</keycap></keycombo></entry>
<entry>exit <command>pvm</command>, leaving it running</entry>
</row>
<row>
<entry><command>halt</command></entry>
<entry>kill off this &ldquo;parallel machine&rdquo; &amp; exit</entry>
</row>
<row>
<entry><command>add &lt;host&gt;</command></entry>
<entry>add <command>&lt;host&gt;</command> as a processor</entry>
</row>
<row>
<entry><command>delete &lt;host&gt;</command></entry>
<entry>delete <command>&lt;host&gt;</command></entry>
</row>
<row>
<entry><command>reset</command></entry>
<entry>kill what's going, but leave PVM up</entry>
</row>
<row>
<entry><command>conf</command></entry>
<entry>list the current configuration</entry>
</row>
<row>
<entry><command>ps</command></entry>
<entry>report processes' status</entry>
</row>
<row>
<entry><command>pstat &lt;pid&gt;</command></entry>
<entry>status of a particular process</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</para>
<para>
The PVM documentation can tell you much, much more about <command>pvm</command>!
</para>
</sect2>
<sect2 id="par-profiles">
<title>Parallelism profiles</title>
<para>
<indexterm><primary>parallelism profiles</primary></indexterm>
<indexterm><primary>profiles, parallelism</primary></indexterm>
<indexterm><primary>visualisation tools</primary></indexterm>
</para>
<para>
With parallel Haskell programs, we usually don't care about the
results&mdash;only about &ldquo;how parallel&rdquo; the run was! We want pretty pictures.
</para>
<para>
Parallelism profiles (&agrave; la <command>hbcpp</command>) can be generated with the
<option>-qP</option><indexterm><primary>-qP RTS option</primary></indexterm> RTS option. The
per-processor profiling info is dumped into files named
<filename>&lt;full-path&gt;&lt;program&gt;.gr</filename>. These are then munged into a PostScript picture,
which you can then display. For example, to run your program
<filename>a.out</filename> on 8 processors, then view the parallelism profile, do:
</para>
<para>
<screen>
<prompt>&dollar;</prompt> ./a.out +RTS -qP -qp8
<prompt>&dollar;</prompt> grs2gr *.???.gr &#62; temp.gr # combine the 8 .gr files into one
<prompt>&dollar;</prompt> gr2ps -O temp.gr # cvt to .ps; output in temp.ps
<prompt>&dollar;</prompt> ghostview -seascape temp.ps # look at it!
</screen>
</para>
<sect1 id="sec-using-smp">
<title>Using SMP parallelism</title>
<indexterm><primary>parallelism</primary></indexterm>
<indexterm><primary>SMP</primary></indexterm>
<para>
The scripts for processing the parallelism profiles are distributed
in <filename>ghc/utils/parallel/</filename>.
</para>
</sect2>
<sect2>
<title>Other useful info about running parallel programs</title>
<para>
The &ldquo;garbage-collection statistics&rdquo; RTS options can be useful for
seeing what parallel programs are doing. If you do either
<option>+RTS -Sstderr</option><indexterm><primary>-Sstderr RTS option</primary></indexterm> or <option>+RTS -sstderr</option>, then
you'll get mutator, garbage-collection, etc., times on standard
error. The standard error of all PEs other than the `main thread'
appears in <filename>/tmp/pvml.nnn</filename>, courtesy of PVM.
</para>
<para>
Whether doing <option>+RTS -Sstderr</option> or not, a handy way to watch
what's happening overall is: <command>tail -f /tmp/pvml.nnn</command>.
</para>
</sect2>
<sect2 id="parallel-rts-opts">
<title>RTS options for Parallel Haskell
</title>
<para>
<indexterm><primary>RTS options, parallel</primary></indexterm>
<indexterm><primary>parallel Haskell&mdash;RTS options</primary></indexterm>
</para>
<para>
Besides the usual runtime system (RTS) options
(<xref linkend="runtime-control"/>), there are a few options particularly
for parallel execution.
</para>
<para>
<variablelist>
<varlistentry>
<term><option>-qp&lt;N&gt;</option>:</term>