multi-thread.html 16.3 KB
Newer Older
sof's avatar
sof committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>The GHC Commentary - Supporting multi-threaded interoperation</title>
</head>
<body>
<h1>The GHC Commentary - Supporting multi-threaded interoperation</h1>
<em>
<p>
Authors: sof@galois.com, simonmar@microsoft.com<br>
Date:    April 2002
</p>
</em>
<p>
This document presents the implementation of an extension to
Concurrent Haskell that provides two enhancements:
</p>
<ul>
<li>A Concurrent Haskell thread may call an external (e.g., C)
function in a manner that's transparent to the execution/evaluation of
other Haskell threads. Section <a href="#callout">Calling out"</a> covers this.
</li>
<li>
OS threads may safely call Haskell functions concurrently. Section
<a href="#callin">"Calling in"</a> covers this.
</li>
</ul>

<!---- ***************************************  ----->
31
<h2 id="callout">The problem: foreign calls that block</h2>
sof's avatar
sof committed
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
<p>
When a Concurrent Haskell(CH) thread calls a 'foreign import'ed
function, the runtime system(RTS) has to handle this in a manner
transparent to other CH threads. That is, they shouldn't be blocked
from making progress while the CH thread executes the external
call. Presently, all threads will block.
</p>
<p>
Clearly, we have to rely on OS-level threads in order to support this
kind of concurrency. The implementation described here defines the 
(abstract) OS threads interface that the RTS assumes. The implementation
currently provides two instances of this interface, one for POSIX
threads (pthreads) and one for the Win32 threads.
</p>

<!---- ***************************************  ----->
<h3>Multi-threading the RTS</h3>

<p>
51
52
53
A simple and efficient way to implement non-blocking foreign calls is like this:
<ul>
<li> Invariant: only one OS thread is allowed to
sof's avatar
sof committed
54
55
execute code inside of the GHC runtime system. [There are alternate
designs, but I won't go into details on their pros and cons here.]
56
57
58
59
60
61
We'll call the OS thread that is currently running Haskell threads
the <em>Current Haskell Worker Thread</em>.
<p>
The Current Haskell Worker Thread repeatedly grabs a Haskell thread, executes it until its
time-slice expires or it blocks on an MVar, then grabs another, and executes
that, and so on.
sof's avatar
sof committed
62
</p>
63
<li>
sof's avatar
sof committed
64
<p>
65
66
67
68
69
When the Current Haskell Worker comes to execute a potentially blocking 'foreign
import', it leaves the RTS and ceases being the Current Haskell Worker, but before doing so it makes certain that
another OS worker thread is available to become the Current Haskell Worker.
Consequently, even if the external call blocks, the new Current Haskell Worker
continues execution of the other Concurrent Haskell threads.
sof's avatar
sof committed
70
71
72
73
When the external call eventually completes, the Concurrent Haskell
thread that made the call is passed the result and made runnable
again.
</p>
74
75
76
77
78
79
80
81
<p>
<li>
A pool of OS threads are constantly trying to become the Current Haskell Worker.
Only one succeeds at any moment.   If the pool becomes empty, the RTS creates more workers.
<p><li>
The OS worker threads are regarded as interchangeable.  A given Haskell thread
may, during its lifetime, be executed entirely by one OS worker thread, or by more than one.
There's just no way to tell.
sof's avatar
sof committed
82

83
84
85
86
<p><li>If a foreign program wants to call a Haskell function, there is always a thread switch involved.
The foreign program uses thread-safe mechanisms to create a Haskell thread and make it runnable; and
the current Haskell Worker Thread exectutes it. See Section <a href="#callin">Calling in</a>.
</ul>
sof's avatar
sof committed
87
<p>
88
89
The rest of this section describes the mechanics of implementing all
this. There's two parts to it, one that describes how a native (OS) thread
sof's avatar
sof committed
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
leaves the RTS to service the external call, the other how the same
thread handles returning the result of the external call back to the
Haskell thread.
</p>

<!---- ***************************************  ----->
<h3>Making the external call</h3>

<p>
Presently, GHC handles 'safe' C calls by effectively emitting the
following code sequence:
</p>  

<pre>
    ...save thread state...
    t = suspendThread();
    r = foo(arg1,...,argn);
    resumeThread(t);
    ...restore thread state...
    return r;
</pre>

<p>
After having squirreled away the state of a Haskell thread,
<tt>Schedule.c:suspendThread()</tt> is called which puts the current
thread on a list [<tt>Schedule.c:suspended_ccalling_threads</tt>]
containing threads that are currently blocked waiting for external calls
to complete (this is done for the purposes of finding roots when
garbage collecting).
</p>

<p>
In addition to putting the Haskell thread on
<tt>suspended_ccalling_threads</tt>, <tt>suspendThread()</tt> now also
does the following:
</p>
<ul>
<li>Instructs the <em>Task Manager</em> to make sure that there's a
another native thread waiting in the wings to take over the execution
of Haskell threads. This might entail creating a new
<em>worker thread</em> or re-using one that's currently waiting for
more work to do. The <a href="#taskman">Task Manager</a> section
presents the functionality provided by this subsystem.
</li>

<li>Releases its capability to execute within the RTS. By doing
so, another worker thread will become unblocked and start executing
code within the RTS. See the <a href="#capability">Capability</a>
section for details.
</li>

<li><tt>suspendThread()</tt> returns a token which is used to
identify the Haskell thread that was added to
<tt>suspended_ccalling_threads</tt>. This is done so that once the
external call has completed, we know what Haskell thread to pull off
the <tt>suspended_ccalling_threads</tt> list.
</li>
</ul>

<p>
Upon return from <tt>suspendThread()</tt>, the OS thread is free of
its RTS executing responsibility, and can now invoke the external
call. Meanwhile, the other worker thread that have now gained access
to the RTS will continue executing Concurrent Haskell code. Concurrent
'stuff' is happening!
</p>

<!---- ***************************************  ----->
<h3>Returning the external result</h3>

<p>
When the native thread eventually returns from the external call,
the result needs to be communicated back to the Haskell thread that
issued the external call. The following steps takes care of this:
</p>

<ul>
<li>The returning OS thread calls <tt>Schedule.c:resumeThread()</tt>,
passing along the token referring to the Haskell thread that made the
call we're returning from.
</li>

<li>
The OS thread then tries to grab hold of a <em>returning worker
capability</em>, via <tt>Capability.c:grabReturnCapability()</tt>.
Until granted, the thread blocks waiting for RTS permissions. Clearly we
don't want the thread to be blocked longer than it has to, so whenever
a thread that is executing within the RTS enters the Scheduler (which
is quite often, e.g., when a Haskell thread context switch is made),
it checks to see whether it can give up its RTS capability to a
returning worker, which is done by calling
<tt>Capability.c:yieldToReturningWorker()</tt>.
</li>

<li>
If a returning worker is waiting (the code in <tt>Capability.c</tt>
keeps a counter of the number of returning workers that are currently
blocked waiting), it is woken up and the given the RTS execution
priviledges/capabilities of the worker thread that gave up its.
</li>

<li>
The thread that gave up its capability then tries to re-acquire
the capability to execute RTS code; this is done by calling
<tt>Capability.c:waitForWorkCapability()</tt>.
</li>

<li>
The returning worker that was woken up will continue execution in
<tt>resumeThread()</tt>, removing its associated Haskell thread
from the <tt>suspended_ccalling_threads</tt> list and start evaluating
that thread, passing it the result of the external call.
</li>
</ul>

<!---- ***************************************  ----->
<h3 id="rts-exec">RTS execution</h3>

<p>
If a worker thread inside the RTS runs out of runnable Haskell
threads, it goes to sleep waiting for the external calls to complete.
It does this by calling <tt>waitForWorkCapability</tt>
</p>

<p>
The availability of new runnable Haskell threads is signalled when:
</p>

<ul>
<li>When an external call is set up in <tt>suspendThread()</tt>.</li>
<li>When a new Haskell thread is created (e.g., whenever
<tt>Concurrent.forkIO</tt> is called from within Haskell); this is
signalled in <tt>Schedule.c:scheduleThread_()</tt>.
</li>
<li>Whenever a Haskell thread is removed from a 'blocking queue'
attached to an MVar (only?).
</li>
</ul>

<!---- ***************************************  ----->
<h2 id="callin">Calling in</h2>

Providing robust support for having multiple OS threads calling into
Haskell is not as involved as its dual. 

<ul>
<li>The OS thread issues the call to a Haskell function by going via
the <em>Rts API</em> (as specificed in <tt>RtsAPI.h</tt>). 
<li>Making the function application requires the construction of a
closure on the heap. This is done in a thread-safe manner by having
the OS thread lock a designated block of memory (the 'Rts API' block,
which is part of the GC's root set) for the short period of time it
takes to construct the application.
<li>The OS thread then creates a new Haskell thread to execute the
function application, which (eventually) boils down to calling
<tt>Schedule.c:createThread()</tt> 
<li>
Evaluation is kicked off by calling <tt>Schedule.c:scheduleExtThread()</tt>,
which asks the Task Manager to possibly create a new worker (OS)
thread to execute the Haskell thread.
<li>
After the OS thread has done this, it blocks waiting for the 
Haskell thread to complete the evaluation of the Haskell function.
<p>
The reason why a separate worker thread is made to evaluate the Haskell
function and not the OS thread that made the call-in via the
Rts API, is that we want that OS thread to return as soon as possible.
We wouldn't be able to guarantee that if the OS thread entered the 
RTS to (initially) just execute its function application, as the
Scheduler may side-track it and also ask it to evaluate other Haskell threads.
</li>
</ul>

sof's avatar
sof committed
263
264
265
266
267
268
269
270
271
<p>
<strong>Note:</strong> As of 20020413, the implementation of the RTS API
only serializes access to the allocator between multiple OS threads wanting
to call into Haskell (via the RTS API.) It does not coordinate this access
to the allocator with that of the OS worker thread that's currently executing
within the RTS. This weakness/bug is scheduled to be tackled as part of an
overhaul/reworking of the RTS API itself.


sof's avatar
sof committed
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
<!---- ***************************************  ----->
<h2>Subsystems introduced/modified</h2>

<p>
These threads extensions affect the Scheduler portions of the runtime
system. To make it more manageable to work with, the changes
introduced a couple of new RTS 'sub-systems'. This section presents
the functionality and API of these sub-systems.
</p>

<!---- ***************************************  ----->
<h3 id="#capability">Capabilities</h3>

<p>
A Capability represent the token required to execute STG code,
and all the state an OS thread/task needs to run Haskell code:
its STG registers, a pointer to its TSO, a nursery etc. During
STG execution, a pointer to the capabilitity is kept in a
register (BaseReg).
</p>
<p>
Only in an SMP build will there be multiple capabilities, for
the threaded RTS and other non-threaded builds, there is only
one global capability, namely <tt>MainCapability</tt>.

<p>
The Capability API is as follows:
<pre>
/* Capability.h */
extern void initCapabilities(void);

extern void grabReturnCapability(Mutex* pMutex, Capability** pCap);
extern void waitForWorkCapability(Mutex* pMutex, Capability** pCap, rtsBool runnable);
extern void releaseCapability(Capability* cap);

extern void yieldToReturningWorker(Mutex* pMutex, Capability* cap);

extern void grabCapability(Capability** cap);
</pre>

<ul>
<li><tt>initCapabilities()</tt> initialises the subsystem.

<li><tt>grabReturnCapability()</tt> is called by worker threads
returning from an external call. It blocks them waiting to gain
permissions to do so.

<li><tt>waitForWorkCapability()</tt> is called by worker threads
already inside the RTS, but without any work to do. It blocks them
waiting for there to new work to become available.

<li><tt>releaseCapability()</tt> hands back a capability. If a
'returning worker' is waiting, it is signalled that a capability
has become available. If not, <tt>releaseCapability()</tt> tries
to signal worker threads that are blocked waiting inside
<tt>waitForWorkCapability()</tt> that new work might now be
available.

<li><tt>yieldToReturningWorker()</tt> is called by the worker thread
that's currently inside the Scheduler. It checks whether there are other
worker threads waiting to return from making an external call. If so,
they're given preference and a capability is transferred between worker
threads. One of the waiting 'returning worker' threads is signalled and made
runnable, with the other, yielding, worker blocking to re-acquire
a capability.
</ul>

<p>
The condition variables used to implement the synchronisation between
worker consumers and providers are local to the Capability
implementation. See source for details and comments.
</p>

<!---- ***************************************  ----->
<h3 id="taskman">The Task Manager</h3>

<p>
The Task Manager API is responsible for managing the creation of
OS worker RTS threads. When a Haskell thread wants to make an
external call, the Task Manager is asked to possibly create a
new worker thread to take over the RTS-executing capability of
the worker thread that's exiting the RTS to execute the external call.

<p>
The Capability subsystem keeps track of idle worker threads, so
making an informed decision about whether or not to create a new OS
worker thread is easy work for the task manager. The Task manager
provides the following API:
</p>

<pre>
/* Task.h */
extern void startTaskManager ( nat maxTasks, void (*taskStart)(void) );
extern void stopTaskManager ( void );

extern void startTask ( void (*taskStart)(void) );
</pre>

<ul>
<li><tt>startTaskManager()</tt> and <tt>stopTaskManager()</tt> starts
up and shuts down the subsystem. When starting up, you have the option
to limit the overall number of worker threads that can be
created. An unbounded (modulo OS thread constraints) number of threads
is created if you pass '0'.
<li><tt>startTask()</tt> is called when a worker thread calls
<tt>suspendThread()</tt> to service an external call, asking another
worker thread to take over its RTS-executing capability. It is also
called when an external OS thread invokes a Haskell function via the
<em>Rts API</em>.
</ul>

<!---- ***************************************  ----->
<h3>Native threads API</h3>

To hide OS details, the following API is used by the task manager and
the scheduler to interact with an OS' threads API:

<pre>
/* OSThreads.h */
typedef <em>..OS specific..</em> Mutex;
extern void initMutex    ( Mutex* pMut );
extern void grabMutex    ( Mutex* pMut );
extern void releaseMutex ( Mutex* pMut );
  
typedef <em>..OS specific..</em> Condition;
extern void    initCondition      ( Condition* pCond );
extern void    closeCondition     ( Condition* pCond );
extern rtsBool broadcastCondition ( Condition* pCond );
extern rtsBool signalCondition    ( Condition* pCond );
extern rtsBool waitCondition      ( Condition* pCond, 
				    Mutex* pMut );

extern OSThreadId osThreadId      ( void );
extern void shutdownThread        ( void );
extern void yieldThread           ( void );
extern int  createOSThread        ( OSThreadId* tid,
				    void (*startProc)(void) );
</pre>



<!---- ***************************************  ----->
<h2>User-level interface</h2>

To signal that you want an external call to be serviced by a separate
OS thread, you have to add the attribute <tt>threadsafe</tt> to
a foreign import declaration, i.e.,

<pre>
foreign import "bigComp" threadsafe largeComputation :: Int -> IO ()
</pre>

<p>
The distinction between 'safe' and thread-safe C calls is made
so that we may call external functions that aren't re-entrant but may
cause a GC to occur.
<p>
The <tt>threadsafe</tt> attribute subsumes <tt>safe</tt>.
</p>

<!---- ***************************************  ----->
<h2>Building the GHC RTS</h2>

The multi-threaded extension isn't currently enabled by default. To
have it built, you need to run the <tt>fptools</tt> configure script
with the extra option <tt>--enable-threaded-rts</tt> turned on, and
then proceed to build the compiler as per normal.

<hr>
<small>
<!-- hhmts start --> Last modified: Wed Apr 10 14:21:57 Pacific Daylight Time 2002 <!-- hhmts end -->
</small>
</body> </html>