chak · 373aae79
--- a/data-parallel/vect-pragma.md
+++ b/data-parallel/vect-pragma.md
 # The VECTORISE pragma


-The vectoriser needs to know about all types and functions whose vectorised variants are directly implemented by the DPH library (instead of generated by the vectoriser), and it needs to know what the vectorised versions are.  That is the purpose of the `VECTORISE` pragma (which comes in in number of flavours).
+The vectoriser needs to know about all types and functions whose vectorised variants are directly implemented in the DPH library (instead of generated by the vectoriser), and it needs to know what the vectorised versions are. That is the purpose of the `VECTORISE` pragma (which comes in a number of flavours).
+
+## Scalar versus parallel types and values
+
+
+In addition to tracking the vectorised versions of types and values, the vectoriser needs to keep track of whether the computation of values and functions involves data parallelism, and also whether types embed parallel arrays. Whether or not a type or value is associated with a vectorised version is *not* sufficient to decide on the presence of embedded parallelism. In particular, every higher-order function must be vectorised, because its execution may involve parallel computation whenever any of its functional arguments does. Nevertheless, the higher-order function itself may be purely scalar. An example is function application itself:
+
+```wiki
+($) :: (a -> b) -> a -> b
+f $ a = f a
+```
+
+
+It clearly does not directly include any data parallelism, but `mapP f $ arr` invariably does.
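+
+As a rough illustration of this distinction, the sketch below uses ordinary lists as a stand-in for parallel arrays. `apply` is a local copy of `($)`, and `mapPList` is a hypothetical substitute for `mapP`, not the DPH function. The body of `apply` contains no array operation, yet an expression built with it traverses an array as soon as its functional argument does.
+
+```haskell
+-- Local copy of ($): its body contains no array computation of its own.
+apply :: (a -> b) -> a -> b
+apply f a = f a
+
+-- Hypothetical stand-in for mapP, with lists in place of parallel arrays.
+-- (Assumption: the real mapP works on [: :] and needs the DPH libraries.)
+mapPList :: (a -> b) -> [a] -> [b]
+mapPList = map
+
+-- Scalar use: no array computation is involved anywhere.
+scalarUse :: Int
+scalarUse = apply (+ 1) 41
+
+-- "Parallel" use: the functional argument traverses a whole array.
+parallelUse :: [Int]
+parallelUse = apply (mapPList (* 2)) [1, 2, 3]
+```
+
+Both uses go through the same `apply`; only the argument decides whether an array computation takes place, which is why the presence of a vectorised version alone says nothing about parallelism.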

 ## The basic VECTORISE pragma for values

@@ -9,7 +22,7 @@ The vectoriser needs to know about all types and functions whose vectorised vari
 Given a function `f`, the vectoriser generates a vectorised version `f_v`, which comprises the original, scalar version of the function and a second version lifted into array space.  The lifted version operates on arrays of inputs and produces arrays of results in one parallel computation.  The original function name is, then, rebound to use the scalar version referred to by `f_v`.  This differs from the original in that it uses vectorised versions for any embedded parallel array computations.


-We have got two exceptions to this rule. Firstly, if the body of a function `f` is scalar —i.e., it does not involve any parallel array computations— then we leave it as is and omit the generation of `f_v`. Whether a function is scalar is determined by the rules described in the **Vectorisation Avoidance** paper.
+We have got two exceptions to this rule. Firstly, if the body of a function `f` is scalar —i.e., it does not involve any parallel array computations and has scalar argument and result types— then we leave it as is and omit the generation of `f_v`. Whether a function is scalar is determined by the rules described in the **Vectorisation Avoidance** paper.


 Secondly, if a variable `f` is accompanied by a pragma of the form
@@ -24,6 +37,8 @@ then the vectoriser defines `f_v = e` and refrains from rebinding `f`.  This imp

 This pragma can also be used for imported functions `f`.  In this case, `f_v` and a suitable vectorisation mapping of `f` to `f_v` is exported implicitly — just like `RULES` applied to imported identifiers.  By vectorising imported functions, we can vectorise functions of modules that have not been compiled with `-fvectorise`.  This is crucial to using the standard `Prelude` in vectorised code.

+*Parallelism:* A vectorised value is marked as parallel if its code includes a parallel value or if it includes any parallel types. The detailed rules are in the **Vectorisation Avoidance** paper.
+
 **IMPLEMENTATION RESTRICTION:** Currently the right-hand side of the equation —i.e., `e`— may only be a simple identifier **and** it must be at the correct type instance.  More precisely, the Core type of the right-hand side must be identical to the vectorised version of `t`.

 ## The NOVECTORISE pragma for values
@@ -41,6 +56,8 @@ then it is ignored by the vectoriser — i.e., no function `f_v` is generated an

 This pragma can only be used for bindings in the current module (exactly like an `INLINE` pragma). The pragma must be used on all bindings forming a recursive group if it is used on any of the bindings in a group.

+*Parallelism:* `f` will not be marked as parallel.
+
 **Caveat:** If `f`'s definition contains bindings that are being floated to the toplevel, those bindings may still be vectorised. (**TODO** We might want to ensure that we never float anything out of (at least, those) bindings before the vectoriser is invoked.)

 ## The VECTORISE SCALAR pragma for functions
@@ -68,6 +85,8 @@ The type constructor `T` must be in scope, but it may be imported.  `PData` and

 Examples are the vectorisation of types, such as `Maybe` and `[]`, defined in the `Prelude`.

+*Parallelism:* `T` is marked as parallel by the vectoriser if `T`'s definition includes any type constructor that is parallel.
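+
+This rule can be pictured as a propagation over type definitions. The following is only a toy model under stated assumptions, not the vectoriser's code: `TyConName`, `TyConDefs`, `isParallelTyCon`, and the example types are hypothetical, a type definition is reduced to the list of type constructors it mentions, and `[::]` is assumed to be the only constructor known to be parallel up front.
+
+```haskell
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+
+type TyConName = String
+
+-- Each type constructor maps to the type constructors its definition mentions.
+type TyConDefs = Map.Map TyConName [TyConName]
+
+-- A type constructor is parallel if it is already known to be parallel, or if
+-- its definition mentions (directly or transitively) a parallel one.
+isParallelTyCon :: Set.Set TyConName -> TyConDefs -> TyConName -> Bool
+isParallelTyCon known defs = go Set.empty
+  where
+    go seen tc
+      | tc `Set.member` known = True
+      | tc `Set.member` seen  = False  -- already on this path: cut recursion
+      | otherwise =
+          any (go (Set.insert tc seen)) (Map.findWithDefault [] tc defs)
+
+-- Hypothetical definitions used only for illustration.
+exampleDefs :: TyConDefs
+exampleDefs = Map.fromList
+  [ ("Maybe",  [])
+  , ("Vector", ["[::]", "Double"])   -- e.g. data Vector = V [:Double:]
+  , ("Shape",  ["Maybe", "Vector"])  -- mentions Vector, so parallel as well
+  , ("Tree",   ["Tree", "Int"])      -- recursive, but purely scalar
+  ]
+```
+
+With `[::]` as the only known parallel constructor, this model marks `Vector` and `Shape` as parallel and leaves `Maybe` and `Tree` unmarked.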
+
 ### With right-hand side


@@ -86,6 +105,8 @@ The type constructor `T` must be in scope, but it may be imported.  `PData` and

 An example is the vectorisation of parallel arrays, where `[::]` is replaced by `PArray` during vectorisation, but the vectoriser never looks at the representation of `[::]`.

+*Parallelism:* The type constructor `T` is marked as parallel.
+
 ## The VECTORISE SCALAR pragma for type constructors


@@ -109,6 +130,8 @@ The type constructor `T` must be in scope, but it may be imported.  `PData` and

 An example is the handling of `Bool`, which is scalar and represents itself in vectorised code, but we want to use the custom instances of 'PData' and 'PRepr' defined in the DPH libraries.

+*Parallelism:* The type `T` is not marked as parallel.
+
 ### With right-hand side


@@ -127,6 +150,8 @@ The type constructor `T` must be in scope, but it may be imported.  The `PData`

 An example is the handling of `(->)`, which the vectoriser maps to `(:->)`, but it never looks at the implementation of `(->)` and allows its use in encapsulated scalar code.

+*Parallelism:* The type `T` is not marked as parallel.
+
 ## The NOVECTORISE pragma for types


@@ -142,6 +167,8 @@ then it is ignored by the vectoriser — i.e., no type `T_v`  and no class insta

 This pragma can only be used for definitions in the current module.

+*Parallelism:* The type `T` is not marked as parallel.
+
 **TODO**

 - Not implemented yet.
@@ -159,14 +186,16 @@ For a type class `C`, the pragma
 indicates that the class `C` should be automatically vectorised, even if it is imported.  This is the default for all classes declared in the current module.


-The class `C` must be in scope, but it may be imported.  'PData' and 'PRepr' instances are generally not used for type classes and their dictionary representations.
+The class `C` must be in scope, but it may be imported.  'PData' and 'PRepr' instances are generally not used for type classes and their dictionary representations. This pragma is only needed for classes that are declared in non-vectorised modules when we want to declare instances of them in vectorised code.


 An example is the handling of `Eq`.

+*Parallelism:* The class tycon of `C` is marked as parallel if the signatures of the class methods mention any type constructors marked as parallel.
+
 **TODO**

- We want something like `{-# VECTORISE class C = C' #-}` (but what about the instances?)
+- We want something like `{-# VECTORISE class C = C' #-}` (but what about the instances?). Do we still need this?

 ## The VECTORISE SCALAR pragma for class instances