duncan · 47e4b9a1
--- a/simd.md
+++ b/simd.md
@@ -478,7 +478,35 @@ The native-sized vector types are distinct types from the explicit-sized vector
 ## Vector operations


-The following operations on vectors will be supported. They will need to be implemented at the Haskell/core primop layer, Cmm MachOp layer and optional support in the code generators.
+The following operations on vectors will be supported. They will need to be implemented at the Haskell/core primop layer and the Cmm `MachOp` layer, with optional support in the code generators.
+
+
+In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float`, `Double`.
+
+
+Loading and storing vectors in arrays (ByteArray\#) and at raw addresses (Addr\#):
+
+```wiki
+readInt<w>Vec<m>Array#  :: MutableByteArray# d -> Int# -> State# d -> (# State# d, Int<w>Vec<m># #)
+readWord<w>Vec<m>Array# :: MutableByteArray# d -> Int# -> State# d -> (# State# d, Word<w>Vec<m># #)
+readFloatVec<m>Array#   :: MutableByteArray# d -> Int# -> State# d -> (# State# d, FloatVec<m># #)
+readDoubleVec<m>Array#  :: MutableByteArray# d -> Int# -> State# d -> (# State# d, DoubleVec<m># #)
+
+writeInt<w>Vec<m>Array#  :: MutableByteArray# d -> Int# -> Int<w>Vec<m>#  -> State# d -> State# d
+writeWord<w>Vec<m>Array# :: MutableByteArray# d -> Int# -> Word<w>Vec<m># -> State# d -> State# d
+writeFloatVec<m>Array#   :: MutableByteArray# d -> Int# -> FloatVec<m>#   -> State# d -> State# d
+writeDoubleVec<m>Array#  :: MutableByteArray# d -> Int# -> DoubleVec<m>#  -> State# d -> State# d
+
+readInt<w>Vec<m>OffAddr#  :: Addr# -> Int# -> State# d -> (# State# d, Int<w>Vec<m># #)
+readWord<w>Vec<m>OffAddr# :: Addr# -> Int# -> State# d -> (# State# d, Word<w>Vec<m># #)
+readFloatVec<m>OffAddr#   :: Addr# -> Int# -> State# d -> (# State# d, FloatVec<m># #)
+readDoubleVec<m>OffAddr#  :: Addr# -> Int# -> State# d -> (# State# d, DoubleVec<m># #)
+
+writeInt<w>Vec<m>OffAddr#  :: Addr# -> Int# -> Int<w>Vec<m>#  -> State# d -> State# d
+writeWord<w>Vec<m>OffAddr# :: Addr# -> Int# -> Word<w>Vec<m># -> State# d -> State# d
+writeFloatVec<m>OffAddr#   :: Addr# -> Int# -> FloatVec<m>#   -> State# d -> State# d
+writeDoubleVec<m>OffAddr#  :: Addr# -> Int# -> DoubleVec<m>#  -> State# d -> State# d
+```


 Extracting and inserting vector elements:
@@ -486,8 +514,8 @@ Extracting and inserting vector elements:
 ```wiki
 extractInt<w>Vec<m>#   :: Int<w>Vec<m>#  -> Int# -> Int#
 extractWord<w>Vec<m>#  :: Word<w>Vec<m># -> Int# -> Word#
-extractFloatVec#       :: FloatVec<m>#   -> Int# -> Float#
-extractDoubleVec#      :: DoubleVec<m>#  -> Int# -> Double#
+extractFloatVec<m>#    :: FloatVec<m>#   -> Int# -> Float#
+extractDoubleVec<m>#   :: DoubleVec<m>#  -> Int# -> Double#
 ```

 ```wiki
@@ -498,10 +526,20 @@ insertDoubleVec#      :: DoubleVec<m>#  -> Int# -> Double# -> DoubleVec<m>#
 ```


+Duplicating a scalar to a vector:
+
+```wiki
+replicateToInt<w>Vec<m>#  :: Int#    -> Int<w>Vec<m>#
+replicateToWord<w>Vec<m># :: Word#   -> Word<w>Vec<m>#
+replicateToFloatVec<m>#   :: Float#  -> FloatVec<m>#
+replicateToDoubleVec<m>#  :: Double# -> DoubleVec<m>#
+```
+
+
 Vector shuffle:

 ```wiki
-shuffleInt<w>Vec<m>ToVec<m'>  :: Int<w>Vec<m>#  -> Int32Vec<m'>#    -> Int<w>Vec<m'>#
+shuffle<t>Vec<m>ToVec<m'># :: <t>Vec<m># -> Int32Vec<m'># -> <t>Vec<m'>#
 ```


@@ -513,9 +551,6 @@ packInt<w>Vec4#   :: (# Int#, Int#, Int#, Int# #) -> Int<w>Vec4#
 ```


-In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float`, `Double`.
-
-
 Arithmetic operations:

 ```wiki
@@ -550,24 +585,59 @@ cmp<eq,ne,gt,gt,lt,le>Word<w>Vec<m># :: Word<w>Vec<m># -> Word<w>Vec<m># -> Word

 Note that LLVM does not yet support the comparison operations.

-TODO

- conversion sign/width operations, e.g. Word \<-\> Int, Word8 \<-\> Word16 etc.
- conversion fp operations, e.g. Float \<-\> Int
+Integer width narrow/widen operations:
+
+```wiki
+narrowInt<w>To<w'>Vec<m>#  :: Int<w>Vec<m># -> Int<w'>Vec<m>#     -- for w' < w
+narrowWord<w>To<w'>Vec<m># :: Word<w>Vec<m># -> Word<w'>Vec<m>#   -- for w' < w
+
+widenInt<w>To<w'>Vec<m>#   :: Int<w>Vec<m># -> Int<w'>Vec<m>#     -- for w' > w
+widenWord<w>To<w'>Vec<m>#  :: Word<w>Vec<m># -> Word<w'>Vec<m>#   -- for w' > w
+```


-Should also consider:
+Note: LLVM calls these truncate (`trunc`) and extend (sign extend, `sext`, or zero extend, `zext`).
+
+
+Floating point conversion:
+
+```wiki
+narrowDoubleToFloatVec<m>#   :: DoubleVec<m># -> FloatVec<m>#
+widenFloatToDoubleVec<m>#    :: FloatVec<m>#  -> DoubleVec<m>#
+
+roundFloatToInt32Vec<m>#     :: FloatVec<m>#  -> Int32Vec<m>#
+roundFloatToInt64Vec<m>#     :: FloatVec<m>#  -> Int64Vec<m>#
+roundDoubleToInt32Vec<m>#    :: DoubleVec<m># -> Int32Vec<m>#
+roundDoubleToInt64Vec<m>#    :: DoubleVec<m># -> Int64Vec<m>#
+
+truncateFloatToInt32Vec<m>#  :: FloatVec<m>#  -> Int32Vec<m>#
+truncateFloatToInt64Vec<m>#  :: FloatVec<m>#  -> Int64Vec<m>#
+truncateDoubleToInt32Vec<m># :: DoubleVec<m># -> Int32Vec<m>#
+truncateDoubleToInt64Vec<m># :: DoubleVec<m># -> Int64Vec<m>#
+
+promoteInt32ToFloatVec<m>#   :: Int32Vec<m># -> FloatVec<m>#
+promoteInt64ToFloatVec<m>#   :: Int64Vec<m># -> FloatVec<m>#
+promoteInt32ToDoubleVec<m>#  :: Int32Vec<m># -> DoubleVec<m>#
+promoteInt64ToDoubleVec<m>#  :: Int64Vec<m># -> DoubleVec<m>#
+```
+
+TODO: should also consider:

 - vector constants, at least at Cmm level
 - replicating a scalar to a vector
 - FMA: fused multiply add, this is supported by NEON and AVX however software fallback may not be possible with the same precision. Tricky.
- AVX also suppports a bunch of interesting things:
+- SSE/AVX also support a bunch of interesting things:

+  - add/sub/mul/div of vector by a scalar
+  - reciprocal, square root, reciprocal of square root
  - permute, shuffle, "blend", masked moves.
+  - abs
  - min, max within a vector
  - average
  - horizontal add/sub
  - shift whole vector left/right by n bytes
+  - and-not logical operation
  - gather (but not scatter) of 32, 64bit int and fp from memory (base + vector of offsets)

 ### Int/Word size wrinkle