GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues
2023-05-03T13:40:37Z

https://gitlab.haskell.org/ghc/ghc/-/issues/22618
Add SIMD128 support to the wasm backend
2023-05-03T13:40:37Z
Cheng Shao

As of today, most WebAssembly runtimes [support](https://github.com/WebAssembly/spec/blob/master/proposals/simd/SIMD.md) 128-bit fixed-width SIMD, and LLVM supports it as well. So it's possible for us to support it in the wasm backend: a good portion of GHC SIMD primops can be lowered to wasm SIMD128 opcodes directly.
I don't see any particular difficulty in this, but it does take some time to implement. It's also a rather low-priority feature request that should be postponed, given:
- ~~There is still an issue with `-dcmm-lint` and SIMD in GHC (#22297).~~
- Update: the related issue has been fixed
- ~~JavaScriptCore still [doesn't](https://bugs.webkit.org/show_bug.cgi?id=222382) support it, and WebKit-based browsers have a market share we'd rather not ignore.~~
- Update: it has been supported since Safari 16.4

https://gitlab.haskell.org/ghc/ghc/-/issues/22582
Better error for unsupported vector operations
2022-12-09T18:46:55Z
Krzysztof Gogolewski

GHCi does not support vector operations.
This error message is fine:
```
ghci> :set -XMagicHash -XUnboxedTuples
ghci> :m GHC.Exts
ghci> x = unpackDoubleX2#
sorry! (unimplemented feature or known bug)
GHC version 9.5.20221206:
SIMD vector operations are not available in GHCi
```
But those two could be improved:
```
ghci> x = packDoubleX2#
ghc: ^^ Could not load 'ghczmprim_GHCziPrimopWrappers_packDoubleX2zh_closure', dependency unresolved. See top entry above.
GHC.ByteCode.Linker.lookupCE(primop)
During interactive linking, GHCi couldn't find the following symbol:
ghczmprim_GHCziPrimopWrappers_packDoubleX2zh_closure
This may be due to you not asking GHCi to load extra object files,
archives or DLLs needed by your current session. Restart GHCi, specifying
the missing library using the -L/path/to/object/dir and -lmissinglibname
flags, or simply by naming the relevant files on the GHCi command line.
Alternatively, this link failure might indicate a bug in GHCi.
If you suspect the latter, please report this as a GHC bug:
https://www.haskell.org/ghc/reportabug
ghci> f :: DoubleX2# -> DoubleX2#; f x = x
*** Exception: return_unlifted: vector
CallStack (from HasCallStack):
error, called at compiler/GHC/ByteCode/Asm.hs:532:23 in ghc:GHC.ByteCode.Asm
```

https://gitlab.haskell.org/ghc/ghc/-/issues/17913
Add primops for MMX parallel subtract and parallel add intrinsics
2020-03-15T15:48:27Z
John Ky

Parallel subtract:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=MMX&text=_m_psub&expand=5805
Parallel add:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=MMX&text=_m_padd&expand=5805
These perform byte-wise, short-wise, and word-wise subtraction and addition within a word64. There is also a saturating version of each, which clamps at the maximum and minimum value to prevent overflow/underflow.
These instructions are significantly faster than broadword programming, which in turn is faster than word-at-a-time processing.
On systems that don't have these instructions, the behaviour can be emulated with broadword programming.
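
The broadword fallback mentioned above can be sketched roughly as follows. This is only an illustration, not a proposed primop implementation: `paddb`, `psubb` and `highBits` are hypothetical names, a `Word64` stands in for the MMX register, and only the wrapping (non-saturating) byte-lane variants are shown.

```haskell
import Data.Bits (complement, xor, (.&.), (.|.))
import Data.Word (Word64)

-- | Mask selecting the top bit of each of the eight byte lanes.
highBits :: Word64
highBits = 0x8080808080808080

-- | Lane-wise wrapping addition of eight bytes packed in a 'Word64'.
-- The low seven bits of each lane are added with the top bits cleared, so no
-- carry can cross a lane boundary; the top bits are then fixed up with xor.
paddb :: Word64 -> Word64 -> Word64
paddb x y = ((x .&. lowBits) + (y .&. lowBits)) `xor` ((x `xor` y) .&. highBits)
  where
    lowBits = complement highBits

-- | Lane-wise wrapping subtraction of eight bytes packed in a 'Word64'.
-- Setting the top bit of every lane of x guarantees that no borrow leaves a
-- lane; the top bits are again fixed up with xor.
psubb :: Word64 -> Word64 -> Word64
psubb x y =
  ((x .|. highBits) - (y .&. lowBits)) `xor` ((x `xor` complement y) .&. highBits)
  where
    lowBits = complement highBits
```

For instance, `paddb 0xFF01 0x0101` wraps the second lane to zero without disturbing the third, which is exactly the behaviour a plain 64-bit `+` would not give.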
Details to follow.

https://gitlab.haskell.org/ghc/ghc/-/issues/15876
Function versioning instead of compilation flags...
2019-07-07T18:02:36Z
MichalGajda

To take advantage of SIMD, we need to compile a different implementation of certain core library functions (such as `ByteString`'s `c_memchr`, to make it 16x faster).
This would require recompiling most of the libraries for new flags.
Instead, it would be much simpler to add function versioning a la GCC: https://lwn.net/Articles/691932/
This would allow us to write code like this:
{-# target(avx512) #-}
c_memchr = ...SIMD code...
{-# !target(avx512) #-}
c_memchr = ...current code...
We currently use special libraries for these kinds of speedups, but it would be much better to use SIMD across a few key functions in all libraries to get 16x speedups across the board (e.g. `c_memchr` for parsing).
Ideally we could also use it to remove some of the _flavoring_ in the future.
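
Function versioning can be pictured as a resolver that picks one of several implementations sharing the same type. The sketch below only illustrates that shape in plain Haskell and is not how GHC or GCC implement it: `Target`, `resolveMemchr`, `memchrBaseline` and `memchrAvx512` are hypothetical names, the functions work on lists rather than on the real `c_memchr` buffers, and the target is passed in explicitly because no CPU feature detection is assumed.

```haskell
import Data.List (elemIndex)
import Data.Word (Word8)

-- | Hypothetical set of targets a function could be versioned for.
data Target = Baseline | Avx512
  deriving (Eq, Show)

-- | Portable version: index of the first occurrence of a byte.
memchrBaseline :: Word8 -> [Word8] -> Maybe Int
memchrBaseline = elemIndex

-- | Stand-in for a SIMD version. A real one would use vector primops; here it
-- only has to agree with the baseline on every input.
memchrAvx512 :: Word8 -> [Word8] -> Maybe Int
memchrAvx512 needle = go 0
  where
    go _ [] = Nothing
    go i (b : bs)
      | b == needle = Just i
      | otherwise = go (i + 1) bs

-- | Resolve the version once; callers then use the returned function.
resolveMemchr :: Target -> Word8 -> [Word8] -> Maybe Int
resolveMemchr Avx512 = memchrAvx512
resolveMemchr Baseline = memchrBaseline
```

The point of the design is that callers only ever see one name and one type, so libraries would not need to be recompiled per flag set.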
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ---------------------------- |
| Version | 8.6.2 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (Linking) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | abhir00p, marlowsd@gmail.com |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.6.3

https://gitlab.haskell.org/ghc/ghc/-/issues/15251
Add support for _mm_shuffle_pi8 intrinsic
2020-01-23T19:19:12Z
John Ky

```c
__m64 _mm_shuffle_pi8 (__m64 a, __m64 b)
```
See:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_shuffle_pi8
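
For readers unfamiliar with the intrinsic, its behaviour is a per-byte table lookup. A minimal reference model, assuming a `Word64` stands in for `__m64`, might look like this; `shufflePi8` and `byteAt` are illustrative names, not existing or proposed primops.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word64, Word8)

-- | Extract byte lane i (0 = least significant) from a packed 64-bit value.
byteAt :: Word64 -> Int -> Word8
byteAt w i = fromIntegral (w `shiftR` (8 * i))

-- | Reference model of the byte shuffle: for each lane i of the control word
-- b, a set top bit yields 0; otherwise the low three bits select a byte of a.
shufflePi8 :: Word64 -> Word64 -> Word64
shufflePi8 a b = foldr (.|.) 0 [lane i `shiftL` (8 * i) | i <- [0 .. 7]]
  where
    lane :: Int -> Word64
    lane i =
      let c = byteAt b i
       in if testBit c 7
            then 0
            else fromIntegral (byteAt a (fromIntegral (c .&. 7)))
```

With `a = 0x0706050403020100` (each byte holds its own index), the result simply echoes the selected indices, which makes the model easy to check by hand.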
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>

https://gitlab.haskell.org/ghc/ghc/-/issues/15250
Add support for _mm512_shuffle_epi8 intrinsic
2020-01-23T19:19:12Z
John Ky

```c
__m512i _mm512_shuffle_epi8 (__m512i a, __m512i b)
```
See:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=765,3914,2929,4754,4757&text=_mm512_shuffle_epi8
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>

https://gitlab.haskell.org/ghc/ghc/-/issues/15249
Add support for cmpeq and cmpgt MMX intrinsics
2020-01-23T19:19:12Z
John Ky

Add primop support for the following MMX intrinsics:
```c
__m64 _mm_cmpeq_pi16 (__m64 a, __m64 b)
__m64 _mm_cmpeq_pi32 (__m64 a, __m64 b)
__m64 _mm_cmpeq_pi8 (__m64 a, __m64 b)
__m64 _mm_cmpgt_pi16 (__m64 a, __m64 b)
__m64 _mm_cmpgt_pi32 (__m64 a, __m64 b)
__m64 _mm_cmpgt_pi8 (__m64 a, __m64 b)
```
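
As a rough reference for what such primops would compute, the byte-lane variants can be modelled as below. This is a sketch under the assumption that a `Word64` stands in for `__m64`; `cmpeqPi8`, `cmpgtPi8` and the helper names are hypothetical, and the 16- and 32-bit variants would follow the same pattern with wider lanes.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Int (Int8)
import Data.Word (Word64, Word8)

-- | Split a packed value into its eight byte lanes, least significant first.
lanes8 :: Word64 -> [Word8]
lanes8 w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7]]

-- | Reassemble byte lanes (least significant first) into a packed value.
fromLanes8 :: [Word8] -> Word64
fromLanes8 bs =
  foldr (.|.) 0 (zipWith (\i b -> fromIntegral b `shiftL` (8 * i)) [0 ..] bs)

-- | A comparison result is an all-ones lane for true and all-zeros for false.
mask :: Bool -> Word8
mask p = if p then 0xFF else 0x00

-- | Lane-wise equality on bytes.
cmpeqPi8 :: Word64 -> Word64 -> Word64
cmpeqPi8 a b = fromLanes8 (zipWith (\x y -> mask (x == y)) (lanes8 a) (lanes8 b))

-- | Lane-wise signed greater-than on bytes.
cmpgtPi8 :: Word64 -> Word64 -> Word64
cmpgtPi8 a b = fromLanes8 (zipWith gt (lanes8 a) (lanes8 b))
  where
    gt x y = mask ((fromIntegral x :: Int8) > (fromIntegral y :: Int8))
```

Note that the greater-than comparison is signed, so a lane holding `0xFF` counts as -1 and is less than a lane holding `0x01`.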
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.3 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Assignees: John Ky

https://gitlab.haskell.org/ghc/ghc/-/issues/13852
Can we have more SIMD primops, corresponding to the untapped AVX etc. instructions?
2021-10-27T17:43:58Z
leftaroundabout

[GHC.Prim](http://hackage.haskell.org/package/ghc-prim-0.5.0.0/docs/GHC-Prim.html#g:28) contains a good number of vectorised instructions, which can be [used by libraries](http://hackage.haskell.org/package/primitive-simd-0.1.0.0/docs/Data-Primitive-SIMD.html) to generate nice, fast code for, e.g., sums of floating-point vectors.
However, several instructions that modern processors could vectorise are missing there. In particular, I would like to be able to use the VPSLLVD...VPSRAVD shifting operations, and at some point perhaps VPMAXSQ...VPMINUQ maximum/minimum operations.
It would be great if corresponding primops could be added. Else I would like to know – where is this stuff even defined? [GHC.Prim](http://hackage.haskell.org/package/ghc-prim-0.5.0.0/docs/src/GHC.Prim.html) as such seems to be merely an automatically-generated dummy module, mostly for Haddock.
(On the other hand, I find it also a bit strange that there are primops for [integer division](http://hackage.haskell.org/package/ghc-prim-0.5.0.0/docs/GHC-Prim.html#v:quotInt8X16-35-), which is apparently [not supported by SSE/AVX](https://stackoverflow.com/questions/16822757/sse-integer-division) at all!)
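
To make the request concrete, here is a small model of what per-lane variable shifts in the style of VPSLLVD and VPSRAVD compute on 32-bit lanes. It is only a sketch: lists stand in for vector registers, and `sllv` and `srav` are hypothetical names rather than existing primops.

```haskell
import Data.Bits (shiftL, shiftR)
import Data.Int (Int32)
import Data.Word (Word32)

-- | Logical left shift of each lane by its own count; counts above 31 give 0.
sllv :: [Word32] -> [Word32] -> [Word32]
sllv = zipWith (\x n -> if n > 31 then 0 else x `shiftL` fromIntegral n)

-- | Arithmetic right shift of each lane by its own count; counts above 31
-- behave like a shift by 31, i.e. the lane is filled with the sign bit.
srav :: [Int32] -> [Word32] -> [Int32]
srav = zipWith (\x n -> x `shiftR` fromIntegral (min n 31))
```

The difference from the existing uniform shifts is that every lane carries its own shift count, which is why these cannot be expressed with a single scalar-count primop.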
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | --------------- |
| Version | 8.0.1 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (LLVM) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>

https://gitlab.haskell.org/ghc/ghc/-/issues/10648
Some 64-vector SIMD primitives are absolutely useless
2019-07-07T18:34:53Z
mniip

The primitives `packInt8X64#`, `packWord8X64#`, `unpackInt8X64#`, `unpackWord8X64#` cannot be used because their types include unboxed 64-tuples, but any Haskell code using them does not compile due to the 62-tuple limitation.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>

https://gitlab.haskell.org/ghc/ghc/-/issues/7741
Add SIMD support to x86/x86_64 NCG
2023-05-03T13:47:19Z
shelarcy@capella.freemail.ne.jp

ghc-7.7.20130301 has SIMD support, but currently only the LLVM backend supports it, so anyone who wants to use SIMD has to use the LLVM backend. I request that SIMD support be added to the x86/x86_64 NCG.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.7 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Assignees: Andreas Klebinger, Abhiroop Sarkar

https://gitlab.haskell.org/ghc/ghc/-/issues/3557
CPU Vector instructions in GHC.Prim
2019-07-07T19:03:22Z
guest

It would be nice to have support for vector unit (MMX, SSE, AltiVec, and so on) operations in GHC. Currently Data Parallel Haskell cannot utilize vector units due to GHC's lack of support.
Those vector operations could be nicely used to get e.g. stereo signal processing for the price of mono signal processing.
Maybe those operations could be added to GHC.Prim, or because there are so many, to a new module, GHC.Prim.Vector.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------------------- |
| Version | 6.11 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (NCG) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | ghc@henning-thielemann.de |
| Operating system | |
| Architecture | |
</details>