
# Implementing primitive Bool\#


# Implementing new primitive comparisons to allow branchless algorithms










This page gathers the notes about implementing new primitive logical operations and thus resolving ticket [\#6135](https://gitlab.haskell.org//ghc/ghc/issues/6135).


This page gathers the notes about implementing new primitive logical operations and thus resolving ticket [\#6135](https://gitlab.haskell.org//ghc/ghc/issues/6135).

...  @@ 89,9 +89,16 @@ There are five possible branches to take, although four of them have the same re 
...  @@ 89,9 +89,16 @@ There are five possible branches to take, although four of them have the same re 

## Solution


## Solution










The idea behind the solution is to modify comparison primops to return unboxed unlifted `Int#` instead of `Bool` (which is lifted and thus is returned as a thunk that needs to be evaluated). This will be implemented in the following way:


This problem was solved by modifying comparison primops to return unboxed unlifted `Int#` instead of `Bool` (which is lifted and thus is returned as a thunk that needs to be evaluated). Having `Int#` returned as a result of logical comparison will allow to use branchless bitwise logical operators instead of branching logical operators defined by Haskell.






 existing comparison primops will have their return type changed to `Int#`. Also, their names will be changed. Operators will have `$` added before `#`, others will have `I` added before the `#` (this is a mnemonic denoting that this primop returns and `Int#`). Examples:


## Implementation details












Below is a summary of implementation details and decisions:








 the new comparison primops return a value of type `Int#`: `1#` represents `True` and `0#` represents `False`. The `Int#` type was chosen because on Haskell it is more common to use signed Int type insetad of unsigned Word. By using `Int#` the users can easily convert unboxed result into a boxed value, without need to use `word2Int#` and `int2word#` primops.




 as a small sidetask, four new logical bitwise primops have been implemented: `andI#`, `orI#`, `xorI#` and `negI#` ([\#7689](https://gitlab.haskell.org//ghc/ghc/issues/7689)). These operate on values of type `Int#`. Earlier we had only bitwise logical primops operating on values of type `Word#`.




 names of the existing comparison primops were changed. Operators will have `$` added before `#`, others will have `I` added before the `#` (this is a mnemonic denoting that this primop returns and `Int#`). Examples:






```wiki


```wiki


>=$# :: Int# > Int# > Int#


>=$# :: Int# > Int# > Int#

...  @@ 102,7 +109,7 @@ ltFloatI# :: Float# > Float# > Int# 
...  @@ 102,7 +109,7 @@ ltFloatI# :: Float# > Float# > Int# 

leAddrI# :: Addr# > Addr# > Int#


leAddrI# :: Addr# > Addr# > Int#


```


```






 a new module `GHC.PrimWrappers` will be added to ghcprim library. This module will contain wrappers for comparison primops. These wrappers will have names identical to removed primops and will return a `Bool`. Examples:


 a new module `GHC.PrimWrappers` was added to ghcprim library. This module contains wrappers for comparison primops. These wrappers have names identical to removed primops and return a `Bool`. Examples:






```wiki


```wiki


gtChar# :: Char# > Char# > Bool


gtChar# :: Char# > Char# > Bool

...  @@ 125,26 +132,28 @@ leAddr# a b = tagToEnum# (a `leAddrI#` b) 
...  @@ 125,26 +132,28 @@ leAddr# a b = tagToEnum# (a `leAddrI#` b) 

```


```










Thanks to these wrappers the change will be almost backwards compatible. The only thing primop users will need to change in their existing code to make it work again is adding import of !GHC.PrimWrappers module.


Thanks to these wrappers the change is almost backwards compatible. The only thing primop users need to change in their existing code to make it work again is adding import of !GHC.PrimWrappers module.






 The following boot libraries require modification in order to work with the new primops: base, ghcprim and integergmp. The only required modifications are imports of the !GHC.PrimWrappers module in modules that use the primops.


 functions for comparing `Integer` type, implemented in integergmp and integersimple libraries, received a similar treatment. Technically they are not primops, because they are implemented in Haskell (in case of integergmp also with FFI), but they pretend to be ones. There are six primops for comparing `Integer` values:






## Proof of concept


```wiki




eqInteger# :: Integer > Integer > Int#




neqInteger# :: Integer > Integer > Int#




leInteger# :: Integer > Integer > Int#




ltInteger# :: Integer > Integer > Int#




gtInteger# :: Integer > Integer > Int#




geInteger# :: Integer > Integer > Int#




```










The prototype patch posted on the trac on 13th of March implemented six new prototype comparison primops:


Each of these functions has a wrapper that calls `tagToEnum#` and returns a `Bool`. These wrappers are: `eqInteger`, `neqInteger`, `leInteger`, `ltInteger`, `gtInteger` and `geInteger`.






```wiki


 Other libraries that were modified to work with the new primops are: base, ghcprim and primitive. The only required modifications were imports of the !GHC.PrimWrappers module in modules that use the primops.


.># :: Int# > Int# > Int#




.<# :: Int# > Int# > Int#




.>=# :: Int# > Int# > Int#




.<=# :: Int# > Int# > Int#




.==# :: Int# > Int# > Int#




./=# :: Int# > Int# > Int#




```










## Eliminating branches using new primops






Each of these new primops takes two `Int#`s that are to be compared. The result is also an `Int#`: `0#` if the relation between the operands does not hold and `1#` when it does hold. For example `5# .># 3#` returns `1#` and `3# .># 3#` returns `0#`. With the new primops we can rewrite the original expression that motivated the problem:






With the new primops we can rewrite the original expression that motivated the problem:






```wiki


```wiki


case (x <# 0#)  (x >=# width)  (y <# 0#)  (y >=# height) of


case (x <# 0#)  (x >=# width)  (y <# 0#)  (y >=# height) of

...  @@ 156,52 +165,53 @@ case (x <# 0#)  (x >=# width)  (y <# 0#)  (y >=# height) of 
...  @@ 156,52 +165,53 @@ case (x <# 0#)  (x >=# width)  (y <# 0#)  (y >=# height) of 

as


as






```wiki


```wiki


case (x .<# 0#) `orI#` (x .>=# width) `orI#` (y .<# 0#) `orI#` (y .>=# height) of


case (x <$# 0#) `orI#` (x >=$# width) `orI#` (y <$# 0#) `orI#` (y >=$# height) of


True > E1


True > E1


False > E2


False > E2


```


```










(Note: `orI#` is a bitwise OR operation on operands of type `Int#`. It was introduced together with `andI#`, `notI#` and `xor#` in [\#7689](https://gitlab.haskell.org//ghc/ghc/issues/7689)). Using the LLVM backend this compiles to:


Using the LLVM backend this compiles to:






```wiki


```wiki


# BB#0: # %c1nK


# BB#0: # %c1oe


movq %rsi, %rax


movq %rsi, %rax


orq %r14, %rax


orq %r14, %rax


shrq $63, %rax


shrq $63, %rax


cmpq %rdi, %r14


cmpq %rdi, %r14


setge %cl


setge %cl


movzbl %cl, %ecx


movzbl %cl, %ecx


orq %rax, %rcx


orq %rax, %rcx


cmpq %r8, %rsi


cmpq %r8, %rsi


setge %al


setge %al


movzbl %al, %eax


movzbl %al, %eax


orq %rcx, %rax


orq %rcx, %rax


jne .LBB2_1


jne .LBB2_1


# BB#3: # %c1ol


# BB#3: # %c1oP


movq (%rbp), %rax


movq (%rbp), %rax


movl $r1m6_closure+1, %ebx


movl $r1mu_closure+1, %ebx


jmpq *%rax # TAILCALL


jmpq *%rax # TAILCALL


.LBB2_1: # %c1nK


.LBB2_1: # %c1oe


cmpq $1, %rax


cmpq $1, %rax


jne .LBB2_2


jne .LBB2_2


# BB#4: # %c1ov


# BB#4: # %c1oZ


movq (%rbp), %rax


movq (%rbp), %rax


movl $r1m7_closure+1, %ebx


movl $r1mv_closure+1, %ebx


jmpq *%rax # TAILCALL


jmpq *%rax # TAILCALL


.LBB2_2: # %c1ob


.LBB2_2: # %c1oF


movq r1m5_closure(%rip), %rax


movq r1mt_closure(%rip), %rax


movl $r1m5_closure, %ebx


movl $r1mt_closure, %ebx


jmpq *%rax # TAILCALL


jmpq *%rax # TAILCALL






```


```










The assembly does not contain comparisons and jumps in the scrutinee of the case expression, but still does jumps for selecting an appropriate branch of the case expression.


The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression.






### Benchmarks for the proposed patch


### Benchmarks










Below is a benchmark for the proofofconcept filter function that demonstrates performance gains possible with the new primops:


Below is a benchmark for the proofofconcept branchless filter function that demonstrates performance gains possible with the new primops:






```wiki


```wiki


{# LANGUAGE BangPatterns, MagicHash #}


{# LANGUAGE BangPatterns, MagicHash #}

...  @@ 216,7 +226,7 @@ import Criterion.Main 
...  @@ 216,7 +226,7 @@ import Criterion.Main 

import Data.Vector.Unboxed.Mutable (unsafeNew, unsafeSlice, unsafeWrite)


import Data.Vector.Unboxed.Mutable (unsafeNew, unsafeSlice, unsafeWrite)


import Data.Vector.Unboxed as U (Vector, filter, foldM',


import Data.Vector.Unboxed as U (Vector, filter, foldM',


fromList, length, unsafeFreeze)


fromList, length, unsafeFreeze)


import GHC.Exts (Int (I#), (.>=#))


import GHC.Exts (Int (I#), (>=$#))


import System.Random (RandomGen, mkStdGen, randoms)


import System.Random (RandomGen, mkStdGen, randoms)


import Prelude hiding (filter, length)


import Prelude hiding (filter, length)





...  @@ 227,7 +237,7 @@ filterN vec = runST $ do 
...  @@ 227,7 +237,7 @@ filterN vec = runST $ do 

fVec < unsafeNew size


fVec < unsafeNew size


let put i x = do


let put i x = do


let !(I# v) = x


let !(I# v) = x


inc = I# (v .>=# 0#)


inc = I# (v >=$# 0#)


unsafeWrite fVec i x


unsafeWrite fVec i x


return $ i + inc


return $ i + inc


fSize < foldM' put 0 vec


fSize < foldM' put 0 vec

...  @@ 258,7 +268,6 @@ benchConfig :: Config 
...  @@ 258,7 +268,6 @@ benchConfig :: Config 

benchConfig = defaultConfig {


benchConfig = defaultConfig {


cfgPerformGC = ljust True


cfgPerformGC = ljust True


}


}






```


```









...  @@ 270,4 +279,6 @@ ghc O2 fllvm optloO3 Main.hs 
...  @@ 270,4 +279,6 @@ ghc O2 fllvm optloO3 Main.hs 

```


```










Benchmarking shows that `filterN` function is 60% faster than the `filter` function based on stream fusion (tested for unboxed vectors containing 10 thousand and 10 million elements). 

Benchmarking shows that `filterN` function is about 5565% faster than the `filter` function based on stream fusion (tested for unboxed vectors containing 10 thousand and 10 million elements). Below is an example benchmarking report from criterion:








[](http://ics.p.lodz.pl/~stolarek/ghc/primboolcriterion.png) 



\ No newline at end of file 