

# Implementing primitive Bool\#



# Implementing new primitive comparisons to allow branchless algorithms









This page gathers the notes about implementing new primitive logical operations and thus resolving ticket [\#6135](https://gitlab.haskell.org//ghc/ghc/issues/6135).

...  ...  @@ 89,9 +89,16 @@ There are five possible branches to take, although four of them have the same re 


## Solution









The idea behind the solution is to modify comparison primops to return unboxed unlifted `Int#` instead of `Bool` (which is lifted and thus is returned as a thunk that needs to be evaluated). This will be implemented in the following way:



This problem was solved by modifying comparison primops to return unboxed unlifted `Int#` instead of `Bool` (which is lifted and thus is returned as a thunk that needs to be evaluated). Having `Int#` returned as a result of logical comparison will allow to use branchless bitwise logical operators instead of branching logical operators defined by Haskell.






 existing comparison primops will have their return type changed to `Int#`. Also, their names will be changed. Operators will have `$` added before `#`, others will have `I` added before the `#` (this is a mnemonic denoting that this primop returns and `Int#`). Examples:



## Implementation details









Below is a summary of implementation details and decisions:






 the new comparison primops return a value of type `Int#`: `1#` represents `True` and `0#` represents `False`. The `Int#` type was chosen because on Haskell it is more common to use signed Int type insetad of unsigned Word. By using `Int#` the users can easily convert unboxed result into a boxed value, without need to use `word2Int#` and `int2word#` primops.



 as a small sidetask, four new logical bitwise primops have been implemented: `andI#`, `orI#`, `xorI#` and `negI#` ([\#7689](https://gitlab.haskell.org//ghc/ghc/issues/7689)). These operate on values of type `Int#`. Earlier we had only bitwise logical primops operating on values of type `Word#`.



 names of the existing comparison primops were changed. Operators will have `$` added before `#`, others will have `I` added before the `#` (this is a mnemonic denoting that this primop returns and `Int#`). Examples:






```wiki



>=$# :: Int# > Int# > Int#

...  ...  @@ 102,7 +109,7 @@ ltFloatI# :: Float# > Float# > Int# 


leAddrI# :: Addr# > Addr# > Int#



```






 a new module `GHC.PrimWrappers` will be added to ghcprim library. This module will contain wrappers for comparison primops. These wrappers will have names identical to removed primops and will return a `Bool`. Examples:



 a new module `GHC.PrimWrappers` was added to ghcprim library. This module contains wrappers for comparison primops. These wrappers have names identical to removed primops and return a `Bool`. Examples:






```wiki



gtChar# :: Char# > Char# > Bool

...  ...  @@ 125,26 +132,28 @@ leAddr# a b = tagToEnum# (a `leAddrI#` b) 


```









Thanks to these wrappers the change will be almost backwards compatible. The only thing primop users will need to change in their existing code to make it work again is adding import of !GHC.PrimWrappers module.



Thanks to these wrappers the change is almost backwards compatible. The only thing primop users need to change in their existing code to make it work again is adding import of !GHC.PrimWrappers module.






 The following boot libraries require modification in order to work with the new primops: base, ghcprim and integergmp. The only required modifications are imports of the !GHC.PrimWrappers module in modules that use the primops.



 functions for comparing `Integer` type, implemented in integergmp and integersimple libraries, received a similar treatment. Technically they are not primops, because they are implemented in Haskell (in case of integergmp also with FFI), but they pretend to be ones. There are six primops for comparing `Integer` values:






## Proof of concept



```wiki



eqInteger# :: Integer > Integer > Int#



neqInteger# :: Integer > Integer > Int#



leInteger# :: Integer > Integer > Int#



ltInteger# :: Integer > Integer > Int#



gtInteger# :: Integer > Integer > Int#



geInteger# :: Integer > Integer > Int#



```









The prototype patch posted on the trac on 13th of March implemented six new prototype comparison primops:



Each of these functions has a wrapper that calls `tagToEnum#` and returns a `Bool`. These wrappers are: `eqInteger`, `neqInteger`, `leInteger`, `ltInteger`, `gtInteger` and `geInteger`.






```wiki



.># :: Int# > Int# > Int#



.<# :: Int# > Int# > Int#



.>=# :: Int# > Int# > Int#



.<=# :: Int# > Int# > Int#



.==# :: Int# > Int# > Int#



./=# :: Int# > Int# > Int#



```



 Other libraries that were modified to work with the new primops are: base, ghcprim and primitive. The only required modifications were imports of the !GHC.PrimWrappers module in modules that use the primops.






## Eliminating branches using new primops






Each of these new primops takes two `Int#`s that are to be compared. The result is also an `Int#`: `0#` if the relation between the operands does not hold and `1#` when it does hold. For example `5# .># 3#` returns `1#` and `3# .># 3#` returns `0#`. With the new primops we can rewrite the original expression that motivated the problem:






With the new primops we can rewrite the original expression that motivated the problem:






```wiki



case (x <# 0#)  (x >=# width)  (y <# 0#)  (y >=# height) of

...  ...  @@ 156,52 +165,53 @@ case (x <# 0#)  (x >=# width)  (y <# 0#)  (y >=# height) of 


as






```wiki



case (x .<# 0#) `orI#` (x .>=# width) `orI#` (y .<# 0#) `orI#` (y .>=# height) of



case (x <$# 0#) `orI#` (x >=$# width) `orI#` (y <$# 0#) `orI#` (y >=$# height) of



True > E1



False > E2



```









(Note: `orI#` is a bitwise OR operation on operands of type `Int#`. It was introduced together with `andI#`, `notI#` and `xor#` in [\#7689](https://gitlab.haskell.org//ghc/ghc/issues/7689)). Using the LLVM backend this compiles to:



Using the LLVM backend this compiles to:






```wiki



# BB#0: # %c1nK



movq %rsi, %rax



orq %r14, %rax



shrq $63, %rax



cmpq %rdi, %r14



setge %cl



movzbl %cl, %ecx



orq %rax, %rcx



cmpq %r8, %rsi



setge %al



movzbl %al, %eax



orq %rcx, %rax



jne .LBB2_1



# BB#3: # %c1ol



movq (%rbp), %rax



movl $r1m6_closure+1, %ebx



jmpq *%rax # TAILCALL



.LBB2_1: # %c1nK



cmpq $1, %rax



jne .LBB2_2



# BB#4: # %c1ov



movq (%rbp), %rax



movl $r1m7_closure+1, %ebx



jmpq *%rax # TAILCALL



.LBB2_2: # %c1ob



movq r1m5_closure(%rip), %rax



movl $r1m5_closure, %ebx



jmpq *%rax # TAILCALL



# BB#0: # %c1oe



movq %rsi, %rax



orq %r14, %rax



shrq $63, %rax



cmpq %rdi, %r14



setge %cl



movzbl %cl, %ecx



orq %rax, %rcx



cmpq %r8, %rsi



setge %al



movzbl %al, %eax



orq %rcx, %rax



jne .LBB2_1



# BB#3: # %c1oP



movq (%rbp), %rax



movl $r1mu_closure+1, %ebx



jmpq *%rax # TAILCALL



.LBB2_1: # %c1oe



cmpq $1, %rax



jne .LBB2_2



# BB#4: # %c1oZ



movq (%rbp), %rax



movl $r1mv_closure+1, %ebx



jmpq *%rax # TAILCALL



.LBB2_2: # %c1oF



movq r1mt_closure(%rip), %rax



movl $r1mt_closure, %ebx



jmpq *%rax # TAILCALL






```









The assembly does not contain comparisons and jumps in the scrutinee of the case expression, but still does jumps for selecting an appropriate branch of the case expression.



The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression.






### Benchmarks for the proposed patch



### Benchmarks









Below is a benchmark for the proofofconcept filter function that demonstrates performance gains possible with the new primops:



Below is a benchmark for the proofofconcept branchless filter function that demonstrates performance gains possible with the new primops:






```wiki



{# LANGUAGE BangPatterns, MagicHash #}

...  ...  @@ 216,7 +226,7 @@ import Criterion.Main 


import Data.Vector.Unboxed.Mutable (unsafeNew, unsafeSlice, unsafeWrite)



import Data.Vector.Unboxed as U (Vector, filter, foldM',



fromList, length, unsafeFreeze)



import GHC.Exts (Int (I#), (.>=#))



import GHC.Exts (Int (I#), (>=$#))



import System.Random (RandomGen, mkStdGen, randoms)



import Prelude hiding (filter, length)




...  ...  @@ 227,7 +237,7 @@ filterN vec = runST $ do 


fVec < unsafeNew size



let put i x = do



let !(I# v) = x



inc = I# (v .>=# 0#)



inc = I# (v >=$# 0#)



unsafeWrite fVec i x



return $ i + inc



fSize < foldM' put 0 vec

...  ...  @@ 258,7 +268,6 @@ benchConfig :: Config 


benchConfig = defaultConfig {



cfgPerformGC = ljust True



}






```







...  ...  @@ 270,4 +279,6 @@ ghc O2 fllvm optloO3 Main.hs 


```









Benchmarking shows that `filterN` function is 60% faster than the `filter` function based on stream fusion (tested for unboxed vectors containing 10 thousand and 10 million elements). 


Benchmarking shows that `filterN` function is about 5565% faster than the `filter` function based on stream fusion (tested for unboxed vectors containing 10 thousand and 10 million elements). Below is an example benchmarking report from criterion:






[](http://ics.p.lodz.pl/~stolarek/ghc/primboolcriterion.png) 


\ No newline at end of file 