Skip to content
Snippets Groups Projects

Add Aarch64 clz, ctz and brev primops

Closed Alex Mason requested to merge Axman6/ghc:wip/aarch64-clz-ctz into master

Adds assembly implementations of the count leading zeros, count trailing zeros and bit reverse primops for Aarch64 for W8-W64 sizes. The code produced here appears to be better than what the C compiler produces for hs_clz8 and hs_clz16, which uses conditional moves. I'll make a PR changing the implementations in clz.c etc. so they can be a little faster on all platforms that use them too.

I also made some of the case statements in compiler/GHC/CmmToAsm/AArch64/Instr.hs explicitly match all constructors so editor tools can let you know when an instruction is added but the necessary cases aren't handled - it's bitten me a few times already.

Please take a few moments to address the following points:

  • if your MR may break existing programs (e.g. touches base or causes the compiler to reject programs), please describe the expected breakage and add the user-facing label. This will run ghc/head.hackage> to characterise the effect of your change on Hackage.
  • ensure that your commits are either individually buildable or squashed
  • ensure that your commit messages describe what they do (referring to tickets using #NNNN syntax when appropriate)
  • have added source comments describing your change. For larger changes you likely should add a [Note][notes] and cross-reference it from the relevant places.
  • add a [testcase to the testsuite][adding test].
  • updates the users guide if applicable
  • mentions new features in the release notes for the next release
Edited by Alex Mason

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Alex Mason marked the checklist item if your MR may break existing programs (e.g. touches base or causes the as completed

    marked the checklist item if your MR may break existing programs (e.g. touches base or causes the as completed

  • Alex Mason marked the checklist item ensure that your commits are either individually buildable or squashed as completed

    marked the checklist item ensure that your commits are either individually buildable or squashed as completed

  • Alex Mason marked the checklist item ensure that your commit messages describe what they do as completed

    marked the checklist item ensure that your commit messages describe what they do as completed

  • Alex Mason marked the checklist item have added source comments describing your change. For larger changes you as completed

    marked the checklist item have added source comments describing your change. For larger changes you as completed

  • Alex Mason changed title from WIP: Add Aarch64 clz, ctz and brev primops to Add Aarch64 clz, ctz and brev primops

    changed title from WIP: Add Aarch64 clz, ctz and brev primops to Add Aarch64 clz, ctz and brev primops

    • We have to be careful with ctz/clz and 0: we return the word size, not 0.

      Could you add the following test in testsuite/tests/codeGen/should_run/:

      all.T:

      ...
      test('CtzClz0', normal, compile_and_run, [''])

      CtzClz0.hs:

      {-# LANGUAGE CPP #-}
      {-# LANGUAGE MagicHash #-}
      
      module Main where
      
      import GHC.Exts
      import Control.Monad
      
      #include <MachDeps.h>
      
      {-# OPAQUE x #-} -- needed to avoid triggering constant folding
      x :: Word
      x = 0
      
      main :: IO ()
      main = do
        let !(W# w) = x
      
        guard (W# (ctz# w) == WORD_SIZE_IN_BITS)
        guard (W# (ctz8# w) == 8)
        guard (W# (ctz16# w) == 16)
        guard (W# (ctz32# w) == 32)
      
        guard (W# (clz# w) == WORD_SIZE_IN_BITS)
        guard (W# (clz8# w) == 8)
        guard (W# (clz16# w) == 16)
        guard (W# (clz32# w) == 32)  
      Edited by Sylvain Henry
    • I couldn't find what AArch64's CLZ/CTZ instructions do in this case. So we'd better add a test.

    • Author Developer

      Thanks @hsyl20, I'll try and find some time this week to add that.

      I have confirmed that the clz instruction does return 32/64 for 32/64 bit zero's, according to https://www.scs.stanford.edu/~zyedidia/arm64/clz_int.html (then here and then here - the pseudo code for HighestSetBit returns -1 when passed zero, so you end up with N (-1 + 1) = N). I've also tested in C just to sanity check.

      Thanks for taking a look, and prompting me to question myself!

    • Author Developer

      These tests have been added, and pass on my machine.

      I also wanted to propose a change to the C implementations of clz/ctz based off this MR, which would look like the following:

      diff --git a/libraries/ghc-prim/cbits/clz.c b/libraries/ghc-prim/cbits/clz.c
      index b0637b5dfe0..9b20307ba03 100644
      --- a/libraries/ghc-prim/cbits/clz.c
      +++ b/libraries/ghc-prim/cbits/clz.c
      @@ -10,13 +10,15 @@
       StgWord
       hs_clz8(StgWord x)
       {
      -  return (uint8_t)x ? __builtin_clz((uint8_t)x)-24 : 8;
      +  return __builtin_clz((x << 24) | (1 << 23));
       }
       
       StgWord
       hs_clz16(StgWord x)
       {
      -  return (uint16_t)x ? __builtin_clz((uint16_t)x)-16 : 16;
      +  return __builtin_clz((x << 16) | (1 << 15));
       }
       
       StgWord
      diff --git a/libraries/ghc-prim/cbits/ctz.c b/libraries/ghc-prim/cbits/ctz.c
      index 755ad6e0b35..2c0fc2acdfc 100644
      --- a/libraries/ghc-prim/cbits/ctz.c
      +++ b/libraries/ghc-prim/cbits/ctz.c
      @@ -10,19 +10,22 @@
       StgWord
       hs_ctz8(StgWord x)
       {
      -  return (uint8_t)x ? __builtin_ctz(x) : 8;
      +  return __builtin_ctz(x | 0x100);
       }
       
       StgWord
       hs_ctz16(StgWord x)
       {
      -  return (uint16_t)x ? __builtin_ctz(x) : 16;
      +  return __builtin_ctz(x | 0x10000);
       }
       
       StgWord
       hs_ctz32(StgWord x)
       {
      -  return (uint32_t)x ? __builtin_ctz(x) : 32;
      +  return hs_ctz64((StgWord64)x | 0x100000000ULL);
       }
       #else
       # error no suitable __builtin_ctz() found

      It looks like GCC and Clang aren't smart enough to make this non-branching, at least on aarch64. I'm wondering if this is sufficiently related to the other changes in the MR to include as well.

      Pinging @hvr as the author of clz.c & ctz.c.

    • I would suggest creating another ticket/MR for these changes. I'm not sure about the StgWord64 cast, maybe we could use __builtin_ffs instead which is defined for a 0 argument.

      And thanks for adding the test!

      Edited by Sylvain Henry
    • Please register or sign in to reply
  • Alex Mason added 1 commit

    added 1 commit

    Compare with previous version

  • Alex Mason added 4084 commits

    added 4084 commits

    Compare with previous version

  • Alex Mason marked the checklist item add a [testcase to the testsuite][adding test]. as completed

    marked the checklist item add a [testcase to the testsuite][adding test]. as completed

  • LGTM! Please squash the commits (I believe they can all be merged into one).

    • Please write a commit message before merging, I applied the test-primops label so the test-primops job will run on this branch.

      Edited by Matthew Pickering
    • Author Developer

      Thanks Matthew, I have squashed the commits and written a more explanatory commit message - let me know if it's missing anything. I wasn't aware of that tag, I feel like it might have been useful on previous MRs of mine - good to know!

      I noticed that the pipeline failed on aarch64-darwin, but it doesn't fail on my Mac, and the tests look unrelated to my changes; hoping this is just an issue with the macOS running being temperamental. Hopefully this pipeline runs fine.

    • Please register or sign in to reply
  • Alex Mason added 1 commit

    added 1 commit

    • cac021fe - Add AArch64 CLZ, CTZ, RBIT instructions.

    Compare with previous version

  • Alex Mason added 1 commit

    added 1 commit

    • 60c9598a - Add AArch64 CLZ, CTZ, RBIT primop implementations.

    Compare with previous version

  • Sylvain Henry approved this merge request

    approved this merge request

  • Alex Mason added 43 commits

    added 43 commits

    Compare with previous version

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading