    Add two CmmSwitch optimizations. (348cc8eb)
    Andreas Klebinger authored and Marge Bot committed
    Move switch expressions into a local variable when generating switches.
    This avoids duplicating the expression if we translate the switch
    to a tree search. This fixes #16933.
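
    In essence, the first change binds a non-trivial scrutinee to a fresh
    local register once, so the comparison tree emitted for the switch
    tests that register instead of re-evaluating the expression at every
    node. A minimal Haskell sketch of the idea, using simplified stand-ins
    rather than GHC's actual Cmm types (`Expr`, `Stmt` and `bindScrutinee`
    are hypothetical names):

    ```
    -- Minimal sketch only; simplified stand-ins for GHC's Cmm types.
    data Expr
      = Lit Integer
      | Local String            -- a local register such as _ub::I64
      | Load Expr               -- a memory load such as I32[_c3::I64 - 8]
      | MachOp String [Expr]    -- e.g. a widening conversion
      deriving Show

    data Stmt
      = Assign String Expr      -- _x = e
      deriving Show

    -- Literals and local registers are cheap to duplicate; anything else
    -- (a load, a conversion, ...) should be evaluated exactly once.
    isTrivial :: Expr -> Bool
    isTrivial (Lit _)   = True
    isTrivial (Local _) = True
    isTrivial _         = False

    -- Bind a non-trivial scrutinee to a new local register, threading a
    -- simple counter as the supply of fresh names.  The returned statements
    -- are emitted before the comparison tree, which then uses the returned
    -- expression as its scrutinee.
    bindScrutinee :: Int -> Expr -> ([Stmt], Expr, Int)
    bindScrutinee fresh e
      | isTrivial e = ([], e, fresh)
      | otherwise   = ([Assign reg e], Local reg, fresh + 1)
      where
        reg = "_u" ++ show fresh
    ```

    In the dumps below this is exactly what happens in block c6: the
    scrutinee `%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])` is assigned to
    `_ub::I64` once, and every comparison reuses that register.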
    
    Further, we now check whether all branches of a switch have the same
    destination, replacing the switch with a direct branch if that
    is the case.
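
    The second change is a check over the switch's branch table before any
    code is generated for it. A minimal sketch, with a simplified stand-in
    for GHC's `SwitchTargets` type rather than its real definition:

    ```
    -- Minimal sketch only; a simplified stand-in for GHC's SwitchTargets.
    import qualified Data.Map as M
    import qualified Data.Set as S

    type Label = String

    data SwitchTargets = SwitchTargets
      { stDefault  :: Maybe Label          -- optional default destination
      , stBranches :: M.Map Integer Label  -- case value -> destination
      }

    -- If every case (and the default, when present) jumps to the same
    -- label, the whole switch can be replaced by an unconditional branch.
    switchCollapses :: SwitchTargets -> Maybe Label
    switchCollapses (SwitchTargets def branches) =
      case S.toList (S.fromList (M.elems branches ++ maybe [] (:[]) def)) of
        [only] -> Just only    -- single destination: emit `goto only`
        _      -> Nothing      -- keep the switch (or tree search)
    ```

    In the first dump below, every case of the switch in block `ud`
    targets `c4`, which is why that switch can be dropped in favour of a
    direct branch and no longer appears in the second dump.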
    
    Both of these patterns appear in the ENTER macro used by the RTS
    but are unlikely to occur in intermediate Cmm generated by GHC.
    
    Nofib result summary:
    
    --------------------------------------------------------------------------------
            Program           Size    Allocs   Runtime   Elapsed  TotalMem
    --------------------------------------------------------------------------------
                Min          -0.0%     -0.0%    -15.7%    -15.6%      0.0%
                Max          -0.0%      0.0%     +5.4%     +5.5%      0.0%
     Geometric Mean          -0.0%     -0.0%     -1.0%     -1.0%     -0.0%
    
    Compiler allocations go up slightly: +0.2%
    
    Example output before and after the change, taken from RTS code, is shown below.
    
    All but one of the memory loads of `I32[_c3::I64 - 8]` are eliminated.
    Instead, the value is loaded from memory once in block `c6`.
    
    The switch in block `ud` of the original code has also been
    eliminated completely.
    
    Cmm without this commit:
    
    ```
    stg_ap_0_fast() { //  [R1]
            { []
            }
        {offset
          ca: _c1::P64 = R1;   // CmmAssign
              goto c2;   // CmmBranch
          c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
          c6: _c3::I64 = I64[_c1::P64];
              if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;
          ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;
          uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;
          ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) {
                  case 8, 9, 10, 11, 12, 13, 14 : goto c4;
              }
          ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf;
          uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4;
          c4: R1 = _c1::P64;
              call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
          ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui;
          uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8;
          ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7;
          c8: _c1::P64 = P64[_c1::P64 + 8];
              goto c2;
          c7: R1 = _c1::P64;
              call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
        }
    }
    ```
    
    Cmm with this commit:
    
    ```
    stg_ap_0_fast() { //  [R1]
            { []
            }
        {offset
          ca: _c1::P64 = R1;
              goto c2;
          c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
          c6: _c3::I64 = I64[_c1::P64];
              _ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]);
              if (_ub::I64 < 26) goto uc; else goto uh;
          uc: if (_ub::I64 < 15) goto ud; else goto uf;
          ud: if (_ub::I64 < 8) goto c7; else goto c4;
          uf: if (_ub::I64 >= 25) goto c4; else goto ug;
          ug: if (_ub::I64 != 23) goto c7; else goto c4;
          c4: R1 = _c1::P64;
              call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
          uh: if (_ub::I64 < 28) goto ui; else goto uj;
          ui: if (_ub::I64 < 27) goto c7; else goto c8;
          uj: if (_ub::I64 < 29) goto c8; else goto c7;
          c8: _c1::P64 = P64[_c1::P64 + 8];
              goto c2;
          c7: R1 = _c1::P64;
              call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
        }
    }
    ```