Skip to content
Snippets Groups Projects
Commit b911c532 authored by Gabor Greif's avatar Gabor Greif :speech_balloon: Committed by Ben Gamari
Browse files

Implement pointer tagging for big families (#14373)

Formerly we punted on these and evaluated constructors always got a tag
of 1.

We now cascade switches because we have to check the tag first and when
it is MAX_PTR_TAG then get the precise tag from the info table and
switch on that. The only technically tricky part is that the default
case needs (logical) duplication. To do this we emit an extra label for
it and branch to that from the second switch. This avoids duplicated
codegen.

Here's a simple example of the new code gen:

    data D = D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8

On a 64-bit system previously all constructors would be tagged 1. With
the new code gen D7 and D8 are tagged 7:

    [Lib.D7_con_entry() {
         ...
         {offset
           c1eu: // global
               R1 = R1 + 7;
               call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
         }
     }]

    [Lib.D8_con_entry() {
         ...
         {offset
           c1ez: // global
               R1 = R1 + 7;
               call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
         }
     }]

When switching we now look at the info table only when the tag is 7. For
example, if we derive Enum for the type above, the Cmm looks like this:

    c2Le:
        _s2Js::P64 = R1;
        _c2Lq::P64 = _s2Js::P64 & 7;
        switch [1 .. 7] _c2Lq::P64 {
            case 1 : goto c2Lk;
            case 2 : goto c2Ll;
            case 3 : goto c2Lm;
            case 4 : goto c2Ln;
            case 5 : goto c2Lo;
            case 6 : goto c2Lp;
            case 7 : goto c2Lj;
        }

    // Read info table for tag
    c2Lj:
        _c2Lv::I64 = %MO_UU_Conv_W32_W64(I32[I64[_s2Js::P64 & (-8)] - 4]);
        if (_c2Lv::I64 != 6) goto c2Lu; else goto c2Lt;

Generated Cmm sizes do not change too much, but binaries are very
slightly larger, due to the fact that the new instructions are longer in
encoded form. E.g. previously entry code for D8 above would be

    00000000000001c0 <Lib_D8_con_info>:
     1c0:	48 ff c3             	inc    %rbx
     1c3:	ff 65 00             	jmpq   *0x0(%rbp)

With this patch

    00000000000001d0 <Lib_D8_con_info>:
     1d0:	48 83 c3 07          	add    $0x7,%rbx
     1d4:	ff 65 00             	jmpq   *0x0(%rbp)

This is one byte longer.

Secondly, reading info table directly and then switching is shorter

    _c1co:
            movq -1(%rbx),%rax
            movl -4(%rax),%eax
            // Switch on info table tag
            jmp *_n1d5(,%rax,8)

than doing the same switch, and then for the tag 7 doing another switch:

    // When tag is 7
    _c1ct:
            andq $-8,%rbx
            movq (%rbx),%rax
            movl -4(%rax),%eax
            // Switch on info table tag
            ...

Some changes of binary sizes in actual programs:

- In NoFib the worst case is 0.1% increase in benchmark "parser" (see
  NoFib results below). All programs get slightly larger.

- Stage 2 compiler size does not change.

- In "containers" (the library) size of all object files increases
  0.0005%. Size of the test program "bitqueue-properties" increases
  0.03%.

nofib benchmarks kindly provided by Ömer (@osa1):

NoFib Results
=============

--------------------------------------------------------------------------------
        Program           Size    Allocs    Instrs     Reads    Writes
--------------------------------------------------------------------------------
             CS          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
            CSD          +0.0%      0.0%      0.0%     +0.0%     +0.0%
             FS          +0.0%      0.0%      0.0%     +0.0%      0.0%
              S          +0.0%      0.0%     -0.0%      0.0%      0.0%
             VS          +0.0%      0.0%     -0.0%     +0.0%     +0.0%
            VSD          +0.0%      0.0%     -0.0%     +0.0%     -0.0%
            VSM          +0.0%      0.0%      0.0%      0.0%      0.0%
           anna          +0.0%      0.0%     +0.1%     -0.9%     -0.0%
           ansi          +0.0%      0.0%     -0.0%     +0.0%     +0.0%
           atom          +0.0%      0.0%      0.0%      0.0%      0.0%
         awards          +0.0%      0.0%     -0.0%     +0.0%      0.0%
         banner          +0.0%      0.0%     -0.0%     +0.0%      0.0%
     bernouilli          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
   binary-trees          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
          boyer          +0.0%      0.0%     +0.0%      0.0%     -0.0%
         boyer2          +0.0%      0.0%     +0.0%      0.0%     -0.0%
           bspt          +0.0%      0.0%     +0.0%     +0.0%      0.0%
      cacheprof          +0.0%      0.0%     +0.1%     -0.8%      0.0%
       calendar          +0.0%      0.0%     -0.0%     +0.0%     -0.0%
       cichelli          +0.0%      0.0%     +0.0%      0.0%      0.0%
        circsim          +0.0%      0.0%     -0.0%     -0.1%     -0.0%
       clausify          +0.0%      0.0%     +0.0%     +0.0%      0.0%
  comp_lab_zift          +0.0%      0.0%     +0.0%      0.0%     -0.0%
       compress          +0.0%      0.0%     +0.0%     +0.0%      0.0%
      compress2          +0.0%      0.0%      0.0%      0.0%      0.0%
    constraints          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
   cryptarithm1          +0.0%      0.0%     +0.0%      0.0%      0.0%
   cryptarithm2          +0.0%      0.0%     +0.0%     -0.0%      0.0%
            cse          +0.0%      0.0%     +0.0%     +0.0%      0.0%
   digits-of-e1          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
   digits-of-e2          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
         dom-lt          +0.0%      0.0%     +0.0%     +0.0%      0.0%
          eliza          +0.0%      0.0%     -0.0%     +0.0%      0.0%
          event          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
    exact-reals          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         exp3_8          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
         expert          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
 fannkuch-redux          +0.0%      0.0%     +0.0%      0.0%      0.0%
          fasta          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
            fem          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            fft          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
           fft2          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
       fibheaps          +0.0%      0.0%     +0.0%     +0.0%      0.0%
           fish          +0.0%      0.0%     +0.0%     +0.0%      0.0%
          fluid          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         fulsom          +0.0%      0.0%     +0.0%     -0.0%     +0.0%
         gamteb          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
            gcd          +0.0%      0.0%     +0.0%     +0.0%      0.0%
    gen_regexps          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
         genfft          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
             gg          +0.0%      0.0%      0.0%     -0.0%      0.0%
           grep          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         hidden          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
            hpg          +0.0%      0.0%     +0.0%     -0.1%     -0.0%
            ida          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
          infer          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
        integer          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
      integrate          +0.0%      0.0%      0.0%     +0.0%      0.0%
   k-nucleotide          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
          kahan          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
        knights          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
         lambda          +0.0%      0.0%     +1.2%     -6.1%     -0.0%
     last-piece          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
           lcss          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
           life          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
           lift          +0.0%      0.0%     +0.0%     +0.0%      0.0%
         linear          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
      listcompr          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
       listcopy          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
       maillist          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
         mandel          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
        mandel2          +0.0%      0.0%     +0.0%     +0.0%     -0.0%
           mate          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
        minimax          +0.0%      0.0%     -0.0%     +0.0%     -0.0%
        mkhprog          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
     multiplier          +0.0%      0.0%      0.0%     +0.0%     -0.0%
         n-body          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
       nucleic2          +0.0%      0.0%     +0.0%     +0.0%     -0.0%
           para          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
      paraffins          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         parser          +0.1%      0.0%     +0.4%     -1.7%     -0.0%
        parstof          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
            pic          +0.0%      0.0%     +0.0%      0.0%     -0.0%
       pidigits          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
          power          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
         pretty          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         primes          +0.0%      0.0%     +0.0%      0.0%      0.0%
      primetest          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         prolog          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         puzzle          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
         queens          +0.0%      0.0%      0.0%     +0.0%     +0.0%
        reptile          +0.0%      0.0%     +0.0%     +0.0%      0.0%
reverse-complem          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
        rewrite          +0.0%      0.0%     +0.0%      0.0%     -0.0%
           rfib          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            rsa          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            scc          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
          sched          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            scs          +0.0%      0.0%     +0.0%     +0.0%      0.0%
         simple          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
          solid          +0.0%      0.0%     +0.0%     +0.0%      0.0%
        sorting          +0.0%      0.0%     +0.0%     -0.0%      0.0%
  spectral-norm          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
         sphere          +0.0%      0.0%     +0.0%     -1.0%      0.0%
         symalg          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
            tak          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
      transform          +0.0%      0.0%     +0.4%     -1.3%     +0.0%
       treejoin          +0.0%      0.0%     +0.0%     -0.0%      0.0%
      typecheck          +0.0%      0.0%     -0.0%     +0.0%      0.0%
        veritas          +0.0%      0.0%     +0.0%     -0.1%     +0.0%
           wang          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
      wave4main          +0.0%      0.0%     +0.0%      0.0%     -0.0%
   wheel-sieve1          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
   wheel-sieve2          +0.0%      0.0%     +0.0%     +0.0%      0.0%
           x2n1          +0.0%      0.0%     +0.0%     +0.0%      0.0%
--------------------------------------------------------------------------------
            Min          +0.0%      0.0%     -0.0%     -6.1%     -0.0%
            Max          +0.1%      0.0%     +1.2%     +0.0%     +0.0%
 Geometric Mean          +0.0%     -0.0%     +0.0%     -0.1%     -0.0%

NoFib GC Results
================

--------------------------------------------------------------------------------
        Program           Size    Allocs    Instrs     Reads    Writes
--------------------------------------------------------------------------------
        circsim          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
    constraints          +0.0%      0.0%     -0.0%      0.0%     -0.0%
       fibheaps          +0.0%      0.0%      0.0%     -0.0%     -0.0%
         fulsom          +0.0%      0.0%      0.0%     -0.6%     -0.0%
       gc_bench          +0.0%      0.0%      0.0%      0.0%     -0.0%
           hash          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
           lcss          +0.0%      0.0%      0.0%     -0.0%      0.0%
      mutstore1          +0.0%      0.0%      0.0%     -0.0%     -0.0%
      mutstore2          +0.0%      0.0%     +0.0%     -0.0%     -0.0%
          power          +0.0%      0.0%     -0.0%      0.0%     -0.0%
     spellcheck          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
--------------------------------------------------------------------------------
            Min          +0.0%      0.0%     -0.0%     -0.6%     -0.0%
            Max          +0.0%      0.0%     +0.0%      0.0%      0.0%
 Geometric Mean          +0.0%     +0.0%     +0.0%     -0.1%     +0.0%

Fixes #14373

These performance regressions appear to be a fluke in CI. See the
discussion in !1742 for details.

Metric Increase:
    T6048
    T12234
    T12425
    Naperian
    T12150
    T5837
    T13035
parent 99e1816a
No related branches found
No related tags found
No related merge requests found
Showing
with 382 additions and 32 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment