1. 22 Feb, 2020 1 commit
  2. 12 Feb, 2020 1 commit
  3. 31 Jan, 2020 1 commit
    • Ömer Sinan Ağacan's avatar
      Do CafInfo/SRT analysis in Cmm · c846618a
      Ömer Sinan Ağacan authored
      This patch removes all CafInfo predictions and various hacks to preserve
      predicted CafInfos from the compiler and assigns final CafInfos to
      interface Ids after code generation. SRT analysis is extended to support
      static data, and Cmm generator is modified to allow generating
      static_link fields after SRT analysis.
      
      This also fixes `-fcatch-bottoms`, which introduces error calls in case
      expressions in CorePrep, which runs *after* CoreTidy (which is where we
      decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
      
      Fixes #17648
      Fixes #9718
      
      Evaluation
      ==========
      
      NoFib
      -----
      
      Boot with: `make boot mode=fast`
      Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  CSD          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   FS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                    S          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   VS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  VSD          -0.0%      0.0%     -0.0%     -0.0%     -0.5%
                  VSM          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 anna          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                 ansi          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 atom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               awards          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               banner          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           bernouilli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         binary-trees          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                boyer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               boyer2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 bspt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            cacheprof          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             calendar          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             cichelli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              circsim          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             clausify          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        comp_lab_zift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             compress          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            compress2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  cse          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               dom-lt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                eliza          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                event          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          exact-reals          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               exp3_8          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               expert          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       fannkuch-redux          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fasta          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fft2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fish          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fluid          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
               fulsom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               gamteb          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  gcd          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          gen_regexps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               genfft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   gg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 grep          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               hidden          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  hpg          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  ida          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                infer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              integer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            integrate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         k-nucleotide          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                kahan          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              knights          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               lambda          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           last-piece          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 life          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               linear          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            listcompr          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             listcopy          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             maillist          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               mandel          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mandel2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 mate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              minimax          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mkhprog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           multiplier          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               n-body          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             nucleic2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 para          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            paraffins          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               parser          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
              parstof          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  pic          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             pidigits          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               pretty          -0.0%      0.0%     -0.3%     -0.4%     -0.4%
               primes          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            primetest          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               prolog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               puzzle          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               queens          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              reptile          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      reverse-complem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              rewrite          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 rfib          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  rsa          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scc          -0.0%      0.0%     -0.3%     -0.5%     -0.4%
                sched          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scs          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               simple          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                solid          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              sorting          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        spectral-norm          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               sphere          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               symalg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  tak          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            transform          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             treejoin          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            typecheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              veritas          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 wang          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            wave4main          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 x2n1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.3%     -0.5%     -0.5%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     -0.0%     -0.0%     -0.0%     -0.0%
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
              circsim          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             gc_bench          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 hash          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           spellcheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     +0.0%     -0.0%     -0.0%     -0.0%
      
      Manual inspection of programs in testsuite/tests/programs
      ---------------------------------------------------------
      
      I built these programs with a bunch of dump flags and `-O` and compared
      STG, Cmm, and Asm dumps and file sizes.
      
      (Below the numbers in parenthesis show number of modules in the program)
      
      These programs have identical compiler (same .hi and .o sizes, STG, and
      Cmm and Asm dumps):
      
      - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
        andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
        jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
        jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
        bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
        strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
        joao-circular (6), 10queens (1)
      
      Programs with different compiler outputs:
      
      - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
        `[Int]` closures as CAFFY for no reason. With this patch we no longer
        make them CAFFY and generate less SRT entries. For some reason Main.o
        is slightly larger with this patch (1.3%) and the executable sizes are
        the same. (I'd expect both to be smaller)
      
      - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
        as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
        executable sizes are the same.
      
      - galois_raytrace (13): Differences are in the Parse module. There are a
        lot, but some of the changes are caused by the fact that for some
        reason (I think a bug) GHC HEAD marks the dictionary for `Functor
        Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
        same.
      
      - north_array: We now generate less SRT entries because some of array
        primops used in this program like `NewArrayOp` get eliminated during
        Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
        larger (9224 bytes from 9000 bytes), executable sizes are the same.
      
      - seward-space-leak: Difference in this program is better shown by this
        smaller example:
      
            module Lib where
      
            data CDS
              = Case [CDS] [(Int, CDS)]
              | Call CDS CDS
      
            instance Eq CDS where
              Case sels1 rets1 == Case sels2 rets2 =
                  sels1 == sels2 && rets1 == rets2
              Call a1 b1 == Call a2 b2 =
                  a1 == a2 && b1 == b2
              _ == _ =
                  False
      
         In this program GHC HEAD builds a new SRT for the recursive group of
         `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
         in its SRT field, and `==` uses the SRT object as its SRT. With this
         patch we use the closure for `/=` as the SRT and add `==` there. Then
         `/=` gets an empty SRT field and `==` points to `/=` in its SRT
         field.
      
         This change looks fine to me.
      
         Main.o gets 0.07% larger, executable sizes are identical.
      
      head.hackage
      ------------
      
      head.hackage's CI script builds 428 packages from Hackage using this
      patch with no failures.
      
      Compiler performance
      --------------------
      
      The compiler perf tests report that the compiler allocates slightly more
      (worst case observed so far is 4%). However most programs in the test
      suite are small, single file programs. To benchmark compiler performance
      on something more realistic I build Cabal (the library, 236 modules)
      with different optimisation levels. For the "max residency" row I run
      GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
      are generated with just `-s`. (This is because `-i0` causes running GC
      much more frequently and as a result "bytes copied" gets inflated by
      more than 25x in some cases)
      
      * -O0
      
      |                 | GHC HEAD       | This MR        | Diff   |
      | --------------- | -------------- | -------------- | ------ |
      | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
      | Bytes copied    |  4,926,037,184 |  4,990,638,760 | +1.31% |
      | Max residency   |    421,225,624 |    424,324,264 | +0.73% |
      
      * -O1
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
      | Bytes copied    |  26,943,452,560 |  27,089,972,296 | +0.54% |
      | Max residency   |     982,643,440 |     991,663,432 | +0.91% |
      
      * -O2
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
      | Bytes copied    |  37,044,237,616 |  36,121,690,472 | -2.49% |
      | Max residency   |   1,071,600,328 |   1,086,396,256 | +1.38% |
      
      Extra compiler allocations
      --------------------------
      
      Runtime allocations of programs are as reported above (NoFib section).
      
      The compiler now allocates more than before. Main source of allocation
      in this patch compared to base commit is the new SRT algorithm
      (GHC.Cmm.Info.Build). Below is some of the extra work we do with this
      patch, numbers generated by profiled stage 2 compiler when building a
      pathological case (the test 'ManyConstructors') with '-O2':
      
      - We now sort the final STG for a module, which means traversing the
        entire program, generating free variable set for each top-level
        binding, doing SCC analysis, and re-ordering the program. In
        ManyConstructors this step allocates 97,889,952 bytes.
      
      - We now do SRT analysis on static data, which in a program like
        ManyConstructors causes analysing 10,000 bindings that we would
        previously just skip. This step allocates 70,898,352 bytes.
      
      - We now maintain an SRT map for the entire module as we compile Cmm
        groups:
      
            data ModuleSRTInfo = ModuleSRTInfo
              { ...
              , moduleSRTMap :: SRTMap
              }
      
         (SRTMap is just a strict Map from the 'containers' library)
      
         This map gets an entry for most bindings in a module (exceptions are
         THUNKs and CAFFY static functions). For ManyConstructors this map
         gets 50015 entries.
      
      - Once we're done with code generation we generate a NameSet from SRTMap
        for the non-CAFFY names in the current module. This set gets the same
        number of entries as the SRTMap.
      
      - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
        the NameSet generated in the previous step. This usually does the
        least amount of allocation among the work listed here.
      
      Only place with this patch where we do less work in the CAF analysis in
      the tidying pass (CoreTidy). However that doesn't save us much, as the
      pass still needs to traverse the whole program and update IdInfos for
      other reasons. Only thing we don't here do is the `hasCafRefs` pass over
      the RHS of bindings, which is a stateless pass that returns a boolean
      value, so it doesn't allocate much.
      
      (Metric changes blow are all increased allocations)
      
      Metric changes
      --------------
      
      Metric Increase:
          ManyAlternatives
          ManyConstructors
          T13035
          T14683
          T1969
          T9961
      c846618a
  4. 27 Jan, 2020 1 commit
  5. 25 Jan, 2020 1 commit
  6. 31 Dec, 2019 1 commit
  7. 09 Sep, 2019 1 commit
    • Sylvain Henry's avatar
      Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
      447864a9
  8. 07 Aug, 2019 1 commit
  9. 17 Jul, 2019 1 commit
    • John Ericson's avatar
      Create {Int,Word}32Rep · 0a9b77b8
      John Ericson authored
      This prepares the way for making Int32# and Word32# the actual size they
      claim to be.
      
      Updates binary submodule for (de)serializing the new runtime reps.
      0a9b77b8
  10. 13 Jul, 2019 1 commit
    • Andreas Klebinger's avatar
      Add two CmmSwitch optimizations. · 348cc8eb
      Andreas Klebinger authored
      Move switch expressions into a local variable when generating switches.
      This avoids duplicating the expression if we translate the switch
      to a tree search. This fixes #16933.
      
      Further we now check if all branches of a switch have the same
      destination, replacing the switch with a direct branch if that
      is the case.
      
      Both of these patterns appear in the ENTER macro used by the RTS
      but are unlikely to occur in intermediate Cmm generated by GHC.
      
      Nofib result summary:
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
                  Min          -0.0%     -0.0%    -15.7%    -15.6%      0.0%
                  Max          -0.0%      0.0%     +5.4%     +5.5%      0.0%
       Geometric Mean          -0.0%     -0.0%     -1.0%     -1.0%     -0.0%
      
      Compiler allocations go up slightly: +0.2%
      
      Example output before and after the change taken from RTS code below.
      
      All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated.
      Instead the data is loaded once from memory in block c6.
      
      Also the switch in block `ud` in the original code has been
      eliminated completely.
      
      Cmm without this commit:
      
      ```
      stg_ap_0_fast() { //  [R1]
              { []
              }
          {offset
            ca: _c1::P64 = R1;   // CmmAssign
                goto c2;   // CmmBranch
            c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
            c6: _c3::I64 = I64[_c1::P64];
                if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;
            ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;
            uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;
            ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) {
                    case 8, 9, 10, 11, 12, 13, 14 : goto c4;
                }
            ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf;
            uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4;
            c4: R1 = _c1::P64;
                call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
            ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui;
            uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8;
            ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7;
            c8: _c1::P64 = P64[_c1::P64 + 8];
                goto c2;
            c7: R1 = _c1::P64;
                call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
          }
      }
      ```
      
      Cmm with this commit:
      
      ```
      stg_ap_0_fast() { //  [R1]
              { []
              }
          {offset
            ca: _c1::P64 = R1;
                goto c2;
            c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
            c6: _c3::I64 = I64[_c1::P64];
                _ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]);
                if (_ub::I64 < 26) goto uc; else goto uh;
            uc: if (_ub::I64 < 15) goto ud; else goto uf;
            ud: if (_ub::I64 < 8) goto c7; else goto c4;
            uf: if (_ub::I64 >= 25) goto c4; else goto ug;
            ug: if (_ub::I64 != 23) goto c7; else goto c4;
            c4: R1 = _c1::P64;
                call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
            uh: if (_ub::I64 < 28) goto ui; else goto uj;
            ui: if (_ub::I64 < 27) goto c7; else goto c8;
            uj: if (_ub::I64 < 29) goto c8; else goto c7;
            c8: _c1::P64 = P64[_c1::P64 + 8];
                goto c2;
            c7: R1 = _c1::P64;
                call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
          }
      }
      ```
      348cc8eb
  11. 15 Mar, 2019 1 commit
  12. 31 Jan, 2019 1 commit
  13. 17 Nov, 2018 1 commit
  14. 02 Nov, 2018 1 commit
    • Michal Terepeta's avatar
      Add Int8# and Word8# · 2c959a18
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      The main highlights/changes:
      
          primops.txt.pp gets two new sections for two new primitive types for
          signed and unsigned 8-bit integers (Int8# and Word8 respectively) along
          with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
          two new constructors for them. All of the primops translate into the
          existing MachOPs.
      
          For CmmCalls the codegen will now zero-extend the values at call
          site (so that they can be moved to the right register) and then truncate
          them back their original width.
      
          x86 native codegen needed some updates, since it wasn't able to deal
          with the new widths, but all the changes are quite localized. LLVM
          backend seems to just work.
      
      This is the second attempt at merging this, after the first attempt in
      D4475 had to be backed out due to regressions on i386.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate (on both x86-{32,64})
      
      Reviewers: bgamari, hvr, goldfire, simonmar
      
      Subscribers: rwbarton, carter
      
      Differential Revision: https://phabricator.haskell.org/D5258
      2c959a18
  15. 09 Oct, 2018 1 commit
    • Ben Gamari's avatar
      Revert "Add Int8# and Word8#" · d728c3c5
      Ben Gamari authored
      This unfortunately broke i386 support since it introduced references to
      byte-sized registers that don't exist on that architecture.
      
      Reverts binary submodule
      
      This reverts commit 5d5307f9.
      d728c3c5
  16. 07 Oct, 2018 1 commit
    • Michal Terepeta's avatar
      Add Int8# and Word8# · 5d5307f9
      Michal Terepeta authored
      This is the first step of implementing:
      https://github.com/ghc-proposals/ghc-proposals/pull/74
      
      The main highlights/changes:
      
      - `primops.txt.pp` gets two new sections for two new primitive types
        for signed and unsigned 8-bit integers (`Int8#` and `Word8`
        respectively) along with basic arithmetic and comparison
        operations. `PrimRep`/`RuntimeRep` get two new constructors for
        them. All of the primops translate into the existing `MachOP`s.
      
      - For `CmmCall`s the codegen will now zero-extend the values at call
        site (so that they can be moved to the right register) and then
        truncate them back their original width.
      
      - x86 native codegen needed some updates, since it wasn't able to deal
        with the new widths, but all the changes are quite localized. LLVM
        backend seems to just work.
      
      Bumps binary submodule.
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate with new tests
      
      Reviewers: hvr, goldfire, bgamari, simonmar
      
      Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4475
      5d5307f9
  17. 12 Jul, 2018 1 commit
  18. 16 May, 2018 1 commit
    • Simon Marlow's avatar
      Allow CmmLabelDiffOff with different widths · fbd28e2c
      Simon Marlow authored
      Summary:
      This change makes it possible to generate a static 32-bit relative label
      offset on x86_64. Currently we can only generate word-sized label
      offsets.
      
      This will be used in D4634 to shrink info tables.  See D4632 for more
      details.
      
      Test Plan: See D4632
      
      Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4633
      fbd28e2c
  19. 19 Mar, 2018 3 commits
    • Michal Terepeta's avatar
      CmmUtils: get rid of insertBlock · 256577fb
      Michal Terepeta authored
      `Hoopl.Graph` has almost exactly the same function, so let's use that.
      Also, use `IntMap.alter` to make it more efficient.
      
      Also switch `Hoopl` to use strict maps.
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: dfeuer, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4493
      256577fb
    • Michal Terepeta's avatar
      Hoopl: improve postorder calculation · bbcea13a
      Michal Terepeta authored
      - Fix the naming and comments to indicate that we are calculating
        *reverse* postorder (and not the standard postorder).
      
      - Rewrite the calculation to avoid CPS code. I found it fairly
        difficult to understand and the new one seems faster (according to
        nofib, decreases compiler allocations by 0.2%)
      
      - Remove `LabelsPtr`, which seems unnecessary and could be *really*
        confusing. For instance, previously:
        `postorder_dfs_from <block with label X>`
        and
        `postorder_dfs_from <label X>`
        would actually mean quite different things (and give different
        results).
      
      - Change the `Dataflow` module to always use entry of the graph for
        reverse postorder calculation. This should be the only change in
        behavior of this commit.
      
        Previously, if the caller provided initial facts for some of the
        labels, we would use those labels for our postorder calculation.
        However, I don't think that's correct in general - if the initial
        facts did not contain the entry of the graph, we would never analyze
        the blocks reachable from the entry but unreachable from the labels
        provided with the initial facts. It seems that the only analysis that
        used this was proc-point analysis, which I think would always include
        the entry block (so I don't think there's any bug due to this).
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4464
      bbcea13a
    • Simon Marlow's avatar
      Be more selective in which conditionals we invert · 39c74063
      Simon Marlow authored
      Test Plan: validate
      
      Reviewers: bgamari, AndreasK, erikd
      
      Reviewed By: AndreasK
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4398
      39c74063
  20. 18 Feb, 2018 1 commit
  21. 02 Feb, 2018 1 commit
    • Michal Terepeta's avatar
      Hoopl.Collections: change right folds to strict left folds · 2974b2b8
      Michal Terepeta authored
      It seems that most uses of these folds should be strict left folds
      (I could only find a single place that benefits from a right fold).
      So this removes the existing `setFold`/`mapFold`/`mapFoldWihKey`
      replaces them with:
      - `setFoldl`/`mapFoldl`/`mapFoldlWithKey` (strict left folds)
      - `setFoldr`/`mapFoldr` (for the less common case where a right fold
        actually makes sense, e.g., `CmmProcPoint`)
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter, kavon
      
      Differential Revision: https://phabricator.haskell.org/D4356
      2974b2b8
  22. 19 Sep, 2017 1 commit
    • Herbert Valerio Riedel's avatar
      compiler: introduce custom "GhcPrelude" Prelude · f63bc730
      Herbert Valerio Riedel authored
      This switches the compiler/ component to get compiled with
      -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
      modules.
      
      This is motivated by the upcoming "Prelude" re-export of
      `Semigroup((<>))` which would cause lots of name clashes in every
      modulewhich imports also `Outputable`
      
      Reviewers: austin, goldfire, bgamari, alanz, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D3989
      f63bc730
  23. 06 Sep, 2017 1 commit
  24. 23 Jun, 2017 1 commit
    • Michal Terepeta's avatar
      Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
        `Uniq{FM,Set}`
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of the
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to do invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior)
      
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: simonpj, kavon, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3616
      42eee6ea
  25. 23 May, 2017 1 commit
  26. 20 Jan, 2017 1 commit
    • takano-akio's avatar
      Allow top-level string literals in Core (#8472) · d49b2bb2
      takano-akio authored
      This commits relaxes the invariants of the Core syntax so that a
      top-level variable can be bound to a primitive string literal of type
      Addr#.
      
      This commit:
      
      * Relaxes the invatiants of the Core, and allows top-level bindings whose
        type is Addr# as long as their RHS is either a primitive string literal or
        another variable.
      
      * Allows the simplifier and the full-laziness transformer to float out
        primitive string literals to the top leve.
      
      * Introduces the new StgGenTopBinding type to accomodate top-level Addr#
        bindings.
      
      * Introduces a new type of labels in the object code, with the suffix "_bytes",
        for exported top-level Addr# bindings.
      
      * Makes some built-in rules more robust. This was necessary to keep them
        functional after the above changes.
      
      This is a continuation of D2554.
      
      Rebasing notes:
      This had two slightly suspicious performance regressions:
      
      * T12425: bytes allocated regressed by roughly 5%
      * T4029: bytes allocated regressed by a bit over 1%
      * T13035: bytes allocated regressed by a bit over 5%
      
      These deserve additional investigation.
      
      Rebased by: bgamari.
      
      Test Plan: ./validate --slow
      
      Reviewers: goldfire, trofi, simonmar, simonpj, austin, hvr, bgamari
      
      Reviewed By: trofi, simonpj, bgamari
      
      Subscribers: trofi, simonpj, gridaphobe, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2605
      
      GHC Trac Issues: #8472
      d49b2bb2
  27. 19 Jan, 2017 1 commit
    • Richard Eisenberg's avatar
      Update levity polymorphism · e7985ed2
      Richard Eisenberg authored
      This commit implements the proposal in
      https://github.com/ghc-proposals/ghc-proposals/pull/29 and
      https://github.com/ghc-proposals/ghc-proposals/pull/35.
      
      Here are some of the pieces of that proposal:
      
      * Some of RuntimeRep's constructors have been shortened.
      
      * TupleRep and SumRep are now parameterized over a list of RuntimeReps.
      * This
      means that two types with the same kind surely have the same
      representation.
      Previously, all unboxed tuples had the same kind, and thus the fact
      above was
      false.
      
      * RepType.typePrimRep and friends now return a *list* of PrimReps. These
      functions can now work successfully on unboxed tuples. This change is
      necessary because we allow abstraction over unboxed tuple types and so
      cannot
      always handle unboxed tuples specially as we did before.
      
      * We sometimes have to create an Id from a PrimRep. I thus split PtrRep
      * into
      LiftedRep and UnliftedRep, so that the created Ids have the right
      strictness.
      
      * The RepType.RepType type was removed, as it didn't seem to help with
      * much.
      
      * The RepType.repType function is also removed, in favor of typePrimRep.
      
      * I have waffled a good deal on whether or not to keep VoidRep in
      TyCon.PrimRep. In the end, I decided to keep it there. PrimRep is *not*
      represented in RuntimeRep, and typePrimRep will never return a list
      including
      VoidRep. But it's handy to have in, e.g., ByteCodeGen and friends. I can
      imagine another design choice where we have a PrimRepV type that is
      PrimRep
      with an extra constructor. That seemed to be a heavier design, though,
      and I'm
      not sure what the benefit would be.
      
      * The last, unused vestiges of # (unliftedTypeKind) have been removed.
      
      * There were several pretty-printing bugs that this change exposed;
      * these are fixed.
      
      * We previously checked for levity polymorphism in the types of binders.
      * But we
      also must exclude levity polymorphism in function arguments. This is
      hard to check
      for, requiring a good deal of care in the desugarer. See Note [Levity
      polymorphism
      checking] in DsMonad.
      
      * In order to efficiently check for levity polymorphism in functions, it
      * was necessary
      to add a new bit of IdInfo. See Note [Levity info] in IdInfo.
      
      * It is now safe for unlifted types to be unsaturated in Core. Core Lint
      * is updated
      accordingly.
      
      * We can only know strictness after zonking, so several checks around
      * strictness
      in the type-checker (checkStrictBinds, the check for unlifted variables
      under a ~
      pattern) have been moved to the desugarer.
      
      * Along the way, I improved the treatment of unlifted vs. banged
      * bindings. See
      Note [Strict binds checks] in DsBinds and #13075.
      
      * Now that we print type-checked source, we must be careful to print
      * ConLikes correctly.
      This is facilitated by a new HsConLikeOut constructor to HsExpr.
      Particularly troublesome
      are unlifted pattern synonyms that get an extra void# argument.
      
      * Includes a submodule update for haddock, getting rid of #.
      
      * New testcases:
        typecheck/should_fail/StrictBinds
        typecheck/should_fail/T12973
        typecheck/should_run/StrictPats
        typecheck/should_run/T12809
        typecheck/should_fail/T13105
        patsyn/should_fail/UnliftedPSBind
        typecheck/should_fail/LevPolyBounded
        typecheck/should_compile/T12987
        typecheck/should_compile/T11736
      
      * Fixed tickets:
        #12809
        #12973
        #11736
        #13075
        #12987
      
      * This also adds a test case for #13105. This test case is
      * "compile_fail" and
      succeeds, because I want the testsuite to monitor the error message.
      When #13105 is fixed, the test case will compile cleanly.
      e7985ed2
  28. 08 Dec, 2016 1 commit
  29. 06 Dec, 2016 1 commit
  30. 26 Oct, 2016 1 commit
  31. 19 Oct, 2016 1 commit
  32. 10 Aug, 2016 1 commit
    • Ömer Sinan Ağacan's avatar
      Remove StgRubbishArg and CmmArg · 9684dbb1
      Ömer Sinan Ağacan authored
      The idea behind adding special "rubbish" arguments was in unboxed sum types
      depending on the tag some arguments are not used and we don't want to move some
      special values (like 0 for literals and some special pointer for boxed slots)
      for those arguments (to stack locations or registers). "StgRubbishArg" was an
      indicator to the code generator that the value won't be used. During Stg-to-Cmm
      we were then not generating any move or store instructions at all.
      
      This caused problems in the register allocator because some variables were only
      initialized in some code paths. As an example, suppose we have this STG: (after
      unarise)
      
          Lib.$WT =
              \r [dt_sit]
                  case
                      case dt_sit of {
                        Lib.F dt_siv [Occ=Once] ->
                            (#,,#) [1# dt_siv StgRubbishArg::GHC.Prim.Int#];
                        Lib.I dt_siw [Occ=Once] ->
                            (#,,#) [2# StgRubbishArg::GHC.Types.Any dt_siw];
                      }
                  of
                  dt_six
                  { (#,,#) us_giC us_giD us_giE -> Lib.T [us_giC us_giD us_giE];
                  };
      
      This basically unpacks a sum type to an unboxed sum with 3 fields, and then
      moves the unboxed sum to a constructor (`Lib.T`).
      
      This is the Cmm for the inner case expression (case expression in the scrutinee
      position of the outer case):
      
          ciN:
              ...
              -- look at dt_sit's tag
              if (_ciT::P64 != 1) goto ciS; else goto ciR;
          ciS: -- Tag is 2, i.e. Lib.F
              _siw::I64 = I64[_siu::P64 + 6];
              _giE::I64 = _siw::I64;
              _giD::P64 = stg_RUBBISH_ENTRY_info;
              _giC::I64 = 2;
              goto ciU;
          ciR: -- Tag is 1, i.e. Lib.I
              _siv::P64 = P64[_siu::P64 + 7];
              _giD::P64 = _siv::P64;
              _giC::I64 = 1;
              goto ciU;
      
      Here one of the blocks `ciS` and `ciR` is executed and then the execution
      continues to `ciR`, but only `ciS` initializes `_giE`, in the other branch
      `_giE` is not initialized, because it's "rubbish" in the STG and so we don't
      generate an assignment during code generator. The code generator then panics
      during the register allocations:
      
          ghc-stage1: panic! (the 'impossible' happened)
            (GHC version 8.1.20160722 for x86_64-unknown-linux):
                  LocalReg's live-in to graph ciY {_giE::I64}
      
      (`_giD` is also "rubbish" in `ciS`, but it's still initialized because it's a
      pointer slot, we have to initialize it otherwise garbage collector follows the
      pointer to some random place. So we only remove assignment if the "rubbish" arg
      has unboxed type.)
      
      This patch removes `StgRubbishArg` and `CmmArg`. We now always initialize
      rubbish slots. If the slot is for boxed types we use the existing `absentError`,
      otherwise we initialize the slot with literal 0.
      
      Reviewers: simonpj, erikd, austin, simonmar, bgamari
      
      Reviewed By: erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2446
      9684dbb1
  33. 21 Jul, 2016 1 commit
    • Ömer Sinan Ağacan's avatar
      Implement unboxed sum primitive type · 714bebff
      Ömer Sinan Ağacan authored
      Summary:
      This patch implements primitive unboxed sum types, as described in
      https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
      
      Main changes are:
      
      - Add new syntax for unboxed sums types, terms and patterns. Hidden
        behind `-XUnboxedSums`.
      
      - Add unlifted unboxed sum type constructors and data constructors,
        extend type and pattern checkers and desugarer.
      
      - Add new RuntimeRep for unboxed sums.
      
      - Extend unarise pass to translate unboxed sums to unboxed tuples right
        before code generation.
      
      - Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
        code generation when sum values are involved.
      
      - Add user manual section for unboxed sums.
      
      Some other changes:
      
      - Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
        `MultiValAlt` to be able to use those with both sums and tuples.
      
      - Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
        wrong, given an `Any` `TyCon`, there's no way to tell what its kind
        is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
      
      - Fix some bugs on the way: #12375.
      
      Not included in this patch:
      
      - Update Haddock for new the new unboxed sum syntax.
      
      - `TemplateHaskell` support is left as future work.
      
      For reviewers:
      
      - Front-end code is mostly trivial and adapted from unboxed tuple code
        for type checking, pattern checking, renaming, desugaring etc.
      
      - Main translation routines are in `RepType` and `UnariseStg`.
        Documentation in `UnariseStg` should be enough for understanding
        what's going on.
      
      Credits:
      
      - Johan Tibell wrote the initial front-end and interface file
        extensions.
      
      - Simon Peyton Jones reviewed this patch many times, wrote some code,
        and helped with debugging.
      
      Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
                 simonmar, hvr, erikd
      
      Reviewed By: simonpj
      
      Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
                   thomie, mpickering
      
      Differential Revision: https://phabricator.haskell.org/D2259
      714bebff
  34. 12 Nov, 2015 1 commit
    • olsner's avatar
      Implement function-sections for Haskell code, #8405 · 4a32bf92
      olsner authored
      This adds a flag -split-sections that does similar things to
      -split-objs, but using sections in single object files instead of
      relying on the Satanic Splitter and other abominations. This is very
      similar to the GCC flags -ffunction-sections and -fdata-sections.
      
      The --gc-sections linker flag, which allows unused sections to actually
      be removed, is added to all link commands (if the linker supports it) so
      that space savings from having base compiled with sections can be
      realized.
      
      Supported both in LLVM and the native code-gen, in theory for all
      architectures, but really tested on x86 only.
      
      In the GHC build, a new SplitSections variable enables -split-sections
      for relevant parts of the build.
      
      Test Plan: validate with both settings of SplitSections
      
      Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari
      
      Reviewed By: simonmar, thomie, bgamari
      
      Subscribers: hsyl20, erikd, kgardas, thomie
      
      Differential Revision: https://phabricator.haskell.org/D1242
      
      GHC Trac Issues: #8405
      4a32bf92
  35. 25 Jun, 2015 1 commit
    • rwbarton's avatar
      Be aware of overlapping global STG registers in CmmSink (#10521) · a2f828a3
      rwbarton authored
      Summary:
      On x86_64, commit e2f6bbd3 assigned
      the STG registers F1 and D1 the same hardware register (xmm1), and
      the same for the registers F2 and D2, etc. When mixing calls to
      functions involving Float#s and Double#s, this can cause wrong Cmm
      optimizations that assume the F1 and D1 registers are independent.
      
      Reviewers: simonpj, austin
      
      Reviewed By: austin
      
      Subscribers: simonpj, thomie, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D993
      
      GHC Trac Issues: #10521
      a2f828a3
  36. 12 Jun, 2015 1 commit
  37. 30 Mar, 2015 1 commit
    • Joachim Breitner's avatar
      Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining this changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the hoople data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
      de1160be
  38. 09 Mar, 2015 1 commit