1. 19 Aug, 2020 1 commit
    • Sylvain Henry's avatar
      DynFlags: refactor GHC.CmmToAsm (#17957, #10143) · 0c5ed5c7
      Sylvain Henry authored
      This patch removes the use of `sdocWithDynFlags` from GHC.CmmToAsm.*.Ppr
      
      To do that I've had to make some refactoring:
      
      * X86' and PPC's `Instr` are no longer `Outputable` as they require a
        `Platform` argument
      
      * `Instruction` class now exposes `pprInstr :: Platform -> instr -> SDoc`
      
      * as a consequence, I've refactored some modules to avoid .hs-boot files
      
      * added (derived) functor instances for some datatypes parametric in the
        instruction type. It's useful for pretty-printing as we just have to
        map `pprInstr` before pretty-printing the container datatype.
      0c5ed5c7
  2. 21 May, 2020 1 commit
    • Andreas Klebinger's avatar
      Refactor linear reg alloc to remember past assignments. · 13f6c9d0
      Andreas Klebinger authored
      When assigning registers we now first try registers we
      assigned to in the past, instead of picking the "first"
      one.
      
      This is in extremely helpful when dealing with loops for
      which variables are dead for part of the loop.
      
      This is important for patterns like this:
      
              foo = arg1
          loop:
              use(foo)
              ...
              foo = getVal()
              goto loop;
      
      There we:
      * assign foo to the register of arg1.
      * use foo, it's dead after this use as it's overwritten after.
      * do other things.
      * look for a register to put foo in.
      
      If we pick an arbitrary one it might differ from the register the
      start of the loop expect's foo to be in.
      To fix this we simply look for past register assignments for
      the given variable. If we find one and the register is free we
      use that register.
      
      This reduces the need for fixup blocks which match the register
      assignment between blocks. In the example above between the end
      and the head of the loop.
      
      This patch also moves branch weight estimation ahead of register
      allocation and adds a flag to control it (cmm-static-pred).
      * It means the linear allocator is more likely to assign the hotter
        code paths first.
      * If it assign these first we are:
        + Less likely to spill on the hot path.
        + Less likely to introduce fixup blocks on the hot path.
      
      These two measure combined are surprisingly effective. Based on nofib
      we get in the mean:
      
      * -0.9% instructions executed
      * -0.1% reads/writes
      * -0.2% code size.
      * -0.1% compiler allocations.
      * -0.9% compile time.
      * -0.8% runtime.
      
      Most of the benefits are simply a result of removing redundant moves
      and spills.
      
      Reduced compiler allocations likely are the result of less code being
      generated. (The added lookup is mostly non-allocating).
      13f6c9d0
  3. 26 Apr, 2020 1 commit
  4. 15 Mar, 2020 1 commit
    • Sylvain Henry's avatar
      Refactor CmmToAsm (disentangle DynFlags) · 2e82465f
      Sylvain Henry authored
      This patch disentangles a bit more DynFlags from the native code
      generator (CmmToAsm).
      
      In more details:
      
      - add a new NCGConfig datatype in GHC.CmmToAsm.Config which contains the
        configuration of a native code generation session
      - explicitly pass NCGConfig/Platform arguments when necessary
      - as a consequence `sdocWithPlatform` is gone and there are only a few
        `sdocWithDynFlags` left
      - remove the use of `unsafeGlobalDynFlags` from GHC.CmmToAsm.CFG
      - remove `sdocDebugLevel` (now we pass the debug level via NCGConfig)
      
      There are still some places where DynFlags is used, especially because
      of pretty-printing (CLabel), because of Cmm helpers (such as
      `cmmExprType`) and because of `Outputable` instance for the
      instructions. These are left for future refactoring as this patch is
      already big.
      2e82465f
  5. 25 Feb, 2020 1 commit
  6. 22 Feb, 2020 1 commit
  7. 31 Jan, 2020 1 commit
    • Ömer Sinan Ağacan's avatar
      Do CafInfo/SRT analysis in Cmm · c846618a
      Ömer Sinan Ağacan authored
      This patch removes all CafInfo predictions and various hacks to preserve
      predicted CafInfos from the compiler and assigns final CafInfos to
      interface Ids after code generation. SRT analysis is extended to support
      static data, and Cmm generator is modified to allow generating
      static_link fields after SRT analysis.
      
      This also fixes `-fcatch-bottoms`, which introduces error calls in case
      expressions in CorePrep, which runs *after* CoreTidy (which is where we
      decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
      
      Fixes #17648
      Fixes #9718
      
      Evaluation
      ==========
      
      NoFib
      -----
      
      Boot with: `make boot mode=fast`
      Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  CSD          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   FS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                    S          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   VS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  VSD          -0.0%      0.0%     -0.0%     -0.0%     -0.5%
                  VSM          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 anna          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                 ansi          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 atom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               awards          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               banner          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           bernouilli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         binary-trees          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                boyer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               boyer2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 bspt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            cacheprof          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             calendar          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             cichelli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              circsim          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             clausify          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        comp_lab_zift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             compress          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            compress2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  cse          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               dom-lt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                eliza          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                event          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          exact-reals          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               exp3_8          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               expert          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       fannkuch-redux          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fasta          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fft2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fish          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fluid          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
               fulsom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               gamteb          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  gcd          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          gen_regexps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               genfft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   gg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 grep          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               hidden          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  hpg          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  ida          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                infer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              integer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            integrate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         k-nucleotide          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                kahan          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              knights          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               lambda          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           last-piece          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 life          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               linear          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            listcompr          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             listcopy          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             maillist          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               mandel          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mandel2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 mate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              minimax          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mkhprog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           multiplier          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               n-body          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             nucleic2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 para          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            paraffins          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               parser          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
              parstof          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  pic          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             pidigits          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               pretty          -0.0%      0.0%     -0.3%     -0.4%     -0.4%
               primes          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            primetest          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               prolog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               puzzle          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               queens          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              reptile          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      reverse-complem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              rewrite          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 rfib          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  rsa          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scc          -0.0%      0.0%     -0.3%     -0.5%     -0.4%
                sched          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scs          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               simple          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                solid          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              sorting          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        spectral-norm          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               sphere          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               symalg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  tak          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            transform          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             treejoin          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            typecheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              veritas          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 wang          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            wave4main          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 x2n1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.3%     -0.5%     -0.5%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     -0.0%     -0.0%     -0.0%     -0.0%
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
              circsim          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             gc_bench          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 hash          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           spellcheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     +0.0%     -0.0%     -0.0%     -0.0%
      
      Manual inspection of programs in testsuite/tests/programs
      ---------------------------------------------------------
      
      I built these programs with a bunch of dump flags and `-O` and compared
      STG, Cmm, and Asm dumps and file sizes.
      
      (Below the numbers in parenthesis show number of modules in the program)
      
      These programs have identical compiler (same .hi and .o sizes, STG, and
      Cmm and Asm dumps):
      
      - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
        andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
        jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
        jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
        bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
        strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
        joao-circular (6), 10queens (1)
      
      Programs with different compiler outputs:
      
      - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
        `[Int]` closures as CAFFY for no reason. With this patch we no longer
        make them CAFFY and generate less SRT entries. For some reason Main.o
        is slightly larger with this patch (1.3%) and the executable sizes are
        the same. (I'd expect both to be smaller)
      
      - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
        as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
        executable sizes are the same.
      
      - galois_raytrace (13): Differences are in the Parse module. There are a
        lot, but some of the changes are caused by the fact that for some
        reason (I think a bug) GHC HEAD marks the dictionary for `Functor
        Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
        same.
      
      - north_array: We now generate less SRT entries because some of array
        primops used in this program like `NewArrayOp` get eliminated during
        Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
        larger (9224 bytes from 9000 bytes), executable sizes are the same.
      
      - seward-space-leak: Difference in this program is better shown by this
        smaller example:
      
            module Lib where
      
            data CDS
              = Case [CDS] [(Int, CDS)]
              | Call CDS CDS
      
            instance Eq CDS where
              Case sels1 rets1 == Case sels2 rets2 =
                  sels1 == sels2 && rets1 == rets2
              Call a1 b1 == Call a2 b2 =
                  a1 == a2 && b1 == b2
              _ == _ =
                  False
      
         In this program GHC HEAD builds a new SRT for the recursive group of
         `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
         in its SRT field, and `==` uses the SRT object as its SRT. With this
         patch we use the closure for `/=` as the SRT and add `==` there. Then
         `/=` gets an empty SRT field and `==` points to `/=` in its SRT
         field.
      
         This change looks fine to me.
      
         Main.o gets 0.07% larger, executable sizes are identical.
      
      head.hackage
      ------------
      
      head.hackage's CI script builds 428 packages from Hackage using this
      patch with no failures.
      
      Compiler performance
      --------------------
      
      The compiler perf tests report that the compiler allocates slightly more
      (worst case observed so far is 4%). However most programs in the test
      suite are small, single file programs. To benchmark compiler performance
      on something more realistic I build Cabal (the library, 236 modules)
      with different optimisation levels. For the "max residency" row I run
      GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
      are generated with just `-s`. (This is because `-i0` causes running GC
      much more frequently and as a result "bytes copied" gets inflated by
      more than 25x in some cases)
      
      * -O0
      
      |                 | GHC HEAD       | This MR        | Diff   |
      | --------------- | -------------- | -------------- | ------ |
      | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
      | Bytes copied    |  4,926,037,184 |  4,990,638,760 | +1.31% |
      | Max residency   |    421,225,624 |    424,324,264 | +0.73% |
      
      * -O1
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
      | Bytes copied    |  26,943,452,560 |  27,089,972,296 | +0.54% |
      | Max residency   |     982,643,440 |     991,663,432 | +0.91% |
      
      * -O2
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
      | Bytes copied    |  37,044,237,616 |  36,121,690,472 | -2.49% |
      | Max residency   |   1,071,600,328 |   1,086,396,256 | +1.38% |
      
      Extra compiler allocations
      --------------------------
      
      Runtime allocations of programs are as reported above (NoFib section).
      
      The compiler now allocates more than before. Main source of allocation
      in this patch compared to base commit is the new SRT algorithm
      (GHC.Cmm.Info.Build). Below is some of the extra work we do with this
      patch, numbers generated by profiled stage 2 compiler when building a
      pathological case (the test 'ManyConstructors') with '-O2':
      
      - We now sort the final STG for a module, which means traversing the
        entire program, generating free variable set for each top-level
        binding, doing SCC analysis, and re-ordering the program. In
        ManyConstructors this step allocates 97,889,952 bytes.
      
      - We now do SRT analysis on static data, which in a program like
        ManyConstructors causes analysing 10,000 bindings that we would
        previously just skip. This step allocates 70,898,352 bytes.
      
      - We now maintain an SRT map for the entire module as we compile Cmm
        groups:
      
            data ModuleSRTInfo = ModuleSRTInfo
              { ...
              , moduleSRTMap :: SRTMap
              }
      
         (SRTMap is just a strict Map from the 'containers' library)
      
         This map gets an entry for most bindings in a module (exceptions are
         THUNKs and CAFFY static functions). For ManyConstructors this map
         gets 50015 entries.
      
      - Once we're done with code generation we generate a NameSet from SRTMap
        for the non-CAFFY names in the current module. This set gets the same
        number of entries as the SRTMap.
      
      - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
        the NameSet generated in the previous step. This usually does the
        least amount of allocation among the work listed here.
      
      Only place with this patch where we do less work in the CAF analysis in
      the tidying pass (CoreTidy). However that doesn't save us much, as the
      pass still needs to traverse the whole program and update IdInfos for
      other reasons. Only thing we don't here do is the `hasCafRefs` pass over
      the RHS of bindings, which is a stateless pass that returns a boolean
      value, so it doesn't allocate much.
      
      (Metric changes blow are all increased allocations)
      
      Metric changes
      --------------
      
      Metric Increase:
          ManyAlternatives
          ManyConstructors
          T13035
          T14683
          T1969
          T9961
      c846618a
  8. 25 Jan, 2020 1 commit
  9. 20 Jun, 2019 1 commit
    • John Ericson's avatar
      Move 'Platform' to ghc-boot · bff2f24b
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdire within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good affect too.
      bff2f24b
  10. 18 Jul, 2018 1 commit
    • Tamar Christina's avatar
      stack: fix stack allocations on Windows · d0bbe1bf
      Tamar Christina authored
      Summary:
      On Windows one is not allowed to drop the stack by more than a page size.
      The reason for this is that the OS only allocates enough stack till what
      the TEB specifies. After that a guard page is placed and the rest of the
      virtual address space is unmapped.
      
      The intention is that doing stack allocations will cause you to hit the
      guard which will then map the next page in and move the guard.  This is
      done to prevent what in the Linux world is known as stack clash
      vulnerabilities https://access.redhat.com/security/cve/cve-2017-1000364.
      
      There are modules in GHC for which the liveliness analysis thinks the
      reserved 8KB of spill slots isn't enough.  One being DynFlags and the
      other being Cabal.
      
      Though I think the Cabal one is likely a bug:
      
      ```
        4d6544:       81 ec 00 46 00 00       sub    $0x4600,%esp
        4d654a:       8d 85 94 fe ff ff       lea    -0x16c(%ebp),%eax
        4d6550:       3b 83 1c 03 00 00       cmp    0x31c(%ebx),%eax
        4d6556:       0f 82 de 8d 02 00       jb     4ff33a <_cLpg_info+0x7a>
        4d655c:       c7 45 fc 14 3d 50 00    movl   $0x503d14,-0x4(%ebp)
        4d6563:       8b 75 0c                mov    0xc(%ebp),%esi
        4d6566:       83 c5 fc                add    $0xfffffffc,%ebp
        4d6569:       66 f7 c6 03 00          test   $0x3,%si
        4d656e:       0f 85 a6 d7 02 00       jne    503d1a <_cLpb_info+0x6>
        4d6574:       81 c4 00 46 00 00       add    $0x4600,%esp
      ```
      
      It allocates nearly 18KB of spill slots for a simple 4 line function
      and doesn't even use it.  Note that this doesn't happen on x64 or
      when making a validate build.  Only when making a build without a
      validate and build.mk.
      
      This and the allocation in DynFlags means the stack allocation will jump
      over the guard page into unmapped memory areas and GHC or an end program
      segfaults.
      
      The pagesize on x86 Windows is 4KB which means we hit it very easily for
      these two modules, which explains the total DOA of GHC 32bit for the past
      3 releases and the "random" segfaults on Windows.
      
      ```
      0:000> bp 00503d29
      0:000> gn
      Breakpoint 0 hit
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
      eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
      cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
      setup+0x103d29:
      00503d29 89442440        mov     dword ptr [esp+40h],eax ss:002b:013e973c=????????
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      WARNING: Stack overflow detected. The unwound frames are extracted from outside
               normal stack bounds.
      0:000> !teb
      TEB at 00384000
          ExceptionList:        013effcc
          StackBase:            013f0000
          StackLimit:           013eb000
      ```
      
      This doesn't fix the liveliness analysis but does fix the allocations, by
      emitting a function call to `__chkstk_ms` when doing allocations of larger
      than a page, this will make sure the stack is probed every page so the kernel
      maps in the next page.
      
      `__chkstk_ms` is provided by `libGCC`, which is under the
      `GNU runtime exclusion license`, so it's safe to link against it, even for
      proprietary code. (Technically we already do since we link compiled C code in.)
      
      For allocations smaller than a page we drop the stack and probe the new address.
      This avoids the function call and still makes sure we hit the guard if needed.
      
      PS: In case anyone is Wondering why we didn't notice this before, it's because we
      only test x86_64 and on Windows 10.  On x86_64 the page size is 8KB and also the
      kernel is a bit more lenient on Windows 10 in that it seems to catch the segfault
      and resize the stack if it was unmapped:
      
      ```
      0:000> t
      eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
      eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
      cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
      setup+0x103d2d:
      00503d2d 8b461b          mov     eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431
      0:000> !teb
      TEB at 00384000
          ExceptionList:        013effcc
          StackBase:            013f0000
          StackLimit:           013e9000
      ```
      
      Likely Windows 10 has a guard page larger than previous versions.
      
      This fixes the stack allocations, and as soon as I get the time I will look at
      the liveliness analysis. I find it highly unlikely that simple Cabal function
      requires ~2200 spill slots.
      
      Test Plan: ./validate
      
      Reviewers: simonmar, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: AndreasK, rwbarton, thomie, carter
      
      GHC Trac Issues: #15154
      
      Differential Revision: https://phabricator.haskell.org/D4917
      d0bbe1bf
  11. 19 Sep, 2017 1 commit
    • Herbert Valerio Riedel's avatar
      compiler: introduce custom "GhcPrelude" Prelude · f63bc730
      Herbert Valerio Riedel authored
      This switches the compiler/ component to get compiled with
      -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
      modules.
      
      This is motivated by the upcoming "Prelude" re-export of
      `Semigroup((<>))` which would cause lots of name clashes in every
      modulewhich imports also `Outputable`
      
      Reviewers: austin, goldfire, bgamari, alanz, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
      
      Differential Revision: https://phabricator.haskell.org/D3989
      f63bc730
  12. 23 Jun, 2017 1 commit
    • Michal Terepeta's avatar
      Hoopl: remove dependency on Hoopl package · 42eee6ea
      Michal Terepeta authored
      
      
      This copies the subset of Hoopl's functionality needed by GHC to
      `cmm/Hoopl` and removes the dependency on the Hoopl package.
      
      The main motivation for this change is the confusing/noisy interface
      between GHC and Hoopl:
      - Hoopl has `Label` which is GHC's `BlockId` but different than
        GHC's `CLabel`
      - Hoopl has `Unique` which is different than GHC's `Unique`
      - Hoopl has `Unique{Map,Set}` which are different than GHC's
        `Uniq{FM,Set}`
      - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
        needed just to filter the exposed functions (filter out some of the
        Hoopl's and add the GHC ones)
      With this change, we'll be able to simplify this significantly.
      It'll also be much easier to do invasive changes (Hoopl is a public
      package on Hackage with users that depend on the current behavior)
      
      This should introduce no changes in functionality - it merely
      copies the relevant code.
      Signed-off-by: Michal Terepeta's avatarMichal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: simonpj, kavon, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3616
      42eee6ea
  13. 08 Dec, 2016 1 commit
  14. 23 Sep, 2013 1 commit
    • Simon Marlow's avatar
      Discard unreachable code in the register allocator (#7574) · f5879acd
      Simon Marlow authored
      The problem with unreachable code is that it might refer to undefined
      registers.  This happens accidentally: a block can be orphaned by an
      optimisation, for example when the result of a comparsion becomes
      known.
      
      The register allocator panics when it finds an undefined register,
      because they shouldn't occur in generated code.  So we need to also
      discard unreachable code to prevent this panic being triggered by
      optimisations.
      
      The register alloator already does a strongly-connected component
      analysis, so it ought to be easy to make it discard unreachable code
      as part of that traversal.  It turns out that we need a different
      variant of the scc algorithm to do that (see Digraph), however the new
      variant also generates slightly better code by putting the blocks
      within a loop in a better order for register allocation.
      f5879acd
  15. 12 Nov, 2012 2 commits
    • Simon Marlow's avatar
      Fix warnings · 92957808
      Simon Marlow authored
      92957808
    • Simon Marlow's avatar
      Remove OldCmm, convert backends to consume new Cmm · d92bd17f
      Simon Marlow authored
      This removes the OldCmm data type and the CmmCvt pass that converts
      new Cmm to OldCmm.  The backends (NCGs, LLVM and C) have all been
      converted to consume new Cmm.
      
      The main difference between the two data types is that conditional
      branches in new Cmm have both true/false successors, whereas in OldCmm
      the false case was a fallthrough.  To generate slightly better code we
      occasionally need to invert a conditional to ensure that the
      branch-not-taken becomes a fallthrough; this was previously done in
      CmmCvt, and it is now done in CmmContFlowOpt.
      
      We could go further and use the Hoopl Block representation for native
      code, which would mean that we could use Hoopl's postorderDfs and
      analyses for native code, but for now I've left it as is, using the
      old ListGraph representation for native code.
      d92bd17f
  16. 20 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Teach the linear register allocator how to allocate more stack if necessary · 0b0a41f9
      Simon Marlow authored
      This squashes the "out of spill slots" panic that occasionally happens
      on x86, by adding instructions to bump and retreat the C stack pointer
      as necessary.  The panic has become more common since the new codegen,
      because we lump code into larger blocks, and the register allocator
      isn't very good at reusing stack slots for spilling (see Note [extra
      spill slots]).
      0b0a41f9
  17. 14 Sep, 2012 1 commit
  18. 21 Aug, 2012 1 commit
  19. 30 Jul, 2012 1 commit
    • Simon Marlow's avatar
      New codegen: do not split proc-points when using the NCG · f1ed6a10
      Simon Marlow authored
      Proc-point splitting is only required by backends that do not support
      having proc-points within a code block (that is, everything except the
      native backend, i.e. LLVM and C).
      
      Not doing proc-point splitting saves some compilation time, and might
      produce slightly better code in some cases.
      f1ed6a10
  20. 25 Aug, 2011 2 commits
  21. 15 Jul, 2011 1 commit
  22. 12 Jul, 2011 1 commit
  23. 06 Jul, 2011 1 commit
    • batterseapower's avatar
      Refactoring: explicitly mark whether we have an info table in RawCmm · 41ca0b8d
      batterseapower authored
      I introduced this to support explicitly recording the info table label
      in RawCmm for another patch I am working on, but it turned out to lead
      to significant simplification in those parts of the compiler that
      consume RawCmm.
      
      Now, instead of lots of tests for null [CmmStatic] we have a simple
      test of a Maybe, and have reduced the number of guys that need to know
      how to convert entry->info labels by a TON. There are only 3 callers
      of that function now!
      41ca0b8d
  24. 05 Jul, 2011 1 commit
    • batterseapower's avatar
      Refactoring: use a structured CmmStatics type rather than [CmmStatic] · 54843b5b
      batterseapower authored
      I observed that the [CmmStatics] within CmmData uses the list in a very stylised way.
      The first item in the list is almost invariably a CmmDataLabel. Many parts of the
      compiler pattern match on this list and fail if this is not true.
      
      This patch makes the invariant explicit by introducing a structured type CmmStatics
      that holds the label and the list of remaining [CmmStatic].
      
      There is one wrinkle: the x86 backend sometimes wants to output an alignment directive just
      before the label. However, this can be easily fixed up by parameterising the native codegen
      over the type of CmmStatics (though the GenCmmTop parameterisation) and using a pair
      (Alignment, CmmStatics) there instead.
      
      As a result, I think we will be able to remove CmmAlign and CmmDataLabel from the CmmStatic
      data type, thus nuking a lot of code and failing pattern matches. This change will come as part
      of my next patch.
      54843b5b
  25. 24 Jan, 2011 1 commit
    • Simon Marlow's avatar
      Merge in new code generator branch. · 889c084e
      Simon Marlow authored
      This changes the new code generator to make use of the Hoopl package
      for dataflow analysis.  Hoopl is a new boot package, and is maintained
      in a separate upstream git repository (as usual, GHC has its own
      lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
      
      During this merge I squashed recent history into one patch.  I tried
      to rebase, but the history had some internal conflicts of its own
      which made rebase extremely confusing, so I gave up. The history I
      squashed was:
      
        - Update new codegen to work with latest Hoopl
        - Add some notes on new code gen to cmm-notes
        - Enable Hoopl lag package.
        - Add SPJ note to cmm-notes
        - Improve GC calls on new code generator.
      
      Work in this branch was done by:
         - Milan Straka <fox@ucw.cz>
         - John Dias <dias@cs.tufts.edu>
         - David Terei <davidterei@gmail.com>
      
      Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
      and fixed a few bugs.
      889c084e
  26. 15 Feb, 2009 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      NCG: Split up the native code generator into arch specific modules · b04a210e
      Ben.Lippmeier@anu.edu.au authored
        - nativeGen/Instruction defines a type class for a generic
          instruction set. Each of the instruction sets we have, 
          X86, PPC and SPARC are instances of it.
        
        - The register alloctors use this type class when they need
          info about a certain register or instruction, such as
          regUsage, mkSpillInstr, mkJumpInstr, patchRegs..
        
        - nativeGen/Platform defines some data types enumerating
          the architectures and operating systems supported by the 
          native code generator.
        
        - DynFlags now keeps track of the current build platform, and 
          the PositionIndependentCode module uses this to decide what
          to do instead of relying of #ifdefs.
        
        - It's not totally retargetable yet. Some info info about the
          build target is still hardwired, but I've tried to contain
          most of it to a single module, TargetRegs.
        
        - Moved the SPILL and RELOAD instructions into LiveInstr.
        
        - Reg and RegClass now have their own modules, and are shared
          across all architectures.
      b04a210e