1. 12 Aug, 2020 1 commit
    • Sylvain Henry's avatar
      DynFlags: disentangle Outputable · accbc242
      Sylvain Henry authored
      - put panic related functions into GHC.Utils.Panic
      - put trace related functions using DynFlags in GHC.Driver.Ppr
      
      One step closer making Outputable fully independent of DynFlags.
      
      Bump haddock submodule
      accbc242
  2. 14 Jun, 2020 1 commit
  3. 26 Apr, 2020 1 commit
  4. 23 Apr, 2020 1 commit
  5. 21 Apr, 2020 1 commit
    • Sylvain Henry's avatar
      CmmToAsm DynFlags refactoring (#17957) · 747093b7
      Sylvain Henry authored
      * Remove `DynFlags` parameter from `isDynLinkName`: `isDynLinkName` used
        to test the global `ExternalDynamicRefs` flag. Now we test it outside of
        `isDynLinkName`
      
      * Add new fields into `NCGConfig`: current unit id, sse/bmi versions,
        externalDynamicRefs, etc.
      
      * Replace many uses of `DynFlags` by `NCGConfig`
      
      * Moved `BMI/SSE` datatypes into `GHC.Platform`
      747093b7
  6. 03 Apr, 2020 1 commit
    • Sylvain Henry's avatar
      Refactor CmmStatics · cc2918a0
      Sylvain Henry authored
      In !2959 we noticed that there was some redundant code (in GHC.Cmm.Utils
      and GHC.Cmm.StgToCmm.Utils) used to deal with `CmmStatics` datatype
      (before SRT generation) and `RawCmmStatics` datatype (after SRT
      generation).
      
      This patch removes this redundant code by using a single GADT for
      (Raw)CmmStatics.
      cc2918a0
  7. 29 Mar, 2020 1 commit
  8. 19 Mar, 2020 1 commit
  9. 15 Mar, 2020 1 commit
    • Sylvain Henry's avatar
      Refactor CmmToAsm (disentangle DynFlags) · 2e82465f
      Sylvain Henry authored
      This patch disentangles a bit more DynFlags from the native code
      generator (CmmToAsm).
      
      In more details:
      
      - add a new NCGConfig datatype in GHC.CmmToAsm.Config which contains the
        configuration of a native code generation session
      - explicitly pass NCGConfig/Platform arguments when necessary
      - as a consequence `sdocWithPlatform` is gone and there are only a few
        `sdocWithDynFlags` left
      - remove the use of `unsafeGlobalDynFlags` from GHC.CmmToAsm.CFG
      - remove `sdocDebugLevel` (now we pass the debug level via NCGConfig)
      
      There are still some places where DynFlags is used, especially because
      of pretty-printing (CLabel), because of Cmm helpers (such as
      `cmmExprType`) and because of `Outputable` instance for the
      instructions. These are left for future refactoring as this patch is
      already big.
      2e82465f
  10. 25 Feb, 2020 1 commit
  11. 22 Feb, 2020 1 commit
  12. 31 Jan, 2020 1 commit
    • Ömer Sinan Ağacan's avatar
      Do CafInfo/SRT analysis in Cmm · c846618a
      Ömer Sinan Ağacan authored
      This patch removes all CafInfo predictions and various hacks to preserve
      predicted CafInfos from the compiler and assigns final CafInfos to
      interface Ids after code generation. SRT analysis is extended to support
      static data, and Cmm generator is modified to allow generating
      static_link fields after SRT analysis.
      
      This also fixes `-fcatch-bottoms`, which introduces error calls in case
      expressions in CorePrep, which runs *after* CoreTidy (which is where we
      decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
      
      Fixes #17648
      Fixes #9718
      
      Evaluation
      ==========
      
      NoFib
      -----
      
      Boot with: `make boot mode=fast`
      Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  CSD          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   FS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                    S          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   VS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  VSD          -0.0%      0.0%     -0.0%     -0.0%     -0.5%
                  VSM          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 anna          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                 ansi          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 atom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               awards          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               banner          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           bernouilli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         binary-trees          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                boyer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               boyer2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 bspt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            cacheprof          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             calendar          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             cichelli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              circsim          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             clausify          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        comp_lab_zift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             compress          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            compress2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  cse          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               dom-lt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                eliza          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                event          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          exact-reals          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               exp3_8          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               expert          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       fannkuch-redux          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fasta          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fft2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fish          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fluid          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
               fulsom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               gamteb          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  gcd          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          gen_regexps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               genfft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   gg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 grep          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               hidden          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  hpg          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  ida          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                infer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              integer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            integrate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         k-nucleotide          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                kahan          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              knights          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               lambda          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           last-piece          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 life          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               linear          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            listcompr          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             listcopy          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             maillist          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               mandel          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mandel2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 mate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              minimax          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mkhprog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           multiplier          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               n-body          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             nucleic2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 para          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            paraffins          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               parser          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
              parstof          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  pic          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             pidigits          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               pretty          -0.0%      0.0%     -0.3%     -0.4%     -0.4%
               primes          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            primetest          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               prolog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               puzzle          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               queens          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              reptile          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      reverse-complem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              rewrite          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 rfib          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  rsa          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scc          -0.0%      0.0%     -0.3%     -0.5%     -0.4%
                sched          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scs          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               simple          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                solid          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              sorting          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        spectral-norm          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               sphere          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               symalg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  tak          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            transform          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             treejoin          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            typecheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              veritas          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 wang          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            wave4main          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 x2n1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.3%     -0.5%     -0.5%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     -0.0%     -0.0%     -0.0%     -0.0%
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
              circsim          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             gc_bench          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 hash          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           spellcheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     +0.0%     -0.0%     -0.0%     -0.0%
      
      Manual inspection of programs in testsuite/tests/programs
      ---------------------------------------------------------
      
      I built these programs with a bunch of dump flags and `-O` and compared
      STG, Cmm, and Asm dumps and file sizes.
      
      (Below the numbers in parenthesis show number of modules in the program)
      
      These programs have identical compiler (same .hi and .o sizes, STG, and
      Cmm and Asm dumps):
      
      - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
        andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
        jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
        jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
        bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
        strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
        joao-circular (6), 10queens (1)
      
      Programs with different compiler outputs:
      
      - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
        `[Int]` closures as CAFFY for no reason. With this patch we no longer
        make them CAFFY and generate less SRT entries. For some reason Main.o
        is slightly larger with this patch (1.3%) and the executable sizes are
        the same. (I'd expect both to be smaller)
      
      - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
        as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
        executable sizes are the same.
      
      - galois_raytrace (13): Differences are in the Parse module. There are a
        lot, but some of the changes are caused by the fact that for some
        reason (I think a bug) GHC HEAD marks the dictionary for `Functor
        Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
        same.
      
      - north_array: We now generate less SRT entries because some of array
        primops used in this program like `NewArrayOp` get eliminated during
        Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
        larger (9224 bytes from 9000 bytes), executable sizes are the same.
      
      - seward-space-leak: Difference in this program is better shown by this
        smaller example:
      
            module Lib where
      
            data CDS
              = Case [CDS] [(Int, CDS)]
              | Call CDS CDS
      
            instance Eq CDS where
              Case sels1 rets1 == Case sels2 rets2 =
                  sels1 == sels2 && rets1 == rets2
              Call a1 b1 == Call a2 b2 =
                  a1 == a2 && b1 == b2
              _ == _ =
                  False
      
         In this program GHC HEAD builds a new SRT for the recursive group of
         `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
         in its SRT field, and `==` uses the SRT object as its SRT. With this
         patch we use the closure for `/=` as the SRT and add `==` there. Then
         `/=` gets an empty SRT field and `==` points to `/=` in its SRT
         field.
      
         This change looks fine to me.
      
         Main.o gets 0.07% larger, executable sizes are identical.
      
      head.hackage
      ------------
      
      head.hackage's CI script builds 428 packages from Hackage using this
      patch with no failures.
      
      Compiler performance
      --------------------
      
      The compiler perf tests report that the compiler allocates slightly more
      (worst case observed so far is 4%). However most programs in the test
      suite are small, single file programs. To benchmark compiler performance
      on something more realistic I build Cabal (the library, 236 modules)
      with different optimisation levels. For the "max residency" row I run
      GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
      are generated with just `-s`. (This is because `-i0` causes running GC
      much more frequently and as a result "bytes copied" gets inflated by
      more than 25x in some cases)
      
      * -O0
      
      |                 | GHC HEAD       | This MR        | Diff   |
      | --------------- | -------------- | -------------- | ------ |
      | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
      | Bytes copied    |  4,926,037,184 |  4,990,638,760 | +1.31% |
      | Max residency   |    421,225,624 |    424,324,264 | +0.73% |
      
      * -O1
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
      | Bytes copied    |  26,943,452,560 |  27,089,972,296 | +0.54% |
      | Max residency   |     982,643,440 |     991,663,432 | +0.91% |
      
      * -O2
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
      | Bytes copied    |  37,044,237,616 |  36,121,690,472 | -2.49% |
      | Max residency   |   1,071,600,328 |   1,086,396,256 | +1.38% |
      
      Extra compiler allocations
      --------------------------
      
      Runtime allocations of programs are as reported above (NoFib section).
      
      The compiler now allocates more than before. Main source of allocation
      in this patch compared to base commit is the new SRT algorithm
      (GHC.Cmm.Info.Build). Below is some of the extra work we do with this
      patch, numbers generated by profiled stage 2 compiler when building a
      pathological case (the test 'ManyConstructors') with '-O2':
      
      - We now sort the final STG for a module, which means traversing the
        entire program, generating free variable set for each top-level
        binding, doing SCC analysis, and re-ordering the program. In
        ManyConstructors this step allocates 97,889,952 bytes.
      
      - We now do SRT analysis on static data, which in a program like
        ManyConstructors causes analysing 10,000 bindings that we would
        previously just skip. This step allocates 70,898,352 bytes.
      
      - We now maintain an SRT map for the entire module as we compile Cmm
        groups:
      
            data ModuleSRTInfo = ModuleSRTInfo
              { ...
              , moduleSRTMap :: SRTMap
              }
      
         (SRTMap is just a strict Map from the 'containers' library)
      
         This map gets an entry for most bindings in a module (exceptions are
         THUNKs and CAFFY static functions). For ManyConstructors this map
         gets 50015 entries.
      
      - Once we're done with code generation we generate a NameSet from SRTMap
        for the non-CAFFY names in the current module. This set gets the same
        number of entries as the SRTMap.
      
      - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
        the NameSet generated in the previous step. This usually does the
        least amount of allocation among the work listed here.
      
      Only place with this patch where we do less work in the CAF analysis in
      the tidying pass (CoreTidy). However that doesn't save us much, as the
      pass still needs to traverse the whole program and update IdInfos for
      other reasons. Only thing we don't here do is the `hasCafRefs` pass over
      the RHS of bindings, which is a stateless pass that returns a boolean
      value, so it doesn't allocate much.
      
      (Metric changes blow are all increased allocations)
      
      Metric changes
      --------------
      
      Metric Increase:
          ManyAlternatives
          ManyConstructors
          T13035
          T14683
          T1969
          T9961
      c846618a
  13. 25 Jan, 2020 1 commit
  14. 04 Jan, 2020 1 commit
  15. 03 Dec, 2019 1 commit
  16. 02 Dec, 2019 1 commit
  17. 28 Nov, 2019 1 commit
  18. 05 Oct, 2019 1 commit
    • John Ericson's avatar
      Clean up `#include`s in the compiler · ee8118ca
      John Ericson authored
       - Remove unneeded ones
      
       - Use <..> for inter-package.
         Besides general clean up, helps distinguish between the RTS we link
         against vs the RTS we compile for.
      ee8118ca
  19. 13 Sep, 2019 1 commit
  20. 09 Sep, 2019 1 commit
    • Sylvain Henry's avatar
      Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
      447864a9
  21. 16 Jul, 2019 1 commit
  22. 03 Jul, 2019 1 commit
  23. 28 Jun, 2019 1 commit
    • Travis Whitaker's avatar
      Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-or-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weam memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause us issues,
      consider:
      
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but its too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari's avatarBen Gamari <ben@well-typed.com>
      11bac115
  24. 20 Jun, 2019 1 commit
    • John Ericson's avatar
      Move 'Platform' to ghc-boot · bff2f24b
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdire within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good affect too.
      bff2f24b
  25. 09 Jun, 2019 1 commit
  26. 31 May, 2019 1 commit
    • Sergei Trofimovich's avatar
      powerpc32: fix 64-bit comparison (#16465) · 973077ac
      Sergei Trofimovich authored
      On powerpc32 64-bit comparison code generated dangling
      target labels. This caused ghc build failure as:
      
          $ ./configure --target=powerpc-unknown-linux-gnu && make
          ...
          SCCs aren't in reverse dependent order
          bad blockId n3U
      
      This happened because condIntCode' in PPC codegen generated
      label name but did not place the label into `cmp_lo` code block.
      
      The change adds the `cmp_lo` label into the case of negative
      comparison.
      Signed-off-by: default avatarSergei Trofimovich <slyfox@gentoo.org>
      973077ac
  27. 11 Apr, 2019 1 commit
    • Carter Schonwald's avatar
      removing x87 register support from native code gen · 42504f4a
      Carter Schonwald authored
      * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
      * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
      behavior in 32bit haskell code
      * removes the 80bit floating point representation from the supported float sizes
      * theres still 1 tiny bit of x87 support needed,
      for handling float and double return values in FFI calls  wrt the C ABI on x86_32,
      but this one piece does not leak into the rest of NCG.
      * Lots of code thats not been touched in a long time got deleted as a
      consequence of all of this
      
      all in all, this change paves the way towards a lot of future further
      improvements in how GHC handles floating point computations, along with
      making the native code gen more accessible to a larger pool of contributors.
      42504f4a
  28. 01 Apr, 2019 1 commit
  29. 15 Mar, 2019 1 commit
    • Peter Trommler's avatar
      PPC NCG: Use liveness information in CmmCall · 83e09d3c
      Peter Trommler authored
      We make liveness information for global registers
      available on `JMP` and `BCTR`, which were the last instructions
      missing. With complete liveness information we do not need to
      reserve global registers in `freeReg` anymore. Moreover we
      assign R9 and R10 to callee saves registers.
      
      Cleanup by removing `Reg_Su`, which was unused, from `freeReg`
      and removing unused register definitions.
      
      The calculation of the number of floating point registers is too
      conservative. Just follow X86 and specify the constants directly.
      
      Overall on PowerPC this results in 0.3 % smaller code size in nofib
      while runtime is slightly better in some tests.
      83e09d3c
  30. 31 Jan, 2019 1 commit
  31. 17 Jan, 2019 3 commits
  32. 01 Jan, 2019 1 commit
  33. 30 Dec, 2018 1 commit
  34. 11 Dec, 2018 2 commits
  35. 17 Nov, 2018 1 commit
    • Andreas Klebinger's avatar
      NCG: New code layout algorithm. · 912fd2b6
      Andreas Klebinger authored
      Summary:
      This patch implements a new code layout algorithm.
      It has been tested for x86 and is disabled on other platforms.
      
      Performance varies slightly be CPU/Machine but in general seems to be better
      by around 2%.
      Nofib shows only small differences of about +/- ~0.5% overall depending on
      flags/machine performance in other benchmarks improved significantly.
      
      Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
      containers, text and xeno.
      
      While the magnitude of gains differed three different CPUs where tested with
      all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
      Skylake
      
      * Library benchmark results summarized:
        * containers: ~1.5% faster
        * aeson: ~2% faster
        * megaparsec: ~2-5% faster
        * xml library benchmarks: 0.2%-1.1% faster
        * vector-benchmarks: 1-4% faster
        * text: 5.5% faster
      
      On average GHC compile times go down, as GHC compiled with the new layout
      is faster than the overhead introduced by using the new layout algorithm,
      
      Things this patch does:
      
      * Move code responsilbe for block layout in it's own module.
      * Move the NcgImpl Class into the NCGMonad module.
      * Extract a control flow graph from the input cmm.
      * Update this cfg to keep it in sync with changes during
        asm codegen. This has been tested on x64 but should work on x86.
        Other platforms still use the old codelayout.
      * Assign weights to the edges in the CFG based on type and limited static
        analysis which are then used for block layout.
      * Once we have the final code layout eliminate some redundant jumps.
      
        In particular turn a sequences of:
            jne .foo
            jmp .bar
          foo:
        into
            je bar
          foo:
            ..
      
      Test Plan: ci
      
      Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
      
      Reviewed By: RyanGlScott
      
      Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
      
      GHC Trac Issues: #15124
      
      Differential Revision: https://phabricator.haskell.org/D4726
      912fd2b6
  36. 21 Aug, 2018 1 commit
  37. 16 May, 2018 1 commit
    • Simon Marlow's avatar
      Allow CmmLabelDiffOff with different widths · fbd28e2c
      Simon Marlow authored
      Summary:
      This change makes it possible to generate a static 32-bit relative label
      offset on x86_64. Currently we can only generate word-sized label
      offsets.
      
      This will be used in D4634 to shrink info tables.  See D4632 for more
      details.
      
      Test Plan: See D4632
      
      Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4633
      fbd28e2c