1. 15 Mar, 2020 1 commit
    • Sylvain Henry's avatar
      Refactor CmmToAsm (disentangle DynFlags) · 2e82465f
      Sylvain Henry authored
      This patch disentangles a bit more DynFlags from the native code
      generator (CmmToAsm).
      
      In more details:
      
      - add a new NCGConfig datatype in GHC.CmmToAsm.Config which contains the
        configuration of a native code generation session
      - explicitly pass NCGConfig/Platform arguments when necessary
      - as a consequence `sdocWithPlatform` is gone and there are only a few
        `sdocWithDynFlags` left
      - remove the use of `unsafeGlobalDynFlags` from GHC.CmmToAsm.CFG
      - remove `sdocDebugLevel` (now we pass the debug level via NCGConfig)
      
      There are still some places where DynFlags is used, especially because
      of pretty-printing (CLabel), because of Cmm helpers (such as
      `cmmExprType`) and because of `Outputable` instance for the
      instructions. These are left for future refactoring as this patch is
      already big.
      2e82465f
  2. 25 Feb, 2020 1 commit
  3. 22 Feb, 2020 1 commit
  4. 31 Jan, 2020 1 commit
    • Ömer Sinan Ağacan's avatar
      Do CafInfo/SRT analysis in Cmm · c846618a
      Ömer Sinan Ağacan authored
      This patch removes all CafInfo predictions and various hacks to preserve
      predicted CafInfos from the compiler and assigns final CafInfos to
      interface Ids after code generation. SRT analysis is extended to support
      static data, and Cmm generator is modified to allow generating
      static_link fields after SRT analysis.
      
      This also fixes `-fcatch-bottoms`, which introduces error calls in case
      expressions in CorePrep, which runs *after* CoreTidy (which is where we
      decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
      
      Fixes #17648
      Fixes #9718
      
      Evaluation
      ==========
      
      NoFib
      -----
      
      Boot with: `make boot mode=fast`
      Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
                   CS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  CSD          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   FS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                    S          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   VS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  VSD          -0.0%      0.0%     -0.0%     -0.0%     -0.5%
                  VSM          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 anna          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                 ansi          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 atom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               awards          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               banner          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           bernouilli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         binary-trees          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                boyer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               boyer2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 bspt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            cacheprof          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             calendar          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             cichelli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              circsim          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             clausify          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        comp_lab_zift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             compress          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            compress2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         cryptarithm2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  cse          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         digits-of-e2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               dom-lt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                eliza          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                event          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          exact-reals          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               exp3_8          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               expert          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       fannkuch-redux          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fasta          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  fft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fft2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 fish          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                fluid          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
               fulsom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               gamteb          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  gcd          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          gen_regexps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               genfft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                   gg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 grep          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               hidden          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  hpg          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  ida          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                infer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              integer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            integrate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         k-nucleotide          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                kahan          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              knights          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               lambda          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           last-piece          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 life          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               linear          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            listcompr          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             listcopy          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             maillist          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               mandel          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mandel2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 mate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              minimax          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              mkhprog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           multiplier          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               n-body          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             nucleic2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 para          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            paraffins          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               parser          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
              parstof          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  pic          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             pidigits          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               pretty          -0.0%      0.0%     -0.3%     -0.4%     -0.4%
               primes          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            primetest          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               prolog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               puzzle          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               queens          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              reptile          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      reverse-complem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              rewrite          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 rfib          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  rsa          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scc          -0.0%      0.0%     -0.3%     -0.5%     -0.4%
                sched          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  scs          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               simple          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                solid          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              sorting          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        spectral-norm          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               sphere          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
               symalg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                  tak          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            transform          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             treejoin          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            typecheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              veritas          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 wang          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            wave4main          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         wheel-sieve2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 x2n1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.3%     -0.5%     -0.5%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     -0.0%     -0.0%     -0.0%     -0.0%
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs    Instrs     Reads    Writes
      --------------------------------------------------------------------------------
              circsim          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
          constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             gc_bench          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 hash          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                 lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
                power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           spellcheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      --------------------------------------------------------------------------------
                  Min          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
                  Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       Geometric Mean          -0.0%     +0.0%     -0.0%     -0.0%     -0.0%
      
      Manual inspection of programs in testsuite/tests/programs
      ---------------------------------------------------------
      
      I built these programs with a bunch of dump flags and `-O` and compared
      STG, Cmm, and Asm dumps and file sizes.
      
      (Below the numbers in parenthesis show number of modules in the program)
      
      These programs have identical compiler (same .hi and .o sizes, STG, and
      Cmm and Asm dumps):
      
      - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
        andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
        jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
        jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
        bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
        strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
        joao-circular (6), 10queens (1)
      
      Programs with different compiler outputs:
      
      - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
        `[Int]` closures as CAFFY for no reason. With this patch we no longer
        make them CAFFY and generate less SRT entries. For some reason Main.o
        is slightly larger with this patch (1.3%) and the executable sizes are
        the same. (I'd expect both to be smaller)
      
      - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
        as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
        executable sizes are the same.
      
      - galois_raytrace (13): Differences are in the Parse module. There are a
        lot, but some of the changes are caused by the fact that for some
        reason (I think a bug) GHC HEAD marks the dictionary for `Functor
        Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
        same.
      
      - north_array: We now generate less SRT entries because some of array
        primops used in this program like `NewArrayOp` get eliminated during
        Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
        larger (9224 bytes from 9000 bytes), executable sizes are the same.
      
      - seward-space-leak: Difference in this program is better shown by this
        smaller example:
      
            module Lib where
      
            data CDS
              = Case [CDS] [(Int, CDS)]
              | Call CDS CDS
      
            instance Eq CDS where
              Case sels1 rets1 == Case sels2 rets2 =
                  sels1 == sels2 && rets1 == rets2
              Call a1 b1 == Call a2 b2 =
                  a1 == a2 && b1 == b2
              _ == _ =
                  False
      
         In this program GHC HEAD builds a new SRT for the recursive group of
         `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
         in its SRT field, and `==` uses the SRT object as its SRT. With this
         patch we use the closure for `/=` as the SRT and add `==` there. Then
         `/=` gets an empty SRT field and `==` points to `/=` in its SRT
         field.
      
         This change looks fine to me.
      
         Main.o gets 0.07% larger, executable sizes are identical.
      
      head.hackage
      ------------
      
      head.hackage's CI script builds 428 packages from Hackage using this
      patch with no failures.
      
      Compiler performance
      --------------------
      
      The compiler perf tests report that the compiler allocates slightly more
      (worst case observed so far is 4%). However most programs in the test
      suite are small, single file programs. To benchmark compiler performance
      on something more realistic I build Cabal (the library, 236 modules)
      with different optimisation levels. For the "max residency" row I run
      GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
      are generated with just `-s`. (This is because `-i0` causes running GC
      much more frequently and as a result "bytes copied" gets inflated by
      more than 25x in some cases)
      
      * -O0
      
      |                 | GHC HEAD       | This MR        | Diff   |
      | --------------- | -------------- | -------------- | ------ |
      | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
      | Bytes copied    |  4,926,037,184 |  4,990,638,760 | +1.31% |
      | Max residency   |    421,225,624 |    424,324,264 | +0.73% |
      
      * -O1
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
      | Bytes copied    |  26,943,452,560 |  27,089,972,296 | +0.54% |
      | Max residency   |     982,643,440 |     991,663,432 | +0.91% |
      
      * -O2
      
      |                 | GHC HEAD        | This MR         | Diff   |
      | --------------- | --------------- | --------------- | ------ |
      | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
      | Bytes copied    |  37,044,237,616 |  36,121,690,472 | -2.49% |
      | Max residency   |   1,071,600,328 |   1,086,396,256 | +1.38% |
      
      Extra compiler allocations
      --------------------------
      
      Runtime allocations of programs are as reported above (NoFib section).
      
      The compiler now allocates more than before. Main source of allocation
      in this patch compared to base commit is the new SRT algorithm
      (GHC.Cmm.Info.Build). Below is some of the extra work we do with this
      patch, numbers generated by profiled stage 2 compiler when building a
      pathological case (the test 'ManyConstructors') with '-O2':
      
      - We now sort the final STG for a module, which means traversing the
        entire program, generating free variable set for each top-level
        binding, doing SCC analysis, and re-ordering the program. In
        ManyConstructors this step allocates 97,889,952 bytes.
      
      - We now do SRT analysis on static data, which in a program like
        ManyConstructors causes analysing 10,000 bindings that we would
        previously just skip. This step allocates 70,898,352 bytes.
      
      - We now maintain an SRT map for the entire module as we compile Cmm
        groups:
      
            data ModuleSRTInfo = ModuleSRTInfo
              { ...
              , moduleSRTMap :: SRTMap
              }
      
         (SRTMap is just a strict Map from the 'containers' library)
      
         This map gets an entry for most bindings in a module (exceptions are
         THUNKs and CAFFY static functions). For ManyConstructors this map
         gets 50015 entries.
      
      - Once we're done with code generation we generate a NameSet from SRTMap
        for the non-CAFFY names in the current module. This set gets the same
        number of entries as the SRTMap.
      
      - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
        the NameSet generated in the previous step. This usually does the
        least amount of allocation among the work listed here.
      
      Only place with this patch where we do less work in the CAF analysis in
      the tidying pass (CoreTidy). However that doesn't save us much, as the
      pass still needs to traverse the whole program and update IdInfos for
      other reasons. Only thing we don't here do is the `hasCafRefs` pass over
      the RHS of bindings, which is a stateless pass that returns a boolean
      value, so it doesn't allocate much.
      
      (Metric changes blow are all increased allocations)
      
      Metric changes
      --------------
      
      Metric Increase:
          ManyAlternatives
          ManyConstructors
          T13035
          T14683
          T1969
          T9961
      c846618a
  5. 25 Jan, 2020 1 commit
  6. 04 Jan, 2020 1 commit
  7. 03 Dec, 2019 1 commit
  8. 02 Dec, 2019 1 commit
  9. 28 Nov, 2019 1 commit
  10. 05 Oct, 2019 1 commit
    • John Ericson's avatar
      Clean up `#include`s in the compiler · ee8118ca
      John Ericson authored
       - Remove unneeded ones
      
       - Use <..> for inter-package.
         Besides general clean up, helps distinguish between the RTS we link
         against vs the RTS we compile for.
      ee8118ca
  11. 13 Sep, 2019 1 commit
  12. 09 Sep, 2019 1 commit
    • Sylvain Henry's avatar
      Module hierarchy: StgToCmm (#13009) · 447864a9
      Sylvain Henry authored
      Add StgToCmm module hierarchy. Platform modules that are used in several
      other places (NCG, LLVM codegen, Cmm transformations) are put into
      GHC.Platform.
      447864a9
  13. 16 Jul, 2019 1 commit
  14. 03 Jul, 2019 1 commit
  15. 28 Jun, 2019 1 commit
    • Travis Whitaker's avatar
      Correct closure observation, construction, and mutation on weak memory machines. · 11bac115
      Travis Whitaker authored
      Here the following changes are introduced:
          - A read barrier machine op is added to Cmm.
          - The order in which a closure's fields are read and written is changed.
          - Memory barriers are added to RTS code to ensure correctness on
            out-or-order machines with weak memory ordering.
      
      Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
      is lowered to an instruction that ensures memory reads that occur after said
      instruction in program order are not performed before reads coming before said
      instruction in program order. On machines with strong memory ordering properties
      (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
      MO_ReadBarrier is simply erased. However, such an instruction is necessary on
      weakly ordered machines, e.g. ARM and PowerPC.
      
      Weam memory ordering has consequences for how closures are observed and mutated.
      For example, consider a closure that needs to be updated to an indirection. In
      order for the indirection to be safe for concurrent observers to enter, said
      observers must read the indirection's info table before they read the
      indirectee. Furthermore, the entering observer makes assumptions about the
      closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
      closure has an indirectee pointer that is safe to follow.
      
      When a closure is updated with an indirection, both its info table and its
      indirectee must be written. With weak memory ordering, these two writes can be
      arbitrarily reordered, and perhaps even interleaved with other threads' reads
      and writes (in the absence of memory barrier instructions). Consider this
      example of a bad reordering:
      
      - An updater writes to a closure's info table (INFO_TYPE is now IND).
      - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
      - A concurrent observer reads the closure's indirectee and enters it. (!!!)
      - An updater writes the closure's indirectee.
      
      Here the update to the indirectee comes too late and the concurrent observer has
      jumped off into the abyss. Speculative execution can also cause us issues,
      consider:
      
      - An observer is about to case on a value in closure's info table.
      - The observer speculatively reads one or more of closure's fields.
      - An updater writes to closure's info table.
      - The observer takes a branch based on the new info table value, but with the
        old closure fields!
      - The updater writes to the closure's other fields, but its too late.
      
      Because of these effects, reads and writes to a closure's info table must be
      ordered carefully with respect to reads and writes to the closure's other
      fields, and memory barriers must be placed to ensure that reads and writes occur
      in program order. Specifically, updates to a closure must follow the following
      pattern:
      
      - Update the closure's (non-info table) fields.
      - Write barrier.
      - Update the closure's info table.
      
      Observing a closure's fields must follow the following pattern:
      
      - Read the closure's info pointer.
      - Read barrier.
      - Read the closure's (non-info table) fields.
      
      This patch updates RTS code to obey this pattern. This should fix long-standing
      SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
      out-of-order execution) and PowerPC. This fixes issue #15449.
      Co-Authored-By: Ben Gamari's avatarBen Gamari <ben@well-typed.com>
      11bac115
  16. 20 Jun, 2019 1 commit
    • John Ericson's avatar
      Move 'Platform' to ghc-boot · bff2f24b
      John Ericson authored
      ghc-pkg needs to be aware of platforms so it can figure out which
      subdire within the user package db to use. This is admittedly
      roundabout, but maybe Cabal could use the same notion of a platform as
      GHC to good affect too.
      bff2f24b
  17. 09 Jun, 2019 1 commit
  18. 31 May, 2019 1 commit
    • Sergei Trofimovich's avatar
      powerpc32: fix 64-bit comparison (#16465) · 973077ac
      Sergei Trofimovich authored
      On powerpc32 64-bit comparison code generated dangling
      target labels. This caused ghc build failure as:
      
          $ ./configure --target=powerpc-unknown-linux-gnu && make
          ...
          SCCs aren't in reverse dependent order
          bad blockId n3U
      
      This happened because condIntCode' in PPC codegen generated
      label name but did not place the label into `cmp_lo` code block.
      
      The change adds the `cmp_lo` label into the case of negative
      comparison.
      Signed-off-by: default avatarSergei Trofimovich <slyfox@gentoo.org>
      973077ac
  19. 11 Apr, 2019 1 commit
    • Carter Schonwald's avatar
      removing x87 register support from native code gen · 42504f4a
      Carter Schonwald authored
      * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
      * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
      behavior in 32bit haskell code
      * removes the 80bit floating point representation from the supported float sizes
      * theres still 1 tiny bit of x87 support needed,
      for handling float and double return values in FFI calls  wrt the C ABI on x86_32,
      but this one piece does not leak into the rest of NCG.
      * Lots of code thats not been touched in a long time got deleted as a
      consequence of all of this
      
      all in all, this change paves the way towards a lot of future further
      improvements in how GHC handles floating point computations, along with
      making the native code gen more accessible to a larger pool of contributors.
      42504f4a
  20. 01 Apr, 2019 1 commit
  21. 15 Mar, 2019 1 commit
    • Peter Trommler's avatar
      PPC NCG: Use liveness information in CmmCall · 83e09d3c
      Peter Trommler authored
      We make liveness information for global registers
      available on `JMP` and `BCTR`, which were the last instructions
      missing. With complete liveness information we do not need to
      reserve global registers in `freeReg` anymore. Moreover we
      assign R9 and R10 to callee saves registers.
      
      Cleanup by removing `Reg_Su`, which was unused, from `freeReg`
      and removing unused register definitions.
      
      The calculation of the number of floating point registers is too
      conservative. Just follow X86 and specify the constants directly.
      
      Overall on PowerPC this results in 0.3 % smaller code size in nofib
      while runtime is slightly better in some tests.
      83e09d3c
  22. 31 Jan, 2019 1 commit
  23. 17 Jan, 2019 3 commits
  24. 01 Jan, 2019 1 commit
    • Peter Trommler's avatar
      PPC NCG: Remove Darwin support · 374e4470
      Peter Trommler authored
      Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
      follow suit and remove PowerPC support for Darwin.
      
      Fixes #16106.
      374e4470
  25. 30 Dec, 2018 1 commit
  26. 11 Dec, 2018 2 commits
  27. 17 Nov, 2018 1 commit
    • Andreas Klebinger's avatar
      NCG: New code layout algorithm. · 912fd2b6
      Andreas Klebinger authored
      Summary:
      This patch implements a new code layout algorithm.
      It has been tested for x86 and is disabled on other platforms.
      
      Performance varies slightly be CPU/Machine but in general seems to be better
      by around 2%.
      Nofib shows only small differences of about +/- ~0.5% overall depending on
      flags/machine performance in other benchmarks improved significantly.
      
      Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
      containers, text and xeno.
      
      While the magnitude of gains differed three different CPUs where tested with
      all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
      Skylake
      
      * Library benchmark results summarized:
        * containers: ~1.5% faster
        * aeson: ~2% faster
        * megaparsec: ~2-5% faster
        * xml library benchmarks: 0.2%-1.1% faster
        * vector-benchmarks: 1-4% faster
        * text: 5.5% faster
      
      On average GHC compile times go down, as GHC compiled with the new layout
      is faster than the overhead introduced by using the new layout algorithm,
      
      Things this patch does:
      
      * Move code responsilbe for block layout in it's own module.
      * Move the NcgImpl Class into the NCGMonad module.
      * Extract a control flow graph from the input cmm.
      * Update this cfg to keep it in sync with changes during
        asm codegen. This has been tested on x64 but should work on x86.
        Other platforms still use the old codelayout.
      * Assign weights to the edges in the CFG based on type and limited static
        analysis which are then used for block layout.
      * Once we have the final code layout eliminate some redundant jumps.
      
        In particular turn a sequences of:
            jne .foo
            jmp .bar
          foo:
        into
            je bar
          foo:
            ..
      
      Test Plan: ci
      
      Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
      
      Reviewed By: RyanGlScott
      
      Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
      
      GHC Trac Issues: #15124
      
      Differential Revision: https://phabricator.haskell.org/D4726
      912fd2b6
  28. 21 Aug, 2018 1 commit
  29. 16 May, 2018 1 commit
    • Simon Marlow's avatar
      Allow CmmLabelDiffOff with different widths · fbd28e2c
      Simon Marlow authored
      Summary:
      This change makes it possible to generate a static 32-bit relative label
      offset on x86_64. Currently we can only generate word-sized label
      offsets.
      
      This will be used in D4634 to shrink info tables.  See D4632 for more
      details.
      
      Test Plan: See D4632
      
      Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4633
      fbd28e2c
  30. 05 May, 2018 1 commit
    • Sebastian Graf's avatar
      Add 'addWordC#' PrimOp · 6243bba7
      Sebastian Graf authored
      This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'.
      I found 'plusWord2#' while implementing this, which both lacks
      documentation and has a slightly different specification than
      'addWordC#', which means the generic implementation is unnecessarily
      complex.
      
      While I was at it, I also added lacking meta-information on PrimOps
      and refactored 'subWordC#'s generic implementation to be branchless.
      
      Reviewers: bgamari, simonmar, jrtc27, dfeuer
      
      Reviewed By: bgamari, dfeuer
      
      Subscribers: dfeuer, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4592
      6243bba7
  31. 19 Mar, 2018 1 commit
  32. 21 Jan, 2018 1 commit
    • John Ky's avatar
      Add new mbmi and mbmi2 compiler flags · f8557696
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Implement x86 code generator for pdep and pext.  Properly initialise
      bmiVersion field.
      
      pdep and pext test cases
      
      Fix pattern match for pdep and pext instructions
      
      Fix build of pdep and pext code for 32-bit architectures
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, angerman
      
      Reviewed By: bgamari
      
      Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4236
      f8557696
  33. 22 Nov, 2017 1 commit
  34. 15 Nov, 2017 1 commit
    • John Ky's avatar
      Add new mbmi and mbmi2 compiler flags · f5dc8ccc
      John Ky authored
      This adds support for the bit deposit and extraction operations provided
      by the BMI and BMI2 instruction set extensions on modern amd64 machines.
      
      Test Plan: Validate
      
      Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
      
      Reviewed By: bgamari
      
      Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
      
      GHC Trac Issues: #14206
      
      Differential Revision: https://phabricator.haskell.org/D4063
      f5dc8ccc
  35. 09 Nov, 2017 1 commit
    • Peter Trommler's avatar
      Fix PPC NCG after blockID patch · f8e7fece
      Peter Trommler authored
      Commit rGHC8b007ab assigns the same label to the first basic block
      of a proc and to the proc entry point. This violates the PPC 64-bit ELF
      v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols.
      
      This patch fixes duplicate symbols caused by block labels
      
      In commit rGHCd7b8da1 an info table label is generated from a block id.
      Getting the entry label from that info label leads to an undefined
      symbol because a suffix "_entry" that is not present in the block label.
      
      To fix that issue add a new info table label flavour for labels
      derived from block ids. Converting such a label with toEntryLabel
      produces the original block label.
      
      Fixes #14311
      
      Test Plan: ./validate
      
      Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #14311
      
      Differential Revision: https://phabricator.haskell.org/D4149
      f8e7fece
  36. 02 Nov, 2017 1 commit
    • Peter Trommler's avatar
      PPC NCG: Impl branch prediction, atomic ops. · 1130c67b
      Peter Trommler authored
      Implement AtomicRMW ops, atomic read, atomic write
      in PowerPC native code generator. Also implement
      branch prediction because we need it in atomic ops
      anyway.
      
      This patch improves the issue in #12537 a bit but
      does not fix it entirely.
      
      The fallback operations for atomicread and atomicwrite
      in libraries/ghc-prim/cbits/atomic.c are incorrect.
      This patch avoids those functions by implementing the
      operations directly in the native code generator. This
      is also what the x86/amd64 NCG and the LLVM backend
      do.
      
      Test Plan: validate on AIX and PowerPC (32-bit) Linux
      
      Reviewers: erikd, hvr, austin, bgamari, simonmar
      
      Reviewed By: hvr, bgamari
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #12537
      
      Differential Revision: https://phabricator.haskell.org/D3984
      1130c67b
  37. 30 Oct, 2017 1 commit