1. 27 Jul, 2016 1 commit
  2. 26 Jul, 2016 1 commit
  3. 22 Jul, 2016 2 commits
    • Erik de Castro Lopo's avatar
      Fix the non-Linux build · d068220f
      Erik de Castro Lopo authored
      Summary:
      The recent Compact Regions commit (cf989ffe) builds fine on Linux
      but doesn't build on OS X r Windows.
      
      * rts/sm/CNF.c: Drop un-needed #includes.
      * Fix parenthesis usage with CPP ASSERT macro.
      * Fix format string in debugBelch messages.
      * Use stg_max() instead hand rolled inline max() function.
      
      Test Plan: Build on Linux, OS X and Windows
      
      Reviewers: gcampax, simonmar, austin, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2421
      d068220f
    • Ben Gamari's avatar
      Revert "Cleanup PosixSource.h" · fb34b27c
      Ben Gamari authored
      This reverts commit cac3fb06.
      
      This breaks OS X and Windows.
      fb34b27c
  4. 21 Jul, 2016 1 commit
    • Ömer Sinan Ağacan's avatar
      Implement unboxed sum primitive type · 714bebff
      Ömer Sinan Ağacan authored
      Summary:
      This patch implements primitive unboxed sum types, as described in
      https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
      
      Main changes are:
      
      - Add new syntax for unboxed sums types, terms and patterns. Hidden
        behind `-XUnboxedSums`.
      
      - Add unlifted unboxed sum type constructors and data constructors,
        extend type and pattern checkers and desugarer.
      
      - Add new RuntimeRep for unboxed sums.
      
      - Extend unarise pass to translate unboxed sums to unboxed tuples right
        before code generation.
      
      - Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
        code generation when sum values are involved.
      
      - Add user manual section for unboxed sums.
      
      Some other changes:
      
      - Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
        `MultiValAlt` to be able to use those with both sums and tuples.
      
      - Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
        wrong, given an `Any` `TyCon`, there's no way to tell what its kind
        is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
      
      - Fix some bugs on the way: #12375.
      
      Not included in this patch:
      
      - Update Haddock for new the new unboxed sum syntax.
      
      - `TemplateHaskell` support is left as future work.
      
      For reviewers:
      
      - Front-end code is mostly trivial and adapted from unboxed tuple code
        for type checking, pattern checking, renaming, desugaring etc.
      
      - Main translation routines are in `RepType` and `UnariseStg`.
        Documentation in `UnariseStg` should be enough for understanding
        what's going on.
      
      Credits:
      
      - Johan Tibell wrote the initial front-end and interface file
        extensions.
      
      - Simon Peyton Jones reviewed this patch many times, wrote some code,
        and helped with debugging.
      
      Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
                 simonmar, hvr, erikd
      
      Reviewed By: simonpj
      
      Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
                   thomie, mpickering
      
      Differential Revision: https://phabricator.haskell.org/D2259
      714bebff
  5. 20 Jul, 2016 2 commits
    • gcampax's avatar
      Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
      cf989ffe
    • Moritz Angermann's avatar
      Cleanup PosixSource.h · cac3fb06
      Moritz Angermann authored
      When trying to build arm64-apple-iso, the build fell over
      `strdup`, as the arm64-apple-ios build did not fall into `darwin_HOST_OS`,
      and would need `ios_HOST_OS`.
      
      This diff tries to clean up PosixSource.h, instead of layering another
      define on top.
      
      As we use `strnlen` in sources that include PosixSource.h, and `strnlen`
      is defined in POSIX.1-2008, the `_POSIX_C_SOURCE` and `_XOPEN_SOURCE`
      are increased accordingly.
      
      Furthermore the `_DARWIN_C_SOURCE` (required for `u_char`, etc. used in
      sysctl.h) define is moved into `OSThreads.h` alongside a similar ifdef
      for freebsd.
      
      Test Plan: Build on all supported platforms.
      
      Reviewers: rwbarton, erikd, austin, hvr, simonmar, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2375
      cac3fb06
  6. 16 Jul, 2016 1 commit
  7. 05 Jul, 2016 4 commits
  8. 04 Jul, 2016 1 commit
  9. 01 Jul, 2016 2 commits
  10. 27 Jun, 2016 1 commit
  11. 26 Jun, 2016 1 commit
  12. 22 Jun, 2016 1 commit
  13. 20 Jun, 2016 1 commit
  14. 17 Jun, 2016 1 commit
    • Simon Marlow's avatar
      NUMA cleanups · 498ed266
      Simon Marlow authored
      - Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
      - Add a test to tests/rts
      498ed266
  15. 14 Jun, 2016 1 commit
  16. 13 Jun, 2016 2 commits
    • Tamar Christina's avatar
      Add thin library support to Windows too · 5cee88d7
      Tamar Christina authored
      Summary:
      Code already existed in the RTS to add thin library support for non-Windows
      operating systems. This adds it to Windows as well.
      
      ar thin libraries have the exact same format as normal archives except they
      have a different magic string and they don't copy the object files into the
      archive.
      
      Instead each header entry points to the location of the object file on disk.
      This is useful when a library is only created to satisfy a compile time dependency
      instead of to be distributed. This saves the time required for copying.
      
      Test Plan: ./validate and new test T11788
      
      Reviewers: austin, bgamari, simonmar, erikd
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: thomie, #ghc_windows_task_force
      
      Differential Revision: https://phabricator.haskell.org/D2323
      
      GHC Trac Issues: #11788
      5cee88d7
    • Erik de Castro Lopo's avatar
      rts: Fix NUMA when cross compiling · 2bb6ba62
      Erik de Castro Lopo authored
      The NUMA code was enabled whenever numa.h and numaif.h are detected.
      Unfortunately, the hosts' header files were being detected even then
      cross compiling in the absence of a target libnuma.
      
      Fix that by relying on the the presence of libnuma instead of the
      presence of the header files. The test for libnuma does `AC_TRY_LINK`
      which will fail if the test program (compiled for the target) can't
      be linked against libnuma.
      
      Test Plan:
      Build on x86_64/linux and make sure NUMA works and cross compile to
      armhf/linux.
      
      Reviewers: austin, bgamari, hvr, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2329
      2bb6ba62
  17. 12 Jun, 2016 2 commits
    • Tamar Christina's avatar
      Fix incorrect calculated relocations on Windows x86_64 · b40e1b4c
      Tamar Christina authored
      Summary:
      See #12031 for analysis, but essentially what happens is:
      
      To sum up the issue, the reason this seems to go wrong is because
      of how we initialize the `.bss` section for Windows in the runtime linker.
      
      The first issue is where we calculate the zero space for the section:
      
      ```
      zspace = stgCallocBytes(1, bss_sz, "ocGetNames_PEi386(anonymous bss)");
      sectab_i->PointerToRawData = ((UChar*)zspace) - ((UChar*)(oc->image));
      ```
      
      Where
      ```
      UInt32 PointerToRawData;
      ```
      
      This means we're stuffing a `64-bit` value into a `32-bit` one. Also `zspace`
      can be larger than `oc->image`. In which case it'll overflow and
      then get truncated in the cast.
      
      The address of a value in the `.bss` section is then calculated as:
      
      ```
      addr = ((UChar*)(oc->image))
           + (sectabent->PointerToRawData
           + symtab_i->Value);
      ```
      
      If it does truncate then this calculation won't be correct (which is what is happening).
      
      We then later use the value of `addr` as the `S` (Symbol) value for the relocations
      
      ```
      S = (size_t) lookupSymbol_( (char*)symbol );
      ```
      
      Now the majority of the relocations are `R_X86_64_PC32` etc.
      e.g. They are guaranteed to fit in a `32-bit` value.
      
      The `R_X86_64_64` introduced for these pseudo-relocations so they can use
      the full `48-bit` addressing space isn't as lucky.
      As for why it sometimes work has to do on whether the value is truncated or not.
      
      `PointerToRawData` can't be changed because it's size is fixed by the PE specification.
      
      Instead just like with the other platforms, we now use `section` on Windows as well.
      This gives us a `start` parameter of type `void*` which solves the issue.
      
      This refactors the code to use `section.start` and to fix the issues.
      
      Test Plan: ./validate and new test added T12031
      
      Reviewers: RyanGlScott, erikd, bgamari, austin, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie, #ghc_windows_task_force
      
      Differential Revision: https://phabricator.haskell.org/D2316
      
      GHC Trac Issues: #12031, #11317
      b40e1b4c
    • Erik de Castro Lopo's avatar
      rts: Fix build when USE_LARGE_ADDRESS_SPACE is undefined · 6ace660a
      Erik de Castro Lopo authored
      The recently added NUMA related functions were mistakenly defined
      within a `#ifdef USE_LARGE_ADDRESS_SPACE` ... `#endif` block. Moving
      them outside this block fixes the build on PowerPC and Arm Linux.
      
      Test Plan: Build on PowerPC or Arm Linux
      
      Reviewers: hvr, austin, bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2326
      6ace660a
  18. 10 Jun, 2016 4 commits
    • Simon Marlow's avatar
      Rts flags cleanup · c88f31a0
      Simon Marlow authored
      * Remove unused/old flags from the structs
      * Update old comments
      * Add missing flags to GHC.RTS
      * Simplify GHC.RTS, remove C code and use hsc2hs instead
      * Make ParFlags unconditional, and add support to GHC.RTS
      c88f31a0
    • Simon Marlow's avatar
      NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep a separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      were reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
      9e5ea67e
    • Ömer Sinan Ağacan's avatar
      Remove Printer.c:prettyPrintClosure() · 47d81732
      Ömer Sinan Ağacan authored
      It turns out this function was unused and broken for a long time (fixed
      with b0a76643). Removing it as it will probably get broken again in the
      future and it's unused.
      
      Reviewers: austin, erikd, simonmar, nomeata, bgamari
      
      Reviewed By: nomeata, bgamari
      
      Subscribers: Phyx, thomie, nomeata
      
      Differential Revision: https://phabricator.haskell.org/D2322
      47d81732
    • Ömer Sinan Ağacan's avatar
      prettyPrintClosure(): Untag the closure before accessing fields · b0a76643
      Ömer Sinan Ağacan authored
      (This fixes segfaults)
      b0a76643
  19. 08 Jun, 2016 1 commit
    • Ömer Sinan Ağacan's avatar
      Show sources of cost centers in .prof · d7933cbc
      Ömer Sinan Ağacan authored
      This fixes the problem with duplicate cost-centre names that was
      reported a couple of times before. When a module implements a typeclass
      multiple times for different types, methods of different implementations
      get same cost-centre names and are reported like this:
      
          COST CENTRE MODULE            %time %alloc
      
          CAF         GHC.IO.Handle.FD    0.0   32.8
          CAF         GHC.Read            0.0    1.0
          CAF         GHC.IO.Encoding     0.0    1.8
          showsPrec   Main                0.0    1.2
          readPrec    Main                0.0   19.4
          readPrec    Main                0.0   20.5
          main        Main                0.0   20.2
      
                                                  individual      inherited
          COST CENTRE  MODULE  no.     entries  %time %alloc   %time %alloc
      
          MAIN         MAIN     53          0    0.0    0.2     0.0  100.0
           CAF         Main    105          0    0.0    0.3     0.0   62.5
            readPrec   Main    109          1    0.0    0.6     0.0    0.6
            readPrec   Main    107          1    0.0    0.6     0.0    0.6
            main       Main    106          1    0.0   20.2     0.0   61.0
             ==        Main    114          1    0.0    0.0     0.0    0.0
             ==        Main    113          1    0.0    0.0     0.0    0.0
             showsPrec Main    112          2    0.0    1.2     0.0    1.2
             showsPrec Main    111          2    0.0    0.9     0.0    0.9
             readPrec  Main    110          0    0.0   18.8     0.0   18.8
             readPrec  Main    108          0    0.0   19.9     0.0   19.9
      
      It's not possible to tell from the report which `==` took how long. This
      patch adds one more column at the cost of making outputs wider. The
      report now looks like this:
      
          COST CENTRE MODULE           SRC                       %time %alloc
      
          CAF         GHC.IO.Handle.FD <entire-module>             0.0   32.9
          CAF         GHC.IO.Encoding  <entire-module>             0.0    1.8
          CAF         GHC.Read         <entire-module>             0.0    1.0
          showsPrec   Main             Main_1.hs:7:19-22           0.0    1.2
          readPrec    Main             Main_1.hs:7:13-16           0.0   19.5
          readPrec    Main             Main_1.hs:4:13-16           0.0   20.5
          main        Main             Main_1.hs:(10,1)-(20,20)    0.0   20.2
      
                                                                             individual      inherited
          COST CENTRE  MODULE        SRC                      no. entries  %time %alloc   %time %alloc
      
          MAIN         MAIN          <built-in>                53      0    0.0    0.2     0.0  100.0
           CAF         Main          <entire-module>          105      0    0.0    0.3     0.0   62.5
            readPrec   Main          Main_1.hs:7:13-16        109      1    0.0    0.6     0.0    0.6
            readPrec   Main          Main_1.hs:4:13-16        107      1    0.0    0.6     0.0    0.6
            main       Main          Main_1.hs:(10,1)-(20,20) 106      1    0.0   20.2     0.0   61.0
             ==        Main          Main_1.hs:7:25-26        114      1    0.0    0.0     0.0    0.0
             ==        Main          Main_1.hs:4:25-26        113      1    0.0    0.0     0.0    0.0
             showsPrec Main          Main_1.hs:7:19-22        112      2    0.0    1.2     0.0    1.2
             showsPrec Main          Main_1.hs:4:19-22        111      2    0.0    0.9     0.0    0.9
             readPrec  Main          Main_1.hs:7:13-16        110      0    0.0   18.8     0.0   18.8
             readPrec  Main          Main_1.hs:4:13-16        108      0    0.0   19.9     0.0   19.9
           CAF         Text.Read.Lex <entire-module>          102      0    0.0    0.5     0.0    0.5
      
      To fix failing test cases because of different orderings of cost centres
      (e.g. optimized and non-optimized build printing in different order),
      with this patch we also start sorting cost centres before printing. The
      order depends on 1) entries (more entered cost centres come first) 2)
      names (using strcmp() on cost centre names).
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2282
      
      GHC Trac Issues: #11543, #8473, #7105
      d7933cbc
  20. 05 Jun, 2016 1 commit
  21. 04 Jun, 2016 2 commits
  22. 03 Jun, 2016 2 commits
    • Tamar Christina's avatar
      Use useful names for Symbol Addr and Names in Linker.c · 079c1b8c
      Tamar Christina authored
      Replace `char*` and `void*` with `SymbolName` and `SymbolAddr` in
      `Linker.c`. Gives some useful information about what the variables are
      used for and also normalizes the types used in Mac, Linux and Windows
      
      Test Plan:
      ./validate on all platforms that use the runtime linker.
      
      For unix platforms please ensure `DYNAMIC_GHC_PROGRAMS=NO` is
       added to your validate file.
      
      This is a continuation from D2184
      
      Reviewers: austin, erikd, simonmar, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie, #ghc_windows_task_force
      
      Differential Revision: https://phabricator.haskell.org/D2250
      
      GHC Trac Issues: #11816
      079c1b8c
    • Tamar Christina's avatar
      Refactored SymbolInfo to lower memory usage in RTS · 37473722
      Tamar Christina authored
      Previously as part of #11223 a new struct `SymbolInfo` was introduced to
      keep track it the weak symbol status of a symbol.
      
      This structure also kept a copy of the calculated address of the symbol
      which turns out was useful in ignoring non-weak zero-valued symbols.
      
      The information was kept in an array so it means for every symbol two
      extra bytes were kept even though the vast majority of symbols are
      non-weak and non-zero valued.
      
      This changes the array into a sparse map keeping this information only
      for the symbols that are weak or zero-valued. This allows for a
      reduction in the amount of information needed to be kept while giving up
      a small (negligable) hit in performance as this information now has to
      be looked up in hashmaps.
      
      Test Plan: ./validate on all platforms that use the runtime linker.
      
      For unix platforms please ensure `DYNAMIC_GHC_PROGRAMS=NO` is added to
      your validate file.
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: thomie, #ghc_windows_task_force
      
      Differential Revision: https://phabricator.haskell.org/D2184
      
      GHC Trac Issues: #11816
      37473722
  23. 31 May, 2016 1 commit
  24. 28 May, 2016 1 commit
  25. 25 May, 2016 1 commit
  26. 24 May, 2016 1 commit
    • Erik de Castro Lopo's avatar
      Runtime linker: Break m32 allocator out into its own file · fe8a4e5d
      Erik de Castro Lopo authored
      This makes the code a little more modular and allows the removal of some
      CPP hackery. By providing dummy implementations of of the `m32_*`
      functions (which simply call `errorBelch`) it means that the call sites
      for these functions are syntax checked even when `RTS_LINKER_USE_MMAP`
      is `0`.
      
      Also changes some size parameter types from `unsigned int` to `size_t`.
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, simonmar, austin
      
      Reviewed By: simonmar, austin
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2237
      fe8a4e5d
  27. 22 May, 2016 1 commit