1. 13 Jun, 2020 34 commits
  2. 11 Jun, 2020 1 commit
  3. 10 Jun, 2020 5 commits
    • test: fix conc038 · 8d07c48c
      Sylvain Henry authored and Marge Bot committed
      We had spurious failures of the conc038 test on CI, with this stdout diff:
       newThread started
      -Haskell: 2
       newThread back again
       1 sec later
       shutting down
      +Haskell: 2
    • Initialize the allocation counter in GHCi to 0 (Fixes #16012) · 9b283e1b
      Roland Senn authored and Marge Bot committed
      Following the documentation for the function `getAllocationCounter`,
      initialize the allocationCounter to 0 in GHCi as well.
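For context, here is a minimal sketch of reading the per-thread allocation counter through `System.Mem` in base. The helper `allocatedBy` and the workload `sumTo` are hypothetical names for illustration and are not part of the patch. The sketch relies on the documented behaviour that the counter counts down as the thread allocates, and the figure it reports is only approximate.

```haskell
import Control.Exception (evaluate)
import Data.Int (Int64)
import System.Mem (getAllocationCounter, setAllocationCounter)

-- | Run an action and return its result together with the approximate
-- number of bytes the current thread allocated while running it.
-- Hypothetical helper for illustration; not part of the patch.
allocatedBy :: IO a -> IO (a, Int64)
allocatedBy act = do
  setAllocationCounter 0          -- start from a known value
  r <- act
  after <- getAllocationCounter   -- the counter counts down, so this is <= 0
  pure (r, negate after)

-- | Example workload: force a sum over a range of integers.
-- Whether it allocates at all depends on the optimisation level.
sumTo :: Int -> IO Int
sumTo n = evaluate (sum [1 .. n])
```

In GHCi, `allocatedBy (sumTo 100000)` returns the sum paired with a byte count. Resetting the counter inside the helper makes the measurement independent of the counter's starting value.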
    • Fix lookupGlobalOccRn_maybe sometimes reporting an error · 32fd37f5
      Luke Lau authored and Marge Bot committed
      In some cases it was possible for lookupGlobalOccRn_maybe to return an
      error, when it should be returning a Nothing. If it called
      lookupExactOcc_either when there were no matching GlobalRdrElts in the
      otherwise case, it would return an error message. This could be caused
      when lookupThName_maybe in Template Haskell was looking in different
      namespaces (thRdrNameGuesses), guessing different namespaces that the
      name wasn't guaranteed to be found in.
      However, by addressing this some more accurate errors were being lost in
      the conversion to Maybes. So some of the lookup* functions have been
      shuffled about so that errors should always be ignored in
      lookup*_maybes, and propagated otherwise.
      This fixes #18263
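The convention described above (errors propagated by the plain lookups, swallowed by the `_maybe` variants) can be sketched with a toy environment. Every name below (`Env`, `Namespace`, `lookupEither`, `lookupMaybe`, `lookupGuessing`) is a hypothetical stand-in and not GHC's renamer API; the real functions run in the renamer monad and deal with `GlobalRdrElt`s.

```haskell
import qualified Data.Map as Map
import Data.Maybe (listToMaybe, mapMaybe)

-- All names in this block are hypothetical stand-ins, not GHC's renamer API.
data Namespace = VarNS | DataConNS | TyConNS deriving (Eq, Ord, Show)

newtype LookupError = NotInScope String deriving (Eq, Show)

type Env = Map.Map (Namespace, String) Int

-- | The error-reporting lookup: a miss is a proper error.
lookupEither :: Env -> Namespace -> String -> Either LookupError Int
lookupEither env ns occ =
  maybe (Left (NotInScope occ)) Right (Map.lookup (ns, occ) env)

-- | The _maybe variant: any error is discarded and becomes Nothing.
lookupMaybe :: Env -> Namespace -> String -> Maybe Int
lookupMaybe env ns occ = either (const Nothing) Just (lookupEither env ns occ)

-- | Guess several namespaces in turn, as a Template Haskell name lookup
-- might; a miss in one namespace must not surface as an error.
lookupGuessing :: Env -> String -> Maybe Int
lookupGuessing env occ =
  listToMaybe (mapMaybe (\ns -> lookupMaybe env ns occ) [VarNS, DataConNS, TyConNS])
```

Deriving the `Maybe` version from the `Either` version keeps the precise error available to callers that want it, while the guessing caller never sees an error for a namespace it only tried speculatively.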
    • Implement cast worker/wrapper properly · 6d49d5be
      Simon Peyton Jones authored and Marge Bot committed
      The cast worker/wrapper transformation transforms
         x = e |> co
      into
         y = e
         x = y |> co
      This is done by the simplifier, but we were being
      careless about transferring IdInfo from x to y,
      and about what to do if x is a NOINLINE function.
      This resulted in a series of bugs:
           #17673, #18093, #18078.
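As a rough source-level analogy for the transformation above, `coerce` on a newtype can stand in for the Core cast `|> co`. This is only a sketch under that assumption: the real transformation happens on Core inside the simplifier, and the names `Age`, `expensive`, `xBefore`, `y` and `xAfter` are made up for illustration.

```haskell
import Data.Coerce (coerce)

-- Hypothetical newtype; the coercion between Int and Age plays the role of `co`.
newtype Age = MkAge Int deriving (Eq, Show)

expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Before: the right-hand side is a cast applied to an expression,
-- roughly  x = e |> co.
xBefore :: Age
xBefore = coerce (expensive 10)

-- After: the worker y holds the uncast expression ...
y :: Int
y = expensive 10

-- ... and the wrapper is nothing but a cast of y, roughly  x = y |> co,
-- so it is cheap to inline at use sites.
xAfter :: Age
xAfter = coerce y
```

Both bindings denote the same value. The split only changes which binding carries the work and which one carries the cast.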
      This patch fixes all that:
      * Main change is in GHC.Core.Opt.Simplify, and
        the new prepareBinding function, which does this
        cast worker/wrapper transform.
        See Note [Cast worker/wrappers].
      * There is quite a bit of refactoring around
        prepareRhs, makeTrivial etc.  It's nicer now.
      * Some wrappers from strictness and cast w/w, notably those for
        a function with a NOINLINE, should inline very late. There
        wasn't really a mechanism for that, which was an existing bug
        really; so I invented a new finalPhase = Phase (-1).  It's used
        for all simplifier runs after the user-visible phases 2, 1, 0 have
        run.  (No new runs of the simplifier are introduced thereby.)
        See new Note [Compiler phases] in GHC.Types.Basic;
        the main changes are in GHC.Core.Opt.Driver.
      * Doing this made me trip over two places where the AnonArgFlag on a
        FunTy was being lost so we could end up with (Num a -> ty)
        rather than (Num a => ty)
          - In coercionLKind/coercionRKind
          - In contHoleType in the Simplifier
        I fixed the former by defining mkFunctionType and using it there.
        I could have done the same for the latter, but the information
        is almost to hand.  So I fixed the latter by
          - adding sc_hole_ty to ApplyToVal (like ApplyToTy),
          - adding as_hole_ty to ValArg (like TyArg)
          - adding sc_fun_ty to StrictArg
        Turned out I could then remove ai_type from ArgInfo.  This is
        just moving the deck chairs around, but it worked out nicely.
        See the new Note [AnonArgFlag] in GHC.Types.Var
      * When looking at the 'arity decrease' thing (#18093) I discovered
        that stable unfoldings had a much lower arity than the actual
        optimised function.  That's what led to the arity-decrease
        message.  Simple solution: eta-expand.
        It's described in Note [Eta-expand stable unfoldings]
        in GHC.Core.Opt.Simplify
      * I also discovered that unsafeCoerce wasn't being inlined if
        the context was boring.  So (\x. f (unsafeCoerce x)) would
        create a thunk -- yikes!  I fixed that by making inlineBoringOK
        a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold.
        I also found that unsafeCoerceName was unused, so I removed it.
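The user-visible phases 2, 1, 0 mentioned in the list above are the ones that source programs can name in phase-controlled pragmas. A small sketch follows, with made-up functions `double` and `quadruple`. It does not show the new final phase, which, as described above, is internal to the compiler.

```haskell
-- Keep `double` un-inlined until phase 1, so that the rewrite rule below
-- has a chance to match in phase 2; from phase 1 onwards GHC may inline it.
{-# NOINLINE [1] double #-}
double :: Int -> Int
double x = x + x

-- Rewrite rules are only applied when compiling with optimisation.
{-# RULES "double/double" forall x. double (double x) = 4 * x #-}

-- Do not inline `quadruple` before phase 0, the last user-visible phase;
-- from phase 0 onwards GHC is keen to inline it.
{-# INLINE [0] quadruple #-}
quadruple :: Int -> Int
quadruple x = double (double x)
```

The results are the same whether or not the rule fires. The pragmas only influence when the optimiser may inline each function.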
      I made a test case for #18078, and a very similar one for #17673.
      The net effect of all this on nofib is very modest, but positive:
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                 anna          -0.4%     -0.1%     -3.1%     -3.1%      0.0%
       fannkuch-redux          -0.4%     -0.3%     -0.1%     -0.1%      0.0%
             maillist          -0.4%     -0.1%     -7.8%     -1.0%    -14.3%
            primetest          -0.4%    -15.6%     -7.1%     -6.6%      0.0%
                  Min          -0.9%    -15.6%    -13.3%    -14.2%    -14.3%
                  Max          -0.3%      0.0%    +12.1%    +12.4%      0.0%
       Geometric Mean          -0.4%     -0.2%     -2.3%     -2.2%     -0.1%
      All following metric decreases are compile-time allocation decreases
      between -1% and -3%:
      Metric Decrease:
    • Optimisation in Unique.Supply · 9454511b
      Simon Peyton Jones authored and Marge Bot committed
      This patch switches on -fno-state-hack in GHC.Types.Unique.Supply.
      It turned out that my fixes for #18078 (coercion floating) changed the
      optimisation pathway for mkSplitUniqSupply in such a way that we had
      an extra allocation inside the inner loop.  Adding -fno-state-hack
      fixed that -- and indeed the loop in mkSplitUniqSupply is a classic
      example of the way in which the state hack can be bad; see #18238.
      Moreover, the new code is better than the old.  They allocate
      the same, but the old code ends up with a partial application.
      The net effect is that the test
      runs 20% faster!   From 2.5s down to 2.0s.  The allocation numbers
      are the same -- but elapsed time falls. Good!
      The bad thing about this is that it's terribly delicate.  But
      at least it's a good example of such delicacy in action.
      There is a long Note [Optimising the unique supply] which now
      explains all this.
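For readers who have not used the flag, here is a minimal sketch of switching off the state hack for a single module with an `OPTIONS_GHC` pragma. The function `mkFresh` is a hypothetical and much simpler stand-in for `mkSplitUniqSupply`. It only shows the shape of code where the flag matters: an `IO` action that is built once and then run many times, which the state hack would otherwise treat as single-entry.

```haskell
{-# OPTIONS_GHC -fno-state-hack #-}
-- The pragma above disables the state hack for this module only.

import Data.IORef (atomicModifyIORef', newIORef)

-- | Build a generator of fresh numbers. The returned action is shared and
-- run many times. Hypothetical stand-in for illustration only.
mkFresh :: IO (IO Int)
mkFresh = do
  ref <- newIORef 0
  -- Store n + 1 and hand back the old value n.
  pure (atomicModifyIORef' ref (\n -> (n + 1, n)))
```

Scoping the flag to one module, as the patch does for GHC.Types.Unique.Supply, leaves the default behaviour in place for the rest of the program.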