Commits (63)
  • Ben Gamari's avatar
    rts/CNF: Fix fixup comparison function · cf4f1e2f
    Ben Gamari authored
    Previously we would implicitly convert the difference between two words
    to an int, resulting in an integer overflow on 64-bit machines.
    Fixes #16992
  • Ömer Sinan Ağacan's avatar
    Pack some of IdInfo fields into a bit field · a03da9bf
    Ömer Sinan Ağacan authored
    This reduces residency of compiler quite a bit on some programs.
    Example stats when building T10370:
       2,871,242,832 bytes allocated in the heap
       4,693,328,008 bytes copied during GC
          33,941,448 bytes maximum residency (276 sample(s))
             375,976 bytes maximum slop
                  83 MiB total memory in use (0 MB lost due to fragmentation)
       2,858,897,344 bytes allocated in the heap
       4,629,255,440 bytes copied during GC
          32,616,624 bytes maximum residency (278 sample(s))
             314,400 bytes maximum slop
                  80 MiB total memory in use (0 MB lost due to fragmentation)
    So -3.9% residency, -1.3% bytes copied and -0.4% allocations.
    Fixes #17497
    Metric Decrease:
  • Ben Gamari's avatar
    get-win32-tarballs: Fix base URL · 670c3e5c
    Ben Gamari authored
    Revert a change previously made for testing purposes.
  • Ben Gamari's avatar
  • Simon Jakobi's avatar
    docs: Add examples for Data.Semigroup.Arg{Min,Max} · 8c0740b7
    Simon Jakobi authored
    Context: #17153
  • Ben Gamari's avatar
    Add few cleanups of the CAF logic · cb22348f
    Ben Gamari authored
    Give the NameSet of non-CAFfy names a proper newtype to distinguish it
    from all of the other NameSets floating about.
  • Emeka Nkurumeh's avatar
  • Sebastian Graf's avatar
    CprAnal: Don't attach CPR sigs to expandable bindings (#18154) · 86d8ac22
    Sebastian Graf authored
    Instead, look through expandable unfoldings in `cprTransform`.
    See the new Note [CPR for expandable unfoldings]:
    Long static data structures (whether top-level or not) like
      xs = x1 : xs1
      xs1 = x2 : xs2
      xs2 = x3 : xs3
    should not get CPR signatures, because they
      * Never get WW'd, so their CPR signature should be irrelevant after analysis
        (in fact the signature might even be harmful for that reason)
      * Would need to be inlined/expanded to see their constructed product
      * Recording CPR on them blows up interface file sizes and is redundant with
        their unfolding. In case of Nested CPR, this blow-up can be quadratic!
    But we can't just stop giving DataCon application bindings the CPR property,
    for example
      fac 0 = 1
      fac n = n * fac (n-1)
    fac certainly has the CPR property and should be WW'd! But FloatOut will
    transform the first clause to
      lvl = 1
      fac 0 = lvl
    If lvl doesn't have the CPR property, fac won't either. But lvl doesn't have a
    CPR signature to extrapolate into a CPR transformer ('cprTransform'). So
    instead we keep on cprAnal'ing through *expandable* unfoldings for these arity
    0 bindings via 'cprExpandUnfolding_maybe'.
    In practice, GHC generates a lot of (nested) TyCon and KindRep bindings, one
    for each data declaration. It's wasteful to attach CPR signatures to each of
    them (and intractable in case of Nested CPR).
    Fixes #18154.
  • Ben Gamari's avatar
    users-guide: Add discussion of shared object naming · e34bf656
    Ben Gamari authored
    Fixes #18074.
  • Ben Gamari's avatar
    testsuite: Print sign of performance changes · 5d0f2445
    Ben Gamari authored
    Executes the minor formatting change in the tabulated performance
    changes suggested in #18135.
  • Ben Gamari's avatar
    testsuite: Add testcase for #18129 · 9e4b981f
    Ben Gamari authored
  • Ivan-Yudin's avatar
    doc: Reformulate the opening paragraph of Ch. 4 in User's guide · 266310c3
    Ivan-Yudin authored
    Removes mentioning of Hugs
    (it is not helpful for new users anymore).
    Changes the wording for the rest of the paragraph.
    Fixes #18132.
  • Icelandjack's avatar
  • Alp Mestanogullari's avatar
  • Artem Pelenitsyn's avatar
    Don't reload environment files on every setSessionDynFlags · d880d6b2
    Artem Pelenitsyn authored
    Makes `interpretPackageEnv` (which loads envirinment files) a part of
    `parseDynamicFlags` (parsing command-line arguments, which is typically
    done once) instead of `setSessionDynFlags` (which is typically called
    several times). Making several (transitive) calls to `interpretPackageEnv`,
    as before, caused #18125 #16318, which should be fixed now.
  • Ryan Scott's avatar
    Factor out HsPatSigType for pat sigs/RULE term sigs (#16762) · 102cfd67
    Ryan Scott authored
    This implements chunks (2) and (3) of
    #16762 (comment 270170). Namely,
    it introduces a dedicated `HsPatSigType` AST type, which represents
    the types that can appear in pattern signatures and term-level `RULE`
    binders. Previously, these were represented with `LHsSigWcType`.
    Although `LHsSigWcType` is isomorphic to `HsPatSigType`, the intended
    semantics of the two types are slightly different, as evidenced by
    the fact that they have different code paths in the renamer and
    See also the new `Note [Pattern signature binders and scoping]` in
  • Hécate's avatar
  • Icelandjack's avatar
  • Takenobu Tani's avatar
    Tweak man page for ghc command · 1a93ea57
    Takenobu Tani authored
    This commit updates the ghc command's man page as followings:
    * Enable `man_show_urls` to show URL addresses in the `DESCRIPTION`
    section of ghc.rst, because sphinx currently removes hyperlinks
    for man pages.
    * Add a `SEE ALSO` section to point to the GHC homepage
  • Takenobu Tani's avatar
    GHCi: Add link to the user's guide in help message · a951e1ba
    Takenobu Tani authored
    This commit adds a link to the user's guide in ghci's
    `:help` message.
    Newcomers could easily reach to details of ghci.
  • Cheah Jer Fei's avatar
    Handle single unused import · 404581ea
    Cheah Jer Fei authored
  • Ben Gamari's avatar
    Ensure that printMinimalImports closes handle · 1c999e5d
    Ben Gamari authored
    Fixes #18166.
  • Ben Gamari's avatar
    hadrian: Tell testsuite driver about LLVM availability · c9f5a8f4
    Ben Gamari authored
    This reflects the logic present in the Make build system into Hadrian.
    Fixes #18167.
  • Simon Jakobi's avatar
    Improve some folds over Uniq[D]FM · c05c0659
    Simon Jakobi authored
    * Replace some non-deterministic lazy folds with
      strict folds.
    * Replace some O(n log n) folds in deterministic order
      with O(n) non-deterministic folds.
    * Replace some folds with set-operations on the underlying
    This reduces max residency when compiling
    `nofib/spectral/simple/Main.hs` with -O0 by about 1%.
    Maximum residency when compiling Cabal also seems reduced on the
    order of 3-9%.
  • Simon Jakobi's avatar
    Use Data.IntMap.disjoint · 477f13bb
    Simon Jakobi authored
    Data.IntMap gained a dedicated `disjoint` function in containers-
    This patch applies this function where appropriate in hopes of modest
    compiler performance improvements.
    Closes #16806.
  • Ben Gamari's avatar
  • Sebastian Graf's avatar
    DmdAnal: Improve handling of precise exceptions · 9bd20e83
    Sebastian Graf authored
    This patch does two things: Fix possible unsoundness in what was called
    the "IO hack" and implement part 2.1 of the "fixing precise exceptions"
    plan in
    which, in combination with !2956, supersedes !3014 and !2525.
    **IO hack**
    The "IO hack" (which is a fallback to preserve precise exceptions
    semantics and thus soundness, rather than some smart thing that
    increases precision) is called `exprMayThrowPreciseException` now.
    I came up with two testcases exemplifying possible unsoundness (if
    twisted enough) in the old approach:
    - `T13380d`: Demonstrating unsoundness of the "IO hack" when resorting
                 to manual state token threading and direct use of primops.
                 More details below.
    - `T13380e`: Demonstrating unsoundness of the "IO hack" when we have
                 Nested CPR. Not currently relevant, as we don't have Nested
                 CPR yet.
    - `T13380f`: Demonstrating unsoundness of the "IO hack" for safe FFI
    Basically, the IO hack assumed that precise exceptions can only be
    thrown from a case scrutinee of type `(# State# RealWorld, _ #)`. I
    couldn't come up with a program using the `IO` abstraction that violates
    this assumption. But it's easy to do so via manual state token threading
    and direct use of primops, see `T13380d`. Also similar code might be
    generated by Nested CPR in the (hopefully not too) distant future, see
    `T13380e`. Hence, we now have a more careful test in `forcesRealWorld`
    that passes `T13380{d,e}` (and will hopefully be robust to Nested CPR).
    **Precise exceptions**
    In #13380 and #17676 we saw that we didn't preserve precise exception
    semantics in demand analysis. We fixed that with minimal changes in
    !2956, but that was terribly unprincipled.
    That unprincipledness resulted in a loss of precision, which is tracked
    by these new test cases:
    - `T13380b`: Regression in dead code elimination, because !2956 was too
                 syntactic about `raiseIO#`
    - `T13380c`: No need to apply the "IO hack" when the IO action may not
                 throw a precise exception (and the existing IO hack doesn't
                 detect that)
    Fixing both issues in !3014 turned out to be too complicated and had
    the potential to regress in the future. Hence we decided to only fix
    `T13380b` and augment the `Divergence` lattice with a new middle-layer
    element, `ExnOrDiv`, which means either `Diverges` (, throws an
    imprecise exception) or throws a *precise* exception.
    See the wiki page on Step 2.1 for more implementational details:
  • Ben Gamari's avatar
    GHC.Cmm.Opt: Handle MO_XX_Conv · 568d7279
    Ben Gamari authored
    This MachOp was introduced by 2c959a18
    but a wildcard match in cmmMachOpFoldM hid the fact that it wasn't
    handled. Ideally we would eliminate the match but this appears to be a
    larger task.
    Fixes #18141.
  • Ryan Scott's avatar
    Remove duplicate Note [When to print foralls] in GHC.Core.TyCo.Ppr · 5bcf8606
    Ryan Scott authored
    There are two different Notes named `[When to print foralls]`. The
    most up-to-date one is in `GHC.Iface.Type`, but there is a second
    one in `GHC.Core.TyCo.Ppr`. The latter is less up-to-date, as it was
    written before GHC switched over to using ifaces to pretty-print
    types. I decided to just remove the latter and replace it with a
    reference to the former.
    [ci skip]
  • Fumiaki Kinoshita's avatar
    base: Add Generic instances to various datatypes under GHC.* · 55f0e783
    Fumiaki Kinoshita authored
    * GHC.Fingerprint.Types: Fingerprint
    * GHC.RTS.Flags: GiveGCStats, GCFlags, ConcFlags, DebugFlags, CCFlags, DoHeapProfile, ProfFlags, DoTrace, TraceFlags, TickyFlags, ParFlags and RTSFlags
    * GHC.Stats: RTSStats and GCStats
    * GHC.ByteOrder: ByteOrder
    * GHC.Unicode: GeneralCategory
    * GHC.Stack.Types: SrcLoc
    Metric Increase:
  • Gert-Jan Bottu's avatar
    Explicit Specificity · a9311cd5
    Gert-Jan Bottu authored
    Implementation for Ticket #16393.
    Explicit specificity allows users to manually create inferred type variables,
    by marking them with braces.
    This way, the user determines which variables can be instantiated through
    visible type application.
    The additional syntax is included in the parser, allowing users to write
    braces in type variable binders (type signatures, data constructors etc).
    This information is passed along through the renamer and verified in the
    type checker.
    The AST for type variable binders, data constructors, pattern synonyms,
    partial signatures and Template Haskell has been updated to include the
    specificity of type variables.
    Minor notes:
    - Bumps haddock submodule
    - Disables pattern match checking in GHC.Iface.Type with GHC 8.8
  • Ben Price's avatar
    Lint should say when it is checking a rule · 24e61aad
    Ben Price authored
    It is rather confusing that when lint finds an error in a rule attached
    to a binder, it reports the error as in the RHS, not the rule:
      In the RHS of foo
    We add a clarifying line:
      In the RHS of foo
      In a rule attached to foo
    The implication that the rule lives inside the RHS is a bit odd, but
    this niggle is already present for unfoldings, whose pattern we are
  • Ben Gamari's avatar
    nonmoving: Optimise the write barrier · 78c6523c
    Ben Gamari authored
  • Andreas Klebinger's avatar
    Refactor linear reg alloc to remember past assignments. · 13f6c9d0
    Andreas Klebinger authored
    When assigning registers we now first try registers we
    assigned to in the past, instead of picking the "first"
    This is in extremely helpful when dealing with loops for
    which variables are dead for part of the loop.
    This is important for patterns like this:
            foo = arg1
            foo = getVal()
            goto loop;
    There we:
    * assign foo to the register of arg1.
    * use foo, it's dead after this use as it's overwritten after.
    * do other things.
    * look for a register to put foo in.
    If we pick an arbitrary one it might differ from the register the
    start of the loop expect's foo to be in.
    To fix this we simply look for past register assignments for
    the given variable. If we find one and the register is free we
    use that register.
    This reduces the need for fixup blocks which match the register
    assignment between blocks. In the example above between the end
    and the head of the loop.
    This patch also moves branch weight estimation ahead of register
    allocation and adds a flag to control it (cmm-static-pred).
    * It means the linear allocator is more likely to assign the hotter
      code paths first.
    * If it assign these first we are:
      + Less likely to spill on the hot path.
      + Less likely to introduce fixup blocks on the hot path.
    These two measure combined are surprisingly effective. Based on nofib
    we get in the mean:
    * -0.9% instructions executed
    * -0.1% reads/writes
    * -0.2% code size.
    * -0.1% compiler allocations.
    * -0.9% compile time.
    * -0.8% runtime.
    Most of the benefits are simply a result of removing redundant moves
    and spills.
    Reduced compiler allocations likely are the result of less code being
    generated. (The added lookup is mostly non-allocating).
  • Andreas Klebinger's avatar
    NCG: Codelayout: Distinguish conditional and other branches. · edc2cc58
    Andreas Klebinger authored
    In #18053 we ended up with a suboptimal code layout because
    the code layout algorithm didn't distinguish between conditional
    and unconditional control flow.
    We can completely eliminate unconditional control flow instructions
    by placing blocks next to each other, not so much for conditionals.
    In terms of implementation we simply give conditional branches less
    weight before computing the layout.
    Fixes #18053
  • Gleb Popov's avatar
    gitlab-ci: Set locale to C.UTF-8. · b7a6b2f4
    Gleb Popov authored
  • Stefan Holdermans's avatar
    Allow spaces in GHCi :script file names · a8c27cf6
    Stefan Holdermans authored
    This patch updates the user interface of GHCi so that file names passed
    to the ':script' command may contain spaces escaped with a backslash.
    For example:
      :script foo\ bar.script
    The implementation uses a modified version of 'words' that does not
    break on escaped spaces.
    Fixes #18027.
  • Stefan Holdermans's avatar
    Add extra tests for GHCi :script syntax checks · 82663959
    Stefan Holdermans authored
    The syntax for GHCi's ":script" command allows for only a single file
    name to be passed as an argument. This patch adds a test for the cases
    in which a file name is missing or multiple file names are passed.
    Related to #T18027.
  • Stefan Holdermans's avatar
    Allow GHCi :script file names in double quotes · a0b79e1b
    Stefan Holdermans authored
    This patch updates the user interface of GHCi so that file names passed
    to the ':script' command can be wrapped in double quotes.
    For example:
      :script "foo bar.script"
    The implementation uses a modified version of 'words' that treats
    character sequences enclosed in double quotes as single words.
    Fixes #18027.
  • Stefan Holdermans's avatar
    Update documentation for GHCi :script · cf566330
    Stefan Holdermans authored
    This patch adds the fixes that allow for file names containing spaces to
    be passed to GHCi's ':script' command to the release notes for 8.12 and
    expands the user-guide documentation for ':script' by mentioning how
    such file names can be passed.
    Related to #18027.
  • Tuan Le's avatar
  • John Ericson's avatar
    Use `Checker` for `tc_pat` · 964d3ea2
    John Ericson authored
  • John Ericson's avatar
    Use `Checker` for `tc_lpat` and `tc_lpats` · b797aa42
    John Ericson authored
  • John Ericson's avatar
    More judiciously panic in `ts_pat` · 5108e84a
    John Ericson authored
  • John Ericson's avatar
  • John Ericson's avatar
    Tiny cleaup eta-reduce away a function argument · cb4231db
    John Ericson authored
    In GHC, not in the code being compiled!
  • John Ericson's avatar
  • Vilem-Benjamin Liepelt's avatar
    Fix spelling mistakes and typos · 3451584f
    Vilem-Benjamin Liepelt authored
  • Vilem-Benjamin Liepelt's avatar
    Add INLINABLE pragmas to Enum list producers · b552e531
    Vilem-Benjamin Liepelt authored
    The INLINABLE pragmas ensure that we export stable (unoptimised) unfoldings in
    the interface file so we can do list fusion at usage sites.
    Related tickets: #15185, #8763, #18178.
  • Vilem-Benjamin Liepelt's avatar
    Piggyback on Enum Word methods for Word64 · e7480063
    Vilem-Benjamin Liepelt authored
    If we are on a 64 bit platform, we can use the efficient Enum Word
    methods for the Enum Word64 instance.
  • Vilem-Benjamin Liepelt's avatar
  • Richard Eisenberg's avatar
    MR template should ask for key part · 2b363ebb
    Richard Eisenberg authored
  • Sebastian Graf's avatar
    Make `Int`'s `mod` and `rem` strict in their first arguments · a95bbd0b
    Sebastian Graf authored
    They used to be strict until 4d2ac2d4 (9 years ago).
    It's obviously better to be strict for performance reasons.
    It also blocks #18067.
    NoFib results:
            Program         Allocs    Instrs
            integer          -1.1%     +0.4%
       wheel-sieve2         +21.2%    +20.7%
                Min          -1.1%     -0.0%
                Max         +21.2%    +20.7%
     Geometric Mean          +0.2%     +0.2%
    The regression in `wheel-sieve2` is due to reboxing that likely will go
    away with the resolution of #18067. See !3282 for details.
    Fixes #18187.
  • Galen Huntington's avatar
  • Alexey Kuleshevich's avatar
    Fix wording in primops documentation to reflect the correct reasoning: · 1b508a9e
    Alexey Kuleshevich authored
    * Besides resizing functions, shrinking ones also mutate the
      size of a mutable array and because of those two `sizeofMutabeByteArray`
      and `sizeofSmallMutableArray` are now deprecated
    * Change reference in documentation to the newer functions `getSizeof*`
      instead of `sizeof*` for shrinking functions
    * Fix incorrect mention of "byte" instead of "small"
  • Andreas Klebinger's avatar
    Don't variable-length encode magic iface constant. · 4ca0c8a1
    Andreas Klebinger authored
    We changed to use variable length encodings for many types by default,
    including Word32. This makes sense for numbers but not when Word32 is
    meant to represent four bytes.
    I added a FixedLengthEncoding newtype to Binary who's instances
    interpret their argument as a collection of bytes instead of a number.
    We then use this when writing/reading magic numbers to the iface file.
    I also took the libery to remove the dummy iface field.
    This fixes #18180.
  • Krzysztof Gogolewski's avatar
    Add a regression test for #11506 · a1275081
    Krzysztof Gogolewski authored
    The testcase works now.
    See explanation in ghc/ghc#11506 (comment 273202)
  • Krzysztof Gogolewski's avatar
    Sort deterministically metric output · 8a816e5f
    Krzysztof Gogolewski authored
    Previously, we sorted according to the test name and way,
    but the metrics (max_bytes_used/peak_megabytes_allocated etc.)
    were appearing in nondeterministic order.
  • Sylvain Henry's avatar
    Move isDynLinkName into GHC.Types.Name · 566cc73f
    Sylvain Henry authored
    It doesn't belong into GHC.Unit.State
  • Ben Gamari's avatar
    CoreToStg: Add Outputable ArgInfo instance · e3777291
    Ben Gamari authored
  • Simon Peyton Jones's avatar
    Make Lint check return type of a join point · df285125
    Simon Peyton Jones authored
       join x = rhs in body
    It's important that the type of 'rhs' is the same as the type of
    'body', but Lint wasn't checking that invariant.
    Now it does!  This was exposed by investigation into !3113.
  • Simon Peyton Jones's avatar
    Do not float join points in exprIsConApp_maybe · 31f1c568
    Simon Peyton Jones authored
    We hvae been making exprIsConApp_maybe cleverer in recent times:
        commit b78cc64e
        Date:   Thu Nov 15 17:14:31 2018 +0100
        Make constructor wrappers inline only during the final phase
        commit 7833cf40
        Date:   Thu Jan 24 17:58:50 2019 +0100
        Look through newtype wrappers (Trac #16254)
        commit c25b135f
        Date:   Thu Feb 21 12:03:22 2019 +0000
        Fix exprIsConApp_maybe
    But alas there was still a bug, now immortalised in
      Note [Don't float join points]
    in SimpleOpt.
    It's quite hard to trigger because it requires a dead
    join point, but it came up when compiling Cabal
    Cabal.Distribution.Fields.Lexer.hs, when working on
    Happily, the fix is extremly easy.  Finding the
    bug was not so easy.
  • Ben Gamari's avatar
    Allow simplification through runRW# · 5a2cdd5c
    Ben Gamari authored
    Because runRW# inlines so late, we were previously able to do very
    little simplification across it. For instance, given even a simple
    program like
        case runRW# (\s -> let n = I# 42# in n) of
          I# n# -> f n#
    we previously had no way to avoid the allocation of the I#.
    This patch allows the simplifier to push strict contexts into the
    continuation of a runRW# application, as explained in
    in Note [Simplification of runRW#] in GHC.CoreToStg.Prep.
    Fixes #15127.
    Metric Increase:
    Metric Decrease:
    Co-Authored-By: Simon Peyton Jones's avatarSimon Peyton-Jone <simonpj@microsoft.com>
......@@ -2,7 +2,7 @@ variables:
# Commit of ghc/ci-images repository from which to pull Docker images
DOCKER_REV: 3f731f5d37a156e7ebe10cd32656946083baaf4a
DOCKER_REV: 6223fe0b5942f4fa35bdec92c74566cf195bfb42
# Sequential version number capturing the versions of all tools fetched by
# .gitlab/ci.sh.
......@@ -26,6 +26,9 @@ LT_CYAN="1;36"
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# GitLab Pipelines log section delimiters
# https://gitlab.com/gitlab-org/gitlab-foss/issues/14664
start_section() {
Thank you for your contribution to GHC!
**Please read the checklist below to make sure your contribution fulfills these
expectations. Also please answer the following question in your MR description:**
**Where is the key part of this patch? That is, what should reviewers look at first?**
Please take a few moments to verify that your commits fulfill the following:
* [ ] are either individually buildable or squashed
......@@ -10,7 +15,7 @@ Please take a few moments to verify that your commits fulfill the following:
likely should add a [Note][notes] and cross-reference it from the relevant
* [ ] add a [testcase to the testsuite](https://gitlab.haskell.org/ghc/ghc/wikis/building/running-tests/adding).
* [ ] if your MR affects library interfaces (e.g. changes `base`) please add
* [ ] if your MR affects library interfaces (e.g. changes `base`) or affects whether GHC will accept user-written code, please add
the ~"user facing" label.
* [ ] updates the users guide if applicable
* [ ] mentions new features in the release notes for the next release
......@@ -178,7 +178,7 @@ module GHC (
isPrimOpId, isFCallId, isClassOpId_maybe,
isDataConWorkId, idDataCon,
isBottomingId, isDictonaryId,
isDeadEndId, isDictonaryId,
-- ** Type constructors
......@@ -597,8 +597,7 @@ checkBrokenTablesNextToCode' dflags
setSessionDynFlags :: GhcMonad m => DynFlags -> m [UnitId]
setSessionDynFlags dflags = do
dflags' <- checkNewDynFlags dflags
dflags'' <- liftIO $ interpretPackageEnv dflags'
(dflags''', preload) <- liftIO $ initPackages dflags''
(dflags''', preload) <- liftIO $ initPackages dflags'
-- Interpreter
interp <- if gopt Opt_ExternalInterpreter dflags
......@@ -715,7 +714,11 @@ getInteractiveDynFlags = withSession $ \h -> return (ic_dflags (hsc_IC h))
parseDynamicFlags :: MonadIO m =>
DynFlags -> [Located String]
-> m (DynFlags, [Located String], [Warn])
parseDynamicFlags = parseDynamicFlagsCmdLine
parseDynamicFlags dflags cmdline = do
(dflags1, leftovers, warns) <- parseDynamicFlagsCmdLine dflags cmdline
dflags2 <- liftIO $ interpretPackageEnv dflags1
return (dflags2, leftovers, warns)
-- | Checks the set of new DynFlags for possibly erroneous option
-- combinations when invoking 'setSessionDynFlags' and friends, and if
......@@ -105,6 +105,9 @@ templateHaskellNames = [
numTyLitName, strTyLitName,
-- TyVarBndr
plainTVName, kindedTVName,
plainInvisTVName, kindedInvisTVName,
-- Specificity
specifiedSpecName, inferredSpecName,
-- Role
nominalRName, representationalRName, phantomRName, inferRName,
-- Kind
......@@ -152,7 +155,7 @@ templateHaskellNames = [
expQTyConName, fieldExpTyConName, predTyConName,
stmtTyConName, decsTyConName, conTyConName, bangTypeTyConName,
varBangTypeTyConName, typeQTyConName, expTyConName, decTyConName,
typeTyConName, tyVarBndrTyConName, clauseTyConName,
typeTyConName, tyVarBndrUnitTyConName, tyVarBndrSpecTyConName, clauseTyConName,
patQTyConName, funDepTyConName, decsQTyConName,
ruleBndrTyConName, tySynEqnTyConName,
roleTyConName, tExpTyConName, injAnnTyConName, kindTyConName,
......@@ -471,6 +474,15 @@ plainTVName, kindedTVName :: Name
plainTVName = libFun (fsLit "plainTV") plainTVIdKey
kindedTVName = libFun (fsLit "kindedTV") kindedTVIdKey
plainInvisTVName, kindedInvisTVName :: Name
plainInvisTVName = libFun (fsLit "plainInvisTV") plainInvisTVIdKey
kindedInvisTVName = libFun (fsLit "kindedInvisTV") kindedInvisTVIdKey
-- data Specificity = ...
specifiedSpecName, inferredSpecName :: Name
specifiedSpecName = libFun (fsLit "specifiedSpec") specifiedSpecKey
inferredSpecName = libFun (fsLit "inferredSpec") inferredSpecKey
-- data Role = ...
nominalRName, representationalRName, phantomRName, inferRName :: Name
nominalRName = libFun (fsLit "nominalR") nominalRIdKey
......@@ -546,7 +558,8 @@ patQTyConName, expQTyConName, stmtTyConName,
conTyConName, bangTypeTyConName,
varBangTypeTyConName, typeQTyConName,
decsQTyConName, ruleBndrTyConName, tySynEqnTyConName, roleTyConName,
derivClauseTyConName, kindTyConName, tyVarBndrTyConName,
derivClauseTyConName, kindTyConName,
tyVarBndrUnitTyConName, tyVarBndrSpecTyConName,
derivStrategyTyConName :: Name
-- These are only used for the types of top-level splices
expQTyConName = libTc (fsLit "ExpQ") expQTyConKey
......@@ -564,7 +577,8 @@ tySynEqnTyConName = thTc (fsLit "TySynEqn") tySynEqnTyConKey
roleTyConName = libTc (fsLit "Role") roleTyConKey
derivClauseTyConName = thTc (fsLit "DerivClause") derivClauseTyConKey
kindTyConName = thTc (fsLit "Kind") kindTyConKey
tyVarBndrTyConName = thTc (fsLit "TyVarBndr") tyVarBndrTyConKey
tyVarBndrUnitTyConName = libTc (fsLit "TyVarBndrUnit") tyVarBndrUnitTyConKey
tyVarBndrSpecTyConName = libTc (fsLit "TyVarBndrSpec") tyVarBndrSpecTyConKey
derivStrategyTyConName = thTc (fsLit "DerivStrategy") derivStrategyTyConKey
-- quasiquoting
......@@ -628,7 +642,8 @@ quoteClassKey = mkPreludeClassUnique 201
expTyConKey, matchTyConKey, clauseTyConKey, qTyConKey, expQTyConKey,
stmtTyConKey, conTyConKey, typeQTyConKey, typeTyConKey,
tyVarBndrTyConKey, decTyConKey, bangTypeTyConKey, varBangTypeTyConKey,
tyVarBndrUnitTyConKey, tyVarBndrSpecTyConKey,
decTyConKey, bangTypeTyConKey, varBangTypeTyConKey,
fieldExpTyConKey, fieldPatTyConKey, nameTyConKey, patQTyConKey,
funDepTyConKey, predTyConKey,
predQTyConKey, decsQTyConKey, ruleBndrTyConKey, tySynEqnTyConKey,
......@@ -655,7 +670,8 @@ patQTyConKey = mkPreludeTyConUnique 219
funDepTyConKey = mkPreludeTyConUnique 222
predTyConKey = mkPreludeTyConUnique 223
predQTyConKey = mkPreludeTyConUnique 224
tyVarBndrTyConKey = mkPreludeTyConUnique 225
tyVarBndrUnitTyConKey = mkPreludeTyConUnique 225
tyVarBndrSpecTyConKey = mkPreludeTyConUnique 237
decsQTyConKey = mkPreludeTyConUnique 226
ruleBndrTyConKey = mkPreludeTyConUnique 227
tySynEqnTyConKey = mkPreludeTyConUnique 228
......@@ -985,6 +1001,10 @@ plainTVIdKey, kindedTVIdKey :: Unique
plainTVIdKey = mkPreludeMiscIdUnique 413
kindedTVIdKey = mkPreludeMiscIdUnique 414
plainInvisTVIdKey, kindedInvisTVIdKey :: Unique
plainInvisTVIdKey = mkPreludeMiscIdUnique 482
kindedInvisTVIdKey = mkPreludeMiscIdUnique 483
-- data Role = ...
nominalRIdKey, representationalRIdKey, phantomRIdKey, inferRIdKey :: Unique
nominalRIdKey = mkPreludeMiscIdUnique 415
......@@ -1060,6 +1080,11 @@ anyclassStrategyIdKey = mkPreludeDataConUnique 495
newtypeStrategyIdKey = mkPreludeDataConUnique 496
viaStrategyIdKey = mkPreludeDataConUnique 497
-- data Specificity = ...
specifiedSpecKey, inferredSpecKey :: Unique
specifiedSpecKey = mkPreludeMiscIdUnique 498
inferredSpecKey = mkPreludeMiscIdUnique 499
* *
......@@ -586,7 +586,7 @@ pcDataConWithFixity' declared_infix dc_name wrk_key rri
(map (const no_bang) arg_tys)
[] -- No labelled fields
tyvars ex_tyvars
(mkTyCoVarBinders Specified user_tyvars)
(mkTyVarBinders SpecifiedSpec user_tyvars)
[] -- No equality spec
[] -- No theta
arg_tys (mkTyConApp tycon (mkTyVarTys tyvars))
......@@ -399,9 +399,23 @@ funTyConName = mkPrimTyConName (fsLit "->") funTyConKey funTyCon
-- | The @(->)@ type constructor.
-- @
-- (->) :: forall (rep1 :: RuntimeRep) (rep2 :: RuntimeRep).
-- TYPE rep1 -> TYPE rep2 -> *
-- (->) :: forall {rep1 :: RuntimeRep} {rep2 :: RuntimeRep}.
-- TYPE rep1 -> TYPE rep2 -> Type
-- @
-- The runtime representations quantification is left inferred. This
-- means they cannot be specified with @-XTypeApplications@.
-- This is a deliberate choice to allow future extensions to the
-- function arrow. To allow visible application a type synonym can be
-- defined:
-- @
-- type Arr :: forall (rep1 :: RuntimeRep) (rep2 :: RuntimeRep).
-- TYPE rep1 -> TYPE rep2 -> Type
-- type Arr = (->)
-- @
funTyCon :: TyCon
funTyCon = mkFunTyCon funTyConName tc_bndrs tc_rep_nm
......@@ -1254,7 +1254,7 @@ primop ShrinkSmallMutableArrayOp_Char "shrinkSmallMutableArray#" GenPrimOp
SmallMutableArray# s a -> Int# -> State# s -> State# s
{Shrink mutable array to new specified size, in
the specified state thread. The new size argument must be less than or
equal to the current size as reported by {\tt sizeofSmallMutableArray\#}.}
equal to the current size as reported by {\tt getSizeofSmallMutableArray\#}.}
with out_of_line = True
has_side_effects = True
......@@ -1279,8 +1279,8 @@ primop SizeofSmallArrayOp "sizeofSmallArray#" GenPrimOp
primop SizeofSmallMutableArrayOp "sizeofSmallMutableArray#" GenPrimOp
SmallMutableArray# s a -> Int#
{Return the number of elements in the array. Note that this is deprecated
as it is unsafe in the presence of resize operations on the
same byte array.}
as it is unsafe in the presence of shrink and resize operations on the
same small mutable array.}
with deprecated_msg = { Use 'getSizeofSmallMutableArray#' instead }
primop GetSizeofSmallMutableArrayOp "getSizeofSmallMutableArray#" GenPrimOp
......@@ -1451,7 +1451,7 @@ primop ShrinkMutableByteArrayOp_Char "shrinkMutableByteArray#" GenPrimOp
MutableByteArray# s -> Int# -> State# s -> State# s
{Shrink mutable byte array to new specified size (in bytes), in
the specified state thread. The new size argument must be less than or
equal to the current size as reported by {\tt sizeofMutableByteArray\#}.}
equal to the current size as reported by {\tt getSizeofMutableByteArray\#}.}
with out_of_line = True
has_side_effects = True
......@@ -1484,7 +1484,7 @@ primop SizeofByteArrayOp "sizeofByteArray#" GenPrimOp
primop SizeofMutableByteArrayOp "sizeofMutableByteArray#" GenPrimOp
MutableByteArray# s -> Int#
{Return the size of the array in bytes. Note that this is deprecated as it is
unsafe in the presence of resize operations on the same byte
unsafe in the presence of shrink and resize operations on the same mutable byte
with deprecated_msg = { Use 'getSizeofMutableByteArray#' instead }
......@@ -2567,14 +2567,17 @@ section "Exceptions"
-- Note [Strictness for mask/unmask/catch]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Consider this example, which comes from GHC.IO.Handle.Internals:
-- wantReadableHandle3 f ma b st
-- = case ... of
-- DEFAULT -> case ma of MVar a -> ...
-- 0# -> maskAsynchExceptions# (\st -> case ma of MVar a -> ...)
-- 0# -> maskAsyncExceptions# (\st -> case ma of MVar a -> ...)
-- The outer case just decides whether to mask exceptions, but we don't want
-- thereby to hide the strictness in 'ma'! Hence the use of strictApply1Dmd.
-- thereby to hide the strictness in 'ma'! Hence the use of strictApply1Dmd
-- in mask and unmask. But catch really is lazy in its first argument, see
-- #11555. So for IO actions 'ma' we often use a wrapper around it that is
-- head-strict in 'ma': GHC.IO.catchException.
primop CatchOp "catch#" GenPrimOp
(State# RealWorld -> (# State# RealWorld, a #) )
......@@ -2593,13 +2596,16 @@ primop RaiseOp "raise#" GenPrimOp
b -> o
-- NB: the type variable "o" is "a", but with OpenKind
-- In contrast to 'raiseIO#', which throws a *precise* exception,
-- exceptions thrown by 'raise#' are considered *imprecise*.
-- See Note [Precise vs imprecise exceptions] in GHC.Types.Demand.
-- Hence, it has 'botDiv', not 'exnDiv'.
-- For the same reasons, 'raise#' is marked as "can_fail" (which 'raiseIO#'
-- is not), but not as "has_side_effects" (which 'raiseIO#' is).
-- See Note [PrimOp can_fail and has_side_effects] in PrimOp.hs.
strictness = { \ _arity -> mkClosedStrictSig [topDmd] botDiv }
out_of_line = True
has_side_effects = True
-- raise# certainly throws a Haskell exception and hence has_side_effects
-- It doesn't actually make much difference because the fact that it
-- returns bottom independently ensures that we are careful not to discard
-- it. But still, it's better to say the Right Thing.
can_fail = True
-- Note [Arithmetic exception primops]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -2648,8 +2654,8 @@ primop RaiseIOOp "raiseIO#" GenPrimOp
a -> State# RealWorld -> (# State# RealWorld, b #)
-- See Note [Precise exceptions and strictness analysis] in Demand.hs
-- for why we give it topDiv
-- strictness = { \ _arity -> mkClosedStrictSig [topDmd, topDmd] topDiv }
-- for why this is the *only* primop that has 'exnDiv'
strictness = { \ _arity -> mkClosedStrictSig [topDmd, topDmd] exnDiv }
out_of_line = True
has_side_effects = True
......@@ -12,7 +12,7 @@ module GHC.Cmm (
CmmBlock, RawCmmDecl,
Section(..), SectionType(..),
GenCmmStatics(..), type CmmStatics, type RawCmmStatics, CmmStatic(..),
SectionProtection(..), sectionProtection,
-- ** Blocks containing lists
GenBasicBlock(..), blockId,
......@@ -185,17 +185,33 @@ data SectionType
| OtherSection String
deriving (Show)
-- | Should a data in this section be considered constant
isSecConstant :: Section -> Bool
isSecConstant (Section t _) = case t of
Text -> True
ReadOnlyData -> True
RelocatableReadOnlyData -> True
ReadOnlyData16 -> True
CString -> True
Data -> False
UninitialisedData -> False
(OtherSection _) -> False
data SectionProtection
= ReadWriteSection
| ReadOnlySection
| WriteProtectedSection -- See Note [Relocatable Read-Only Data]
deriving (Eq)
-- | Should a data in this section be considered constant at runtime
sectionProtection :: Section -> SectionProtection
sectionProtection (Section t _) = case t of
Text -> ReadOnlySection
ReadOnlyData -> ReadOnlySection
RelocatableReadOnlyData -> WriteProtectedSection
ReadOnlyData16 -> ReadOnlySection
CString -> ReadOnlySection
Data -> ReadWriteSection
UninitialisedData -> ReadWriteSection
(OtherSection _) -> ReadWriteSection
Note [Relocatable Read-Only Data]
Relocatable data are only read-only after relocation at the start of the
program. They should be writable from the source code until then. Failure to
do so would end up in segfaults at execution when using linkers that do not
enforce writability of those sections, such as the gold linker.
data Section = Section SectionType CLabel
......@@ -119,7 +119,6 @@ import GHC.Prelude
import GHC.Types.Id.Info
import GHC.Types.Basic
import {-# SOURCE #-} GHC.Cmm.BlockId (BlockId, mkBlockId)
import GHC.Unit.State
import GHC.Unit
import GHC.Types.Name
import GHC.Types.Unique
......@@ -459,7 +459,7 @@ type CAFSet = Set CAFLabel
type CAFEnv = LabelMap CAFSet
mkCAFLabel :: CLabel -> CAFLabel
mkCAFLabel lbl = CAFLabel $! toClosureLbl lbl
mkCAFLabel lbl = CAFLabel (toClosureLbl lbl)
-- This is a label that we can put in an SRT. It *must* be a closure label,
-- pointing to either a FUN_STATIC, THUNK_STATIC, or CONSTR.
......@@ -736,10 +736,11 @@ getStaticFuns decls =
type SRTMap = Map CAFLabel (Maybe SRTEntry)
-- | Given SRTMap of a module returns the set of non-CAFFY names in the module.
-- Any Names not in the set are CAFFY.
srtMapNonCAFs :: SRTMap -> NameSet
srtMapNonCAFs srtMap = mkNameSet (mapMaybe get_name (Map.toList srtMap))
-- | Given 'SRTMap' of a module, returns the set of non-CAFFY names in the
-- module. Any 'Name's not in the set are CAFFY.
srtMapNonCAFs :: SRTMap -> NonCaffySet
srtMapNonCAFs srtMap =
NonCaffySet $ mkNameSet (mapMaybe get_name (Map.toList srtMap))
get_name (CAFLabel l, Nothing) = hasHaskellName l
get_name (_l, Just _srt_entry) = Nothing
......@@ -69,6 +69,7 @@ cmmMachOpFoldM _ op [CmmLit (CmmInt x rep)]
MO_SF_Conv _from to -> CmmLit (CmmFloat (fromInteger x) to)
MO_SS_Conv from to -> CmmLit (CmmInt (narrowS from x) to)
MO_UU_Conv from to -> CmmLit (CmmInt (narrowU from x) to)
MO_XX_Conv from to -> CmmLit (CmmInt (narrowS from x) to)
_ -> panic $ "cmmMachOpFoldM: unknown unary op: " ++ show op
......@@ -76,6 +77,7 @@ cmmMachOpFoldM _ op [CmmLit (CmmInt x rep)]
-- Eliminate conversion NOPs
cmmMachOpFoldM _ (MO_SS_Conv rep1 rep2) [x] | rep1 == rep2 = Just x
cmmMachOpFoldM _ (MO_UU_Conv rep1 rep2) [x] | rep1 == rep2 = Just x
cmmMachOpFoldM _ (MO_XX_Conv rep1 rep2) [x] | rep1 == rep2 = Just x
-- Eliminate nested conversions where possible
cmmMachOpFoldM platform conv_outer [CmmMachOp conv_inner [x]]
......@@ -728,7 +728,7 @@ cmmNativeGen dflags this_mod modLoc ncgImpl us fileIds dbgMap cmm count
let optimizedCFG :: Maybe CFG
optimizedCFG =
optimizeCFG (cfgWeightInfo dflags) cmm <$!> postShortCFG
optimizeCFG (gopt Opt_CmmStaticPred dflags) (cfgWeightInfo dflags) cmm <$!> postShortCFG
maybeDumpCfg dflags optimizedCFG "CFG Weights - Final" proc_name
......@@ -240,7 +240,44 @@ import Control.Monad (foldM)
Assuming that Lwork is large the chance that the "call" ends up
in the same cache line is also fairly small.
~~~ Note [Layout relevant edge weights]
The input to the chain based code layout algorithm is a CFG
with edges annotated with their frequency. The frequency
of traversal corresponds quite well to the cost of not placing
the connected blocks next to each other.
However even if having the same frequency certain edges are
inherently more or less relevant to code layout.
In particular:
* Edges which cross an info table are less relevant than others.
If we place the blocks across this edge next to each other
they are still separated by the info table which negates
much of the benefit. It makes it less likely both blocks
will share a cache line reducing the benefits from locality.
But it also prevents us from eliminating jump instructions.
* Conditional branches and switches are slightly less relevant.
We can completely remove unconditional jumps by placing them
next to each other. This is not true for conditional branch edges.
We apply a small modifier to them to ensure edges for which we can
eliminate the overhead completely are considered first. See also #18053.
* Edges constituted by a call are ignored.
Considering these hardly helped with performance and ignoring
them helps quite a bit to improve compiler performance.
So we perform a preprocessing step where we apply a multiplicator
to these kinds of edges.
-- | Look at X number of blocks in two chains to determine
......@@ -636,35 +673,35 @@ sequenceChain :: forall a i. (Instruction i, Outputable i)
-> [GenBasicBlock i] -- ^ Blocks placed in sequence.
sequenceChain _info _weights [] = []
sequenceChain _info _weights [x] = [x]
sequenceChain info weights' blocks@((BasicBlock entry _):_) =
let weights :: CFG
weights = --pprTrace "cfg'" (pprEdgeWeights cfg')
(_, globalEdgeWeights) = {-# SCC mkGlobalWeights #-} mkGlobalWeights entry weights'
cfg' = {-# SCC rewriteEdges #-}
(\cfg from m ->
(\cfg to w -> setEdgeWeight cfg (EdgeWeight w) from to )
cfg m )
directEdges :: [CfgEdge]
sequenceChain info weights blocks@((BasicBlock entry _):_) =
let directEdges :: [CfgEdge]
directEdges = sortBy (flip compare) $ catMaybes . map relevantWeight $ (infoEdgeList weights)
-- Apply modifiers to turn edge frequencies into useable weights
-- for computing code layout.
-- See also Note [Layout relevant edge weights]
relevantWeight :: CfgEdge -> Maybe CfgEdge
relevantWeight edge@(CfgEdge from to edgeInfo)
| (EdgeInfo CmmSource { trans_cmmNode = CmmCall {} } _) <- edgeInfo
-- Ignore edges across calls
-- Ignore edges across calls.
= Nothing
| mapMember to info
, w <- edgeWeight edgeInfo
-- The payoff is small if we jump over an info table
-- The payoff is quite small if we jump over an info table
= Just (CfgEdge from to edgeInfo { edgeWeight = w/8 })
| (EdgeInfo CmmSource { trans_cmmNode = exitNode } _) <- edgeInfo
, cantEliminate exitNode
, w <- edgeWeight edgeInfo
-- A small penalty to edge types which
-- we can't optimize away by layout.
-- w * 0.96875 == w - w/32
= Just (CfgEdge from to edgeInfo { edgeWeight = w * 0.96875 })
| otherwise
= Just edge
cantEliminate CmmCondBranch {} = True
cantEliminate CmmSwitch {} = True
cantEliminate _ = False
blockMap :: LabelMap (GenBasicBlock i)
......@@ -670,11 +670,21 @@ findBackEdges root cfg =
typedEdges =
classifyEdges root getSuccs edges :: [((BlockId,BlockId),EdgeType)]
optimizeCFG :: D.CfgWeights -> RawCmmDecl -> CFG -> CFG
optimizeCFG _ (CmmData {}) cfg = cfg
optimizeCFG weights (CmmProc info _lab _live graph) cfg =
{-# SCC optimizeCFG #-}
optimizeCFG :: Bool -> D.CfgWeights -> RawCmmDecl -> CFG -> CFG
optimizeCFG _ _ (CmmData {}) cfg = cfg
optimizeCFG doStaticPred weights proc@(CmmProc _info _lab _live graph) cfg =
(if doStaticPred then staticPredCfg (g_entry graph) else id) $
optHsPatterns weights proc $ cfg
-- | Modify branch weights based on educated guess on
-- patterns GHC tends to produce and how they affect
-- performance.
-- Most importantly we penalize jumps across info tables.
optHsPatterns :: D.CfgWeights -> RawCmmDecl -> CFG -> CFG
optHsPatterns _ (CmmData {}) cfg = cfg
optHsPatterns weights (CmmProc info _lab _live graph) cfg =
{-# SCC optHsPatterns #-}
-- pprTrace "Initial:" (pprEdgeWeights cfg) $
-- pprTrace "Initial:" (ppr $ mkGlobalWeights (g_entry graph) cfg) $
......@@ -749,6 +759,21 @@ optimizeCFG weights (CmmProc info _lab _live graph) cfg =
| CmmSource { trans_cmmNode = CmmCondBranch {} } <- source = True
| otherwise = False
-- | Convert block-local branch weights to global weights.
staticPredCfg :: BlockId -> CFG -> CFG
staticPredCfg entry cfg = cfg'
(_, globalEdgeWeights) = {-# SCC mkGlobalWeights #-}
mkGlobalWeights entry cfg
cfg' = {-# SCC rewriteEdges #-}
(\cfg from m ->
(\cfg to w -> setEdgeWeight cfg (EdgeWeight w) from to )
cfg m )
-- | Determine loop membership of blocks based on SCC analysis
-- This is faster but only gives yes/no answers.
loopMembers :: HasDebugCallStack => CFG -> LabelMap Bool
......@@ -922,6 +947,10 @@ revPostorderFrom cfg root =
-- reverse post order. Which is required for diamond control flow to work probably.
-- We also apply a few prediction heuristics (based on the same paper)
-- The returned result represents frequences.
-- For blocks it's the expected number of executions and
-- for edges is the number of traversals.
{-# NOINLINE mkGlobalWeights #-}
{-# SCC mkGlobalWeights #-}
......@@ -37,7 +37,10 @@ import GHC.CmmToAsm.Config
-- (for allocation purposes, anyway).
data RegUsage
= RU [Reg] [Reg]
= RU {
reads :: [Reg],
writes :: [Reg]
-- | No regs read or written to.
noUsage :: RegUsage
......@@ -554,8 +554,9 @@ delAssoc :: (Uniquable a)
delAssoc a m
| Just aSet <- lookupUFM m a
, m1 <- delFromUFM m a
= nonDetFoldUniqSet (\x m -> delAssoc1 x a m) m1 aSet
-- It's OK to use nonDetFoldUFM here because deletion is commutative
= nonDetStrictFoldUniqSet (\x m -> delAssoc1 x a m) m1 aSet
-- It's OK to use a non-deterministic fold here because deletion is
-- commutative
| otherwise = m
{-# LANGUAGE BangPatterns, CPP, ScopedTypeVariables #-}
{-# LANGUAGE ConstraintKinds #-}
{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
......@@ -137,6 +138,7 @@ import GHC.Platform
import Data.Maybe
import Data.List
import Control.Monad
import Control.Applicative
-- -----------------------------------------------------------------------------
-- Top level of the register allocator
......@@ -229,8 +231,13 @@ linearRegAlloc config entry_ids block_live sccs
go f = linearRegAlloc' config f entry_ids block_live sccs
platform = ncgPlatform config
-- | Constraints on the instruction instances used by the
-- linear allocator.
type OutputableRegConstraint freeRegs instr =
(FR freeRegs, Outputable freeRegs, Outputable instr, Instruction instr)
:: (FR freeRegs, Outputable instr, Instruction instr)
:: OutputableRegConstraint freeRegs instr
=> NCGConfig
-> freeRegs
-> [BlockId] -- ^ entry points
......@@ -246,7 +253,7 @@ linearRegAlloc' config initFreeRegs entry_ids block_live sccs
return (blocks, stats, getStackUse stack)
linearRA_SCCs :: (FR freeRegs, Instruction instr, Outputable instr)
linearRA_SCCs :: OutputableRegConstraint freeRegs instr
=> [BlockId]
-> BlockMap RegSet
-> [NatBasicBlock instr]
......@@ -281,7 +288,7 @@ linearRA_SCCs entry_ids block_live blocksAcc (CyclicSCC blocks : sccs)
more sanity checking to guard against this eventuality.
process :: (FR freeRegs, Instruction instr, Outputable instr)
process :: OutputableRegConstraint freeRegs instr
=> [BlockId]
-> BlockMap RegSet
-> [GenBasicBlock (LiveInstr instr)]
......@@ -325,15 +332,18 @@ process entry_ids block_live (b@(BasicBlock id _) : blocks)
-- | Do register allocation on this basic block
:: (FR freeRegs, Outputable instr, Instruction instr)
:: OutputableRegConstraint freeRegs instr
=> BlockMap RegSet -- ^ live regs on entry to each basic block
-> LiveBasicBlock instr -- ^ block to do register allocation on
-> RegM freeRegs [NatBasicBlock instr] -- ^ block with registers allocated
processBlock block_live (BasicBlock id instrs)
= do initBlock id block_live
= do -- pprTraceM "processBlock" $ text "" $$ ppr (BasicBlock id instrs)
initBlock id block_live
(instrs', fixups)
<- linearRA block_live [] [] id instrs
-- pprTraceM "blockResult" $ ppr (instrs', fixups)
return $ BasicBlock id instrs' : fixups
......@@ -369,7 +379,7 @@ initBlock id block_live
-- | Do allocation for a sequence of instructions.
:: (FR freeRegs, Outputable instr, Instruction instr)
:: OutputableRegConstraint freeRegs instr
=> BlockMap RegSet -- ^ map of what vregs are live on entry to each block.
-> [instr] -- ^ accumulator for instructions already processed.
-> [NatBasicBlock instr] -- ^ accumulator for blocks of fixup code.
......@@ -396,7 +406,7 @@ linearRA block_live accInstr accFixups id (instr:instrs)
-- | Do allocation for a single instruction.
:: (FR freeRegs, Outputable instr, Instruction instr)
:: OutputableRegConstraint freeRegs instr
=> BlockMap RegSet -- ^ map of what vregs are love on entry to each block.
-> [instr] -- ^ accumulator for instructions already processed.
-> BlockId -- ^ the id of the current block, for debugging
......@@ -476,7 +486,7 @@ isInReg src assig | Just (InReg _) <- lookupUFM assig src = True
| otherwise = False
genRaInsn :: (FR freeRegs, Instruction instr, Outputable instr)
genRaInsn :: OutputableRegConstraint freeRegs instr
=> BlockMap RegSet
-> [instr]
-> BlockId
......@@ -486,6 +496,7 @@ genRaInsn :: (FR freeRegs, Instruction instr, Outputable instr)
-> RegM freeRegs ([instr], [NatBasicBlock instr])
genRaInsn block_live new_instrs block_id instr r_dying w_dying = do
-- pprTraceM "genRaInsn" $ ppr (block_id, instr)
platform <- getPlatform
case regUsageOfInstr platform instr of { RU read written ->
......@@ -525,6 +536,8 @@ genRaInsn block_live new_instrs block_id instr r_dying w_dying = do
(fixup_blocks, adjusted_instr)
<- joinToTargets block_live block_id instr
-- when (not $ null fixup_blocks) $ pprTraceM "genRA:FixBlocks" $ ppr fixup_blocks
-- Debugging - show places where the reg alloc inserted
-- assignment fixup blocks.
-- when (not $ null fixup_blocks) $
......@@ -737,7 +750,7 @@ data SpillLoc = ReadMem StackSlot -- reading from register only in memory
-- the list of free registers and free stack slots.
:: (FR freeRegs, Outputable instr, Instruction instr)
:: forall freeRegs instr. (FR freeRegs, Outputable instr, Instruction instr)
=> Bool -- True <=> reading (load up spilled regs)
-> [VirtualReg] -- don't push these out
-> [instr] -- spill insns
......@@ -749,7 +762,8 @@ allocateRegsAndSpill _ _ spills alloc []
= return (spills, reverse alloc)
allocateRegsAndSpill reading keep spills alloc (r:rs)
= do assig <- getAssigR
= do assig <- getAssigR :: RegM freeRegs (RegMap Loc)
-- pprTraceM "allocateRegsAndSpill:assig" (ppr (r:rs) $$ ppr assig)
let doSpill = allocRegsAndSpill_spill reading keep spills alloc r rs assig
case lookupUFM assig r of
-- case (1a): already in a register
......@@ -779,6 +793,26 @@ allocateRegsAndSpill reading keep spills alloc (r:rs)
| otherwise -> doSpill WriteNew
-- | Given a virtual reg find a preferred real register.
-- The preferred register is simply the first one the variable
-- was assigned to (if any). This way when we allocate for a loop
-- variables are likely to end up in the same registers at the
-- end and start of the loop, avoiding redundant reg-reg moves.
-- Note: I tried returning a list of past assignments, but that
-- turned out to barely matter but added a few tenths of
-- a percent to compile time.
findPrefRealReg :: forall freeRegs u. Uniquable u
=> u -> RegM freeRegs (Maybe RealReg)
findPrefRealReg vreg = do
bassig <- getBlockAssigR :: RegM freeRegs (BlockMap (freeRegs,RegMap Loc))
return $ foldr (findVirtRegAssig) Nothing bassig
findVirtRegAssig :: (freeRegs,RegMap Loc) -> Maybe RealReg -> Maybe RealReg
findVirtRegAssig assig z =
z <|> case lookupUFM (snd assig) vreg of
Just (InReg real_reg) -> Just real_reg
Just (InBoth real_reg _) -> Just real_reg
_ -> z
-- reading is redundant with reason, but we keep it around because it's
-- convenient and it maintains the recursive structure of the allocator. -- EZY
......@@ -795,18 +829,26 @@ allocRegsAndSpill_spill :: (FR freeRegs, Instruction instr, Outputable instr)
allocRegsAndSpill_spill reading keep spills alloc r rs assig spill_loc
= do platform <- getPlatform
freeRegs <- getFreeRegsR
let freeRegs_thisClass = frGetFreeRegs platform (classOfVirtualReg r) freeRegs
let freeRegs_thisClass = frGetFreeRegs platform (classOfVirtualReg r) freeRegs :: [RealReg]
case freeRegs_thisClass of
-- Can we put the variable into a register it already was?
pref_reg <- findPrefRealReg r
case freeRegs_thisClass of
-- case (2): we have a free register
(my_reg : _) ->
do spills' <- loadTemp r spill_loc my_reg spills
(first_free : _) ->
do let final_reg
| Just reg <- pref_reg
, reg `elem` freeRegs_thisClass
= reg
| otherwise
= first_free
spills' <- loadTemp r spill_loc final_reg spills
setAssigR (addToUFM assig r $! newLocation spill_loc my_reg)
setFreeRegsR $ frAllocateReg platform my_reg freeRegs
setAssigR (addToUFM assig r $! newLocation spill_loc final_reg)
setFreeRegsR $ frAllocateReg platform final_reg freeRegs
allocateRegsAndSpill reading keep spills' (my_reg : alloc) rs
allocateRegsAndSpill reading keep spills' (final_reg : alloc) rs
-- case (3): we need to push something out to free up a register
......@@ -814,7 +856,8 @@ allocRegsAndSpill_spill reading keep spills alloc r rs assig spill_loc
do let inRegOrBoth (InReg _) = True
inRegOrBoth (InBoth _ _) = True
inRegOrBoth _ = False
let candidates' =
let candidates' :: UniqFM Loc
candidates' =
flip delListFromUFM keep $
filterUFM inRegOrBoth $
......@@ -30,6 +30,7 @@ import GHC.Types.Unique.FM
import GHC.Types.Unique.Supply
import GHC.Cmm.BlockId
data ReadingOrWriting = Reading | Writing deriving (Eq,Ord)
-- | Used to store the register assignment on entry to a basic block.
-- We use this to handle join points, where multiple branch instructions
......@@ -138,6 +139,8 @@ data RA_State freeRegs
, ra_config :: !NCGConfig
-- | (from,fixup,to) : We inserted fixup code between from and to
, ra_fixups :: [(BlockId,BlockId,BlockId)] }
, ra_fixups :: [(BlockId,BlockId,BlockId)]
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- | Free regs map for PowerPC
module GHC.CmmToAsm.Reg.Linear.PPC where
......@@ -27,6 +29,9 @@ import Data.Bits
data FreeRegs = FreeRegs !Word32 !Word32
deriving( Show ) -- The Show is used in an ASSERT
instance Outputable FreeRegs where
ppr = text . show
noFreeRegs :: FreeRegs
noFreeRegs = FreeRegs 0 0
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- | Free regs map for SPARC
module GHC.CmmToAsm.Reg.Linear.SPARC where
......@@ -38,6 +39,9 @@ data FreeRegs
instance Show FreeRegs where
show = showFreeRegs
instance Outputable FreeRegs where
ppr = text . showFreeRegs
-- | A reg map where no regs are free to be allocated.
noFreeRegs :: FreeRegs
noFreeRegs = FreeRegs 0 0 0
{-# LANGUAGE CPP, PatternSynonyms, DeriveFunctor #-}
{-# LANGUAGE ScopedTypeVariables #-}
#if !defined(GHC_LOADED_INTO_GHCI)
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- | Free regs map for i386
module GHC.CmmToAsm.Reg.Linear.X86 where
......@@ -9,12 +10,13 @@ import GHC.Platform.Reg.Class
import GHC.Platform.Reg
import GHC.Utils.Panic
import GHC.Platform
import GHC.Utils.Outputable
import Data.Word
import Data.Bits
newtype FreeRegs = FreeRegs Word32
deriving Show
deriving (Show,Outputable)
noFreeRegs :: FreeRegs
noFreeRegs = FreeRegs 0
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- | Free regs map for x86_64
module GHC.CmmToAsm.Reg.Linear.X86_64 where
......@@ -9,12 +10,13 @@ import GHC.Platform.Reg.Class
import GHC.Platform.Reg
import GHC.Utils.Panic
import GHC.Platform
import GHC.Utils.Outputable
import Data.Word
import Data.Bits
newtype FreeRegs = FreeRegs Word64
deriving Show
deriving (Show,Outputable)
noFreeRegs :: FreeRegs
noFreeRegs = FreeRegs 0
......@@ -129,6 +129,10 @@ pprTop dflags = \case
pprDataExterns platform lits $$
pprWordArray dflags (isSecConstant section) lbl lits
isSecConstant section = case sectionProtection section of
ReadOnlySection -> True
WriteProtectedSection -> True
_ -> False
platform = targetPlatform dflags
-- --------------------------------------------------------------------------
......@@ -83,7 +83,8 @@ genLlvmData (sec, CmmStaticsRaw lbl xs) = do
Section CString _ -> if (platformArch platform == ArchS390X)
then Just 2 else Just 1
_ -> Nothing
const = if isSecConstant sec then Constant else Global
const = if sectionProtection sec == ReadOnlySection
then Constant else Global
varDef = LMGlobalVar label tyAlias link lmsec align const
globDef = LMGlobal varDef struct
......@@ -759,8 +759,8 @@ arityType _ (Var v)
, not $ isTopSig strict_sig
, (ds, res) <- splitStrictSig strict_sig
, let arity = length ds
= if isBotDiv res then ABot arity
else ATop (take arity one_shots)
= if isDeadEndDiv res then ABot arity
else ATop (take arity one_shots)
| otherwise
= ATop (take (idArity v) one_shots)
......@@ -787,7 +787,7 @@ arityType env (App fun arg )
-- The difference is observable using 'seq'
arityType env (Case scrut _ _ alts)
| exprIsBottom scrut || null alts
| exprIsDeadEnd scrut || null alts
= ABot 0 -- Do not eta expand
-- See Note [Dealing with bottom (1)]
| otherwise
......@@ -292,7 +292,9 @@ tidyCoAxBndrsForUser init_env tcvs
Note [Function coercions]
Remember that
(->) :: forall r1 r2. TYPE r1 -> TYPE r2 -> TYPE LiftedRep
(->) :: forall {r1} {r2}. TYPE r1 -> TYPE r2 -> TYPE LiftedRep
whose `RuntimeRep' arguments are intentionally marked inferred to
avoid type application.
FunCo r co1 co2 :: (s1->t1) ~r (s2->t2)
......@@ -2282,7 +2284,7 @@ coercionRKind co
go_forall subst (ForAllCo tv1 k_co co)
-- See Note [Nested ForAllCos]
| isTyVar tv1
= mkInvForAllTy tv2 (go_forall subst' co)
= mkInfForAllTy tv2 (go_forall subst' co)
k2 = coercionRKind k_co
tv2 = setTyVarKind tv1 (substTy subst k2)
......@@ -119,7 +119,7 @@ conLikeInstOrigArgTys (PatSynCon pat_syn) tys =
-- followed by the existentially quantified type variables. For data
-- constructors, the situation is slightly more complicated—see
-- @Note [DataCon user type variable binders]@ in "GHC.Core.DataCon".
conLikeUserTyVarBinders :: ConLike -> [TyVarBinder]
conLikeUserTyVarBinders :: ConLike -> [InvisTVBinder]
conLikeUserTyVarBinders (RealDataCon data_con) =
dataConUserTyVarBinders data_con
conLikeUserTyVarBinders (PatSynCon pat_syn) =
......@@ -371,7 +371,7 @@ data DataCon
-- of tyvars (*not* covars) of dcExTyCoVars unioned with the
-- set of dcUnivTyVars whose tyvars do not appear in dcEqSpec
-- See Note [DataCon user type variable binders]
dcUserTyVarBinders :: [TyVarBinder],
dcUserTyVarBinders :: [InvisTVBinder],
dcEqSpec :: [EqSpec], -- Equalities derived from the result type,
-- _as written by the programmer_.
......@@ -939,10 +939,10 @@ mkDataCon :: Name
-- if it is a record, otherwise empty
-> [TyVar] -- ^ Universals.
-> [TyCoVar] -- ^ Existentials.