Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Commits on Source (28)
  • Add fused multiply-add instructions · 87eebf98
    sheaf authored and Marge Bot committed
    This patch adds eight new primops that fuse a multiplication and an
    addition or subtraction:
    
      - `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`
    
    fmadd x y z is x * y + z, computed with a single rounding step.
    
    This patch implements code generation for these primops in the following
    backends:
    
      - X86, AArch64 and PowerPC NCG,
      - LLVM
      - C
    
    WASM uses the C implementation. The primops are unsupported in the
    JavaScript backend.
    
    The following constant folding rules are also provided:
    
      - compute a * b + c when a, b, c are all literals,
      - x * y + 0 ==> x * y,
      - ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.
    
    NB: the constant folding rules incorrectly handle signed zero.
    This is a known limitation with GHC's floating-point constant folding
    rules (#21227), which we hope to resolve in the future.
    87eebf98
  • Add a test for #21278 · ad16a066
    Krzysztof Gogolewski authored and Marge Bot committed
    ad16a066
  • rts: Refine memory retention behaviour to account for pinned/compacted objects · 05cea68c
    Matthew Pickering authored and Marge Bot committed
    When using the copying collector there is still a lot of data which
    isn't copied (such as pinned, compacted, large objects etc). The logic
    to decide how much memory to retain didn't take into account that these
    wouldn't be copied. Therefore we pessimistically retained 2* the amount
    of memory for these blocks even though they wouldn't be copied by the
    collector.
    
    The solution is to split the heap into two parts: the part which
    will be copied and the part which won't be copied. The appropriate
    factor is then applied to each part individually (2x for the copied
    part and 1.2x for the part that is not copied).
    
    The T23221 test demonstrates this improvement with a program which first
    allocates many unpinned ByteArray# followed by many pinned ByteArray#
    and observes the difference in the ultimate memory baseline between the
    two.
    
    There are some charts on #23221.
    
    Fixes #23221
    05cea68c
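As a rough illustration of the arithmetic described in the commit message above (not the RTS code itself, which lives in C), the retention target before and after the change can be sketched as follows. The `Heap` record and both function names are hypothetical; only the factors 2 and 1.2 come from the commit message.

```haskell
module Main where

-- | Hypothetical split of the live heap, in arbitrary units (say, megabytes).
data Heap = Heap
  { copied   :: Integer  -- data the copying collector will move
  , uncopied :: Integer  -- pinned, compacted, large objects etc.
  }

-- | Before the change: the whole heap is scaled by 2.
retainOld :: Heap -> Integer
retainOld h = 2 * (copied h + uncopied h)

-- | After the change: 2x for the copied part, 1.2x for the rest.
-- 1.2 is written as 6/5 to stay in integer arithmetic (rounding down).
retainNew :: Heap -> Integer
retainNew h = 2 * copied h + (6 * uncopied h) `div` 5

main :: IO ()
main = do
  let h = Heap { copied = 100, uncopied = 400 }
  print (retainOld h)  -- 2 * 500       = 1000
  print (retainNew h)  -- 200 + 2400/5  = 680
```

The larger the share of uncopied data, the bigger the gap between the two figures, which is the effect the T23221 test is described as measuring.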
  • hadrian: fix no_dynamic_libs flavour transformer · 1bb24432
    Cheng Shao authored and Marge Bot committed
    This patch fixes the no_dynamic_libs flavour transformer and makes
    fully_static reuse it. Previously, building with no_dynamic_libs failed
    because the ghc program was still dynamic and transitively brought in
    dyn ways of rts, which are produced by no rules.
    1bb24432
  • JS: refactor jsSaturate to return a saturated JStat (#23328) · 0ed493a3
    Josh Meredith authored and Marge Bot committed
    0ed493a3
  • Doc: Fix out-of-sync using-optimisation page · a856d98e
    Pierre Le Marre authored and Marge Bot committed
    - Make explicit that default flag values correspond to their -O0 value.
    - Fix -fignore-interface-pragmas, -fstg-cse, -fdo-eta-reduction,
      -fcross-module-specialise, -fsolve-constant-dicts, -fworker-wrapper.
    a856d98e
  • Don't panic in mkNewTyConRhs · c176ad18
    sheaf authored and Marge Bot committed
    This function could come across invalid newtype constructors, as we
    only perform validity checking of newtypes once we are outside the
    knot-tied typechecking loop.
    This patch changes this function to fake up a stub type in the case of
    an invalid newtype, instead of panicking.
    
    This patch also changes "checkNewDataCon" so that it reports as many
    errors as possible at once.
    
    Fixes #23308
    c176ad18
  • Allow Core optimizations when interpreting bytecode · ab63daac
    Krzysztof Gogolewski authored and Marge Bot committed
    Tracking ticket: #23056
    
    MR: !10399
    
    This adds the flag `-funoptimized-core-for-interpreter`, permitting use
    of the `-O` flag to enable optimizations when compiling with the
    interpreter backend, like in ghci.
    ab63daac
  • hadrian: Fix mention of non-existent removeFiles function · c6cf9433
    Ben Gamari authored and Marge Bot committed
    Previously Hadrian's bindist Makefile referred to a `removeFiles`
    function that was previously defined by the `make` build system. Since
    the `make` build system is no longer around, this function is now
    undefined. Naturally, make being make, this appears to be silently
    ignored instead of producing an error.
    
    Fix this by rewriting it to `rm -f`.
    
    Closes #23373.
    c6cf9433
  • Mention new implementation of GHC.IORef.atomicSwapIORef in the changelog · eb60ec18
    Bodigrim authored and Marge Bot committed
    eb60ec18
  • rts: Ensure non-moving gc is not running when pausing · aa84cff4
    Teo Camarasu authored and Marge Bot committed
    aa84cff4
  • rts: Teach listAllBlocks about nonmoving heap · 5ad776ab
    Teo Camarasu authored and Marge Bot committed
    List all blocks on the non-moving heap.
    
    Resolves #22627
    5ad776ab
  • Fix coercion optimisation for SelCo (#23362) · d683b2e5
    Krzysztof Gogolewski authored and Marge Bot committed
    setNominalRole_maybe is supposed to output a nominal coercion.
    In the SelCo case, it was not updating the stored role to Nominal,
    causing #23362.
    d683b2e5
  • hadrian: Fix linker script flag for MergeObjects builder · 59aa4676
    Alexis King authored and Marge Bot committed
    This fixes what appears to have been a typo in !9530. The `-t` flag just
    enables tracing on all versions of `ld` I’ve looked at, while `-T` is
    used to specify a linker script. It seems that this worked anyway for
    some reason on some `ld` implementations (perhaps because they
    automatically detect linker scripts), but the missing `-T` argument
    causes `gold` to complain.
    59aa4676
  • Less coercion optimization for non-newtype axioms · 4bf9fa0f
    Adam Gundry authored and Marge Bot committed
    See Note [Push transitivity inside newtype axioms only] for an explanation
    of the change here.  This change substantially improves the performance of
    coercion optimization for programs involving transitive type family reductions.
    
    -------------------------
    Metric Decrease:
        CoOpt_Singletons
        LargeRecord
        T12227
        T12545
        T13386
        T15703
        T5030
        T8095
    -------------------------
    4bf9fa0f
  • Move checkAxInstCo to GHC.Core.Lint · dc0c9574
    Adam Gundry authored and Marge Bot committed
    A consequence of the previous change is that checkAxInstCo is no longer
    called during coercion optimization, so it can be moved back where it belongs.
    
    Also includes some edits to Note [Conflict checking with AxiomInstCo] as
    suggested by @simonpj.
    dc0c9574
  • Use the eager unifier in the constraint solver · 8b9b7dbc
    Simon Peyton Jones authored and Marge Bot committed
    This patch continues the refactoring of the constraint solver
    described in #23070.
    
    The Big Deal in this patch is to call the regular, eager unifier from the
    constraint solver, when we want to create new equalities. This
    replaces the existing unifyWanted, which amounted to
    yet-another-unifier, so it reduces duplication of a rather subtle
    piece of technology. See
    
      * Note [The eager unifier] in GHC.Tc.Utils.Unify
      * GHC.Tc.Solver.Monad.wrapUnifierTcS
    
    I did lots of other refactoring along the way
    
    * I simplified the treatment of right hand sides that contain CoercionHoles.
      Now, a constraint that contains a hetero-kind CoercionHole is non-canonical,
      and cannot be used for rewriting or unification alike.  This required me
      to add the ch_hetero_kind flag to CoercionHole, with consequent knock-on
      effects. See wrinkle (2) of `Note [Equalities with incompatible kinds]` in
      GHC.Tc.Solver.Equality.
    
    * I refactored the StopOrContinue type to add StartAgain, so that after a
      fundep improvement (for example) we can simply start the pipeline again.
    
    * I got rid of the unpleasant (and inefficient) rewriterSetFromType/Co functions.
      With Richard I concluded that they are never needed.
    
    * I discovered Wrinkle (W1) in Note [Wanteds rewrite Wanteds] in
      GHC.Tc.Types.Constraint, and therefore now prioritise non-rewritten equalities.
    
    Quite a few error messages change, I think always for the better.
    
    Compiler runtime stays about the same, with one outlier: a 17% improvement in T17836
    
    Metric Decrease:
        T17836
        T18223
    8b9b7dbc
  • Cleanup of dynflags override in export renaming · 5cad28e7
    Bartłomiej Cieślar authored and Marge Bot committed
    The deprecation warnings are normally emitted whenever the name's GRE is being looked up, which calls the GHC.Rename.Env.addUsedGRE function. We do not want those warnings to be emitted when renaming export lists, so they are artificially turned off by removing all warning categories from DynFlags at the beginning of GHC.Tc.Gen.Export.rnExports. This commit removes that dependency by unifying the function used for GRE lookup in lookup_ie to lookupGreAvailRn and disabling the call to addUsedGRE in said function (the warnings are also disabled in a call to lookupSubBndrOcc_helper in lookupChildrenExport), as per #17957. This commit also changes the setting for whether to warn about deprecated names in addUsedGREs to be an explicit enum instead of a boolean.
    5cad28e7
  • Use a uniform return convention in bytecode for unary results · d85ed900
    Alexis King authored and Marge Bot committed
    fixes #22958
    d85ed900
  • Bodigrim's avatar
  • Make GHC.Types.Id.Make.shouldUnpackTy a bit more clever · 902f0730
    Simon Peyton Jones authored and Marge Bot committed
    As noted in #23307, GHC.Types.Id.Make.shouldUnpackTy was leaving money on the
    table, failing to unpack arguments that are perfectly unpackable.
    
    The fix is pretty easy; see Note [Recursive unboxing]
    902f0730
  • Fix bad multiplicity role in tyConAppFunCo_maybe · a5451438
    sheaf authored and Marge Bot committed
    The function tyConAppFunCo_maybe produces a multiplicity coercion
    for the multiplicity argument of the function arrow, except that
    it could be at the wrong role if asked to produce a representational
    coercion. We fix this by using the 'funRole' function, which computes
    the right roles for arguments to the function arrow TyCon.
    
    Fixes #23386
    a5451438
  • Turn "ambiguous import" error into a panic · 5b9e9300
    sheaf authored and Marge Bot committed
    This error should never occur, as a lookup of a type or data constructor
    should never be ambiguous. This is because a single module cannot export
    multiple Names with the same OccName, as per item (1) of
    Note [Exporting duplicate declarations] in GHC.Tc.Gen.Export.
    
    This code path was intended to handle duplicate record fields, but the
    rest of the code had since been refactored to handle those in a
    different way.
    
    We also remove the AmbiguousImport constructor of IELookupError, as
    it is no longer used.
    
    Fixes #23302
    5b9e9300
  • Unbreak some tests with latest GNU grep, which now warns about stray '\'. · e305e60c
    Matthew Farkas-Dyck authored and Marge Bot committed
    Confusingly, the testsuite mangled the error to say "stray /".
    
    We also migrate some tests from grep to grep -E, as it seems the author actually wanted a "POSIX extended" (a.k.a. sane) regex.
    
    Background: POSIX specifies 2 "regex" syntaxen: "basic" and "extended". Of these, only "extended" syntax is actually a regular expression. Furthermore, "basic" syntax is inconsistent in its use of the '\' character — sometimes it escapes a regex metacharacter, but sometimes it unescapes it, i.e. it makes an otherwise normal character become a metacharacter. This baffles me and it seems also the authors of these tests. Also, the regex(7) man page (at least on Linux) says "basic" syntax is obsolete. Nearly all modern tools and libraries are consistent in this use of the '\' character (of which many use "extended" syntax by default).
    e305e60c
  • Improve "ambiguous occurrence" error messages · 5ae81842
    sheaf authored and Marge Bot committed
    This error was sometimes a bit confusing, especially when data families
    were involved. This commit improves the general presentation of the
    "ambiguous occurrence" error, and adds a bit of extra context in the
    case of data families.
    
    Fixes #23301
    5ae81842
  • Fix GHCJS OS platform (fix #23346) · 2f571afe
    Sylvain Henry authored and Marge Bot committed
    2f571afe
  • Split DynFlags structure into own module · 86aae570
    Oleg Grenrus authored and Marge Bot committed
    This will allow command-line parsing to depend on the
    diagnostic system (which depends on DynFlags).
    86aae570
  • Extension shuffling (#23291) · 07ec9f6f
    Ben Gamari authored
    Four new extensions were introduced:
      - PatternSignatures
      - ExtendedForAllScope
      - MethodTypeVariables
      - ImplicitForAll
    
    The tasks of the ScopedTypeVariables extension were distributed between
    PatternSignatures, ExtendedForAllScope and MethodTypeVariables according
    to the proposal. ScopedTypeVariables now only implies these three extensions.
    
    The ImplicitForAll extension preserves the current behavior. NoImplicitForAll
    disables implicit binding of type variables in many contexts.
    
    One new warning option was introduced: -Wpattern-signature-binds.
    It warns when a pattern signature binds a new type variable into scope.
    For example:
    
      f (a :: t) = ...
    07ec9f6f
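To make the `f (a :: t) = ...` case from the last commit message concrete, here is a minimal sketch of a pattern signature that binds a type variable. It uses only `ScopedTypeVariables`, which released GHCs already accept; the function name `pair` is made up for illustration, and whether a given compiler also offers the extensions and warning named in that commit is an assumption to check against the compiler in use.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

-- The pattern signature (x :: t) brings the type variable t into scope,
-- bound to the caller's type a. This is the kind of binding the
-- -Wpattern-signature-binds warning described above is about.
pair :: a -> [a]
pair (x :: t) = [x, x] :: [t]

main :: IO ()
main = print (pair 'q')
```

Note that `a` from the type signature is not in scope in the body here (there is no explicit `forall`), so `t` is the only name by which the body can refer to the argument's type.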
Showing
with 584 additions and 160 deletions
@@ -1369,6 +1369,75 @@ primop FloatDecode_IntOp "decodeFloat_Int#" GenPrimOp
  First 'Int#' in result is the mantissa; second is the exponent.}
  with out_of_line = True
+ ------------------------------------------------------------------------
+ section "Fused multiply-add operations"
+ { #fma#
+ The fused multiply-add primops 'fmaddFloat#' and 'fmaddDouble#'
+ implement the operation
+ \[
+ \lambda\ x\ y\ z \rightarrow x * y + z
+ \]
+ with a single floating-point rounding operation at the end, as opposed to
+ rounding twice (which can accumulate rounding errors).
+ These primops can be compiled directly to a single machine instruction on
+ architectures that support them. Currently, these are:
+ 1. x86 with CPUs that support the FMA3 extended instruction set (which
+ includes most processors since 2013).
+ 2. PowerPC.
+ 3. AArch64.
+ This requires users pass the '-mfma' flag to GHC. Otherwise, the primop
+ is implemented by falling back to the C standard library, which might
+ perform software emulation (this may yield results that are not IEEE
+ compliant on some platforms).
+ The additional operations 'fmsubFloat#'/'fmsubDouble#',
+ 'fnmaddFloat#'/'fnmaddDouble#' and 'fnmsubFloat#'/'fnmsubDouble#' provide
+ variants on 'fmaddFloat#'/'fmaddDouble#' in which some signs are changed:
+ \[
+ \begin{aligned}
+ \mathrm{fmadd}\ x\ y\ z &= \phantom{+} x * y + z \\[8pt]
+ \mathrm{fmsub}\ x\ y\ z &= \phantom{+} x * y - z \\[8pt]
+ \mathrm{fnmadd}\ x\ y\ z &= - x * y + z \\[8pt]
+ \mathrm{fnmsub}\ x\ y\ z &= - x * y - z
+ \end{aligned}
+ \]
+ }
+ ------------------------------------------------------------------------
+ primop FloatFMAdd "fmaddFloat#" GenPrimOp
+ Float# -> Float# -> Float# -> Float#
+ {Fused multiply-add operation @x*y+z@. See "GHC.Prim#fma".}
+ primop FloatFMSub "fmsubFloat#" GenPrimOp
+ Float# -> Float# -> Float# -> Float#
+ {Fused multiply-subtract operation @x*y-z@. See "GHC.Prim#fma".}
+ primop FloatFNMAdd "fnmaddFloat#" GenPrimOp
+ Float# -> Float# -> Float# -> Float#
+ {Fused negate-multiply-add operation @-x*y+z@. See "GHC.Prim#fma".}
+ primop FloatFNMSub "fnmsubFloat#" GenPrimOp
+ Float# -> Float# -> Float# -> Float#
+ {Fused negate-multiply-subtract operation @-x*y-z@. See "GHC.Prim#fma".}
+ primop DoubleFMAdd "fmaddDouble#" GenPrimOp
+ Double# -> Double# -> Double# -> Double#
+ {Fused multiply-add operation @x*y+z@. See "GHC.Prim#fma".}
+ primop DoubleFMSub "fmsubDouble#" GenPrimOp
+ Double# -> Double# -> Double# -> Double#
+ {Fused multiply-subtract operation @x*y-z@. See "GHC.Prim#fma".}
+ primop DoubleFNMAdd "fnmaddDouble#" GenPrimOp
+ Double# -> Double# -> Double# -> Double#
+ {Fused negate-multiply-add operation @-x*y+z@. See "GHC.Prim#fma".}
+ primop DoubleFNMSub "fnmsubDouble#" GenPrimOp
+ Double# -> Double# -> Double# -> Double#
+ {Fused negate-multiply-subtract operation @-x*y-z@. See "GHC.Prim#fma".}
  ------------------------------------------------------------------------
  section "Arrays"
  {Operations on 'Array#'.}
......
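The difference between rounding once and rounding twice, as described in the `#fma#` section of the hunk above, can be illustrated without the new primops. The sketch below is a reference model only: it computes the result exactly in `Rational` and rounds once with `fromRational`. The names `exactThenRound`, `fmaddRef` and friends are hypothetical, the model is meant for finite inputs, and it does not reproduce signed-zero behaviour.

```haskell
module Main where

-- | Reference model: evaluate exactly in Rational, then round once.
-- Only meaningful for finite inputs (no NaN or infinities).
exactThenRound
  :: (Rational -> Rational -> Rational -> Rational)
  -> Double -> Double -> Double -> Double
exactThenRound op x y z =
  fromRational (op (toRational x) (toRational y) (toRational z))

fmaddRef, fmsubRef, fnmaddRef, fnmsubRef :: Double -> Double -> Double -> Double
fmaddRef  = exactThenRound (\x y z -> x * y + z)
fmsubRef  = exactThenRound (\x y z -> x * y - z)
fnmaddRef = exactThenRound (\x y z -> negate (x * y) + z)
fnmsubRef = exactThenRound (\x y z -> negate (x * y) - z)

-- | Ordinary multiply-then-add: the product is rounded, then the sum.
twiceRounded :: Double -> Double -> Double -> Double
twiceRounded x y z = x * y + z

-- Inputs chosen so that the exact product 1 - 2^-54 is not a Double:
-- it rounds to 1.0, and the unfused sum then cancels to zero.
x0, y0, z0 :: Double
x0 = 1 + 2 ^^ (-27 :: Int)
y0 = 1 - 2 ^^ (-27 :: Int)
z0 = -1

main :: IO ()
main = do
  print (twiceRounded x0 y0 z0 == 0)                        -- product rounded first
  print (fmaddRef x0 y0 z0 == negate (2 ^^ (-54 :: Int)))   -- single rounding
  print (fnmsubRef x0 y0 z0 == negate (fmaddRef x0 y0 z0))  -- sign variants
```

On these inputs the unfused expression loses the entire result to cancellation, while the single-rounding model keeps the exact value of `-2^-54`.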
@@ -395,10 +395,7 @@ assembleI platform i = case i of
  PUSH_BCO proto -> do let ul_bco = assembleBCO platform proto
  p <- ioptr (liftM BCOPtrBCO ul_bco)
  emit bci_PUSH_G [Op p]
- PUSH_ALTS proto -> do let ul_bco = assembleBCO platform proto
- p <- ioptr (liftM BCOPtrBCO ul_bco)
- emit bci_PUSH_ALTS [Op p]
- PUSH_ALTS_UNLIFTED proto pk
+ PUSH_ALTS proto pk
  -> do let ul_bco = assembleBCO platform proto
  p <- ioptr (liftM BCOPtrBCO ul_bco)
  emit (push_alts pk) [Op p]
@@ -504,8 +501,7 @@ assembleI platform i = case i of
  SWIZZLE stkoff n -> emit bci_SWIZZLE [SmallOp stkoff, SmallOp n]
  JMP l -> emit bci_JMP [LabelOp l]
  ENTER -> emit bci_ENTER []
- RETURN -> emit bci_RETURN []
- RETURN_UNLIFTED rep -> emit (return_unlifted rep) []
+ RETURN rep -> emit (return_non_tuple rep) []
  RETURN_TUPLE -> emit bci_RETURN_T []
  CCALL off m_addr i -> do np <- addr m_addr
  emit bci_CCALL [SmallOp off, Op np, SmallOp i]
@@ -574,16 +570,16 @@ push_alts V16 = error "push_alts: vector"
  push_alts V32 = error "push_alts: vector"
  push_alts V64 = error "push_alts: vector"
- return_unlifted :: ArgRep -> Word16
- return_unlifted V = bci_RETURN_V
- return_unlifted P = bci_RETURN_P
- return_unlifted N = bci_RETURN_N
- return_unlifted L = bci_RETURN_L
- return_unlifted F = bci_RETURN_F
- return_unlifted D = bci_RETURN_D
- return_unlifted V16 = error "return_unlifted: vector"
- return_unlifted V32 = error "return_unlifted: vector"
- return_unlifted V64 = error "return_unlifted: vector"
+ return_non_tuple :: ArgRep -> Word16
+ return_non_tuple V = bci_RETURN_V
+ return_non_tuple P = bci_RETURN_P
+ return_non_tuple N = bci_RETURN_N
+ return_non_tuple L = bci_RETURN_L
+ return_non_tuple F = bci_RETURN_F
+ return_non_tuple D = bci_RETURN_D
+ return_non_tuple V16 = error "return_non_tuple: vector"
+ return_non_tuple V32 = error "return_non_tuple: vector"
+ return_non_tuple V64 = error "return_non_tuple: vector"
  {-
  we can only handle up to a fixed number of words on the stack,
......
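The renamed helper in the hunk above maps every non-tuple `ArgRep` to a return opcode, so lifted and unlifted results now go through the same path. A self-contained sketch of that mapping follows. The `ArgRep` and `ReturnOp` types here are stand-ins defined for the example, not GHC's own definitions, and the sketch reports the vector cases with `Either` where the original calls `error`.

```haskell
module Main where

-- Stand-in for GHC's ArgRep; constructor names mirror the diff.
data ArgRep = V | P | N | L | F | D | V16 | V32 | V64
  deriving (Show, Eq)

-- Hypothetical symbolic opcodes, one per bci_RETURN_* constant in the diff.
data ReturnOp = ReturnV | ReturnP | ReturnN | ReturnL | ReturnF | ReturnD
  deriving (Show, Eq)

-- | Pick the return opcode for a non-tuple result of the given rep.
-- Vector reps have no opcode, so they yield an error message.
returnNonTuple :: ArgRep -> Either String ReturnOp
returnNonTuple rep = case rep of
  V -> Right ReturnV
  P -> Right ReturnP
  N -> Right ReturnN
  L -> Right ReturnL
  F -> Right ReturnF
  D -> Right ReturnD
  _ -> Left "return_non_tuple: vector"

main :: IO ()
main = mapM_ (print . returnNonTuple) [P, D, V16]
```

With a single mapping keyed on `ArgRep`, a pointer result (`P`) is just one more case, which is why the separate argument-less `RETURN` instruction can be dropped.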
@@ -88,8 +88,7 @@ data BCInstr
  | PUSH_BCO (ProtoBCO Name)
  -- Push an alt continuation
- | PUSH_ALTS (ProtoBCO Name)
- | PUSH_ALTS_UNLIFTED (ProtoBCO Name) ArgRep
+ | PUSH_ALTS (ProtoBCO Name) ArgRep
  | PUSH_ALTS_TUPLE (ProtoBCO Name) -- continuation
  !NativeCallInfo
  (ProtoBCO Name) -- tuple return BCO
@@ -197,9 +196,10 @@ data BCInstr
  -- To Infinity And Beyond
  | ENTER
- | RETURN -- return a lifted value
- | RETURN_UNLIFTED ArgRep -- return an unlifted value, here's its rep
- | RETURN_TUPLE -- return an unboxed tuple (info already on stack)
+ | RETURN ArgRep -- return a non-tuple value, here's its rep; see
+ -- Note [Return convention for non-tuple values] in GHC.StgToByteCode
+ | RETURN_TUPLE -- return an unboxed tuple (info already on stack); see
+ -- Note [unboxed tuple bytecodes and tuple_BCO] in GHC.StgToByteCode
  -- Breakpoints
  | BRK_FUN Word16 Unique (RemotePtr CostCentre)
......@@ -274,8 +274,7 @@ instance Outputable BCInstr where
<> ppr op
ppr (PUSH_BCO bco) = hang (text "PUSH_BCO") 2 (ppr bco)
ppr (PUSH_ALTS bco) = hang (text "PUSH_ALTS") 2 (ppr bco)
ppr (PUSH_ALTS_UNLIFTED bco pk) = hang (text "PUSH_ALTS_UNLIFTED" <+> ppr pk) 2 (ppr bco)
ppr (PUSH_ALTS bco pk) = hang (text "PUSH_ALTS" <+> ppr pk) 2 (ppr bco)
ppr (PUSH_ALTS_TUPLE bco call_info tuple_bco) =
hang (text "PUSH_ALTS_TUPLE" <+> ppr call_info)
2
......@@ -352,8 +351,7 @@ instance Outputable BCInstr where
ppr (SWIZZLE stkoff n) = text "SWIZZLE " <+> text "stkoff" <+> ppr stkoff
<+> text "by" <+> ppr n
ppr ENTER = text "ENTER"
ppr RETURN = text "RETURN"
ppr (RETURN_UNLIFTED pk) = text "RETURN_UNLIFTED " <+> ppr pk
ppr (RETURN pk) = text "RETURN " <+> ppr pk
ppr (RETURN_TUPLE) = text "RETURN_TUPLE"
ppr (BRK_FUN index uniq _cc) = text "BRK_FUN" <+> ppr index <+> mb_uniq <+> text "<cc>"
where mb_uniq = sdocOption sdocSuppressUniques $ \case
......@@ -389,10 +387,8 @@ bciStackUse PUSH32_W{} = 1 -- takes exactly 1 word
bciStackUse PUSH_G{} = 1
bciStackUse PUSH_PRIMOP{} = 1
bciStackUse PUSH_BCO{} = 1
bciStackUse (PUSH_ALTS bco) = 2 {- profiling only, restore CCCS -} +
bciStackUse (PUSH_ALTS bco _) = 2 {- profiling only, restore CCCS -} +
3 + protoBCOStackUse bco
bciStackUse (PUSH_ALTS_UNLIFTED bco _) = 2 {- profiling only, restore CCCS -} +
4 + protoBCOStackUse bco
bciStackUse (PUSH_ALTS_TUPLE bco info _) =
-- (tuple_bco, call_info word, cont_bco, stg_ctoi_t)
-- tuple
......@@ -452,8 +448,7 @@ bciStackUse TESTEQ_P{} = 0
bciStackUse CASEFAIL{} = 0
bciStackUse JMP{} = 0
bciStackUse ENTER{} = 0
bciStackUse RETURN{} = 0
bciStackUse RETURN_UNLIFTED{} = 1 -- pushes stg_ret_X for some X
bciStackUse RETURN{} = 1 -- pushes stg_ret_X for some X
bciStackUse RETURN_TUPLE{} = 1 -- pushes stg_ret_t header
bciStackUse CCALL{} = 0
bciStackUse PRIMCALL{} = 1 -- pushes stg_primcall
......
{-# LANGUAGE LambdaCase #-}
{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
module GHC.Cmm.MachOp
......@@ -26,6 +28,9 @@ module GHC.Cmm.MachOp
-- Atomic read-modify-write
, MemoryOrdering(..)
, AtomicMachOp(..)
-- Fused multiply-add
, FMASign(..), pprFMASign
)
where
......@@ -88,6 +93,10 @@ data MachOp
| MO_F_Mul Width
| MO_F_Quot Width
-- Floating-point fused multiply-add operations
-- | Fused multiply-add, see 'FMASign'.
| MO_FMA FMASign Width
-- Floating point comparison
| MO_F_Eq Width
| MO_F_Ne Width
......@@ -160,7 +169,30 @@ data MachOp
pprMachOp :: MachOp -> SDoc
pprMachOp mo = text (show mo)
-- | Where are the signs in a fused multiply-add instruction?
--
-- @x*y + z@ vs @x*y - z@ vs @-x*y+z@ vs @-x*y-z@.
--
-- Warning: the signs aren't consistent across architectures (X86, PowerPC, AArch64).
-- The user-facing implementation uses the X86 convention, while the relevant
-- backends use their corresponding conventions.
data FMASign
-- | Fused multiply-add @x*y + z@.
= FMAdd
-- | Fused multiply-subtract. On X86: @x*y - z@.
| FMSub
-- | Fused negated multiply-add. On X86: @-x*y + z@.
| FNMAdd
-- | Fused negated multiply-subtract. On X86: @-x*y - z@.
| FNMSub
deriving (Eq, Show)
pprFMASign :: IsLine doc => FMASign -> doc
pprFMASign = \case
FMAdd -> text "fmadd"
FMSub -> text "fmsub"
FNMAdd -> text "fnmadd"
FNMSub -> text "fnmsub"
-- -----------------------------------------------------------------------------
-- Some common MachReps
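A minimal sketch of what the four `FMASign` variants mean in the X86 convention described above. This is not GHC code: `fmaRef` is a hypothetical name, and the model is unfused (it rounds after the multiply and again after the add), so it only matches a real FMA when both steps are exact.

```haskell
-- Illustrative model only; the constructor names mirror the diff,
-- but 'fmaRef' is a hypothetical helper and not part of GHC.
data FMASign = FMAdd | FMSub | FNMAdd | FNMSub
  deriving (Eq, Show)

-- | Unfused reference semantics, X86 sign convention.
-- A hardware FMA rounds once; this version rounds twice.
fmaRef :: FMASign -> Double -> Double -> Double -> Double
fmaRef FMAdd  x y z = x * y + z
fmaRef FMSub  x y z = x * y - z
fmaRef FNMAdd x y z = negate (x * y) + z
fmaRef FNMSub x y z = negate (x * y) - z
```

Loaded in GHCi, `fmaRef FMSub 2 3 4` evaluates `2*3 - 4`.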
......@@ -398,6 +430,9 @@ machOpResultType platform mop tys =
MO_F_Mul r -> cmmFloat r
MO_F_Quot r -> cmmFloat r
MO_F_Neg r -> cmmFloat r
MO_FMA _ r -> cmmFloat r
MO_F_Eq {} -> comparisonResultRep platform
MO_F_Ne {} -> comparisonResultRep platform
MO_F_Ge {} -> comparisonResultRep platform
......@@ -489,6 +524,9 @@ machOpArgReps platform op =
MO_F_Mul r -> [r,r]
MO_F_Quot r -> [r,r]
MO_F_Neg r -> [r]
MO_FMA _ r -> [r,r,r]
MO_F_Eq r -> [r,r]
MO_F_Ne r -> [r,r]
MO_F_Ge r -> [r,r]
......
......@@ -1009,7 +1009,7 @@ machOps = listToUFM $
( "eq", MO_Eq ),
( "ne", MO_Ne ),
( "mul", MO_Mul ),
( "mulmayoflo", MO_S_MulMayOflo ),
( "mulmayoflo", MO_S_MulMayOflo ),
( "neg", MO_S_Neg ),
( "quot", MO_S_Quot ),
( "rem", MO_S_Rem ),
......@@ -1040,6 +1040,11 @@ machOps = listToUFM $
( "fmul", MO_F_Mul ),
( "fquot", MO_F_Quot ),
( "fmadd" , MO_FMA FMAdd ),
( "fmsub" , MO_FMA FMSub ),
( "fnmadd", MO_FMA FNMAdd ),
( "fnmsub", MO_FMA FNMSub ),
( "feq", MO_F_Eq ),
( "fne", MO_F_Ne ),
( "fge", MO_F_Ge ),
......
......@@ -783,7 +783,7 @@ getRegister' config plat expr
where w' = formatToWidth (cmmTypeFormat (cmmRegType reg))
r' = getRegisterReg plat reg
-- Generic case.
-- Generic binary case.
CmmMachOp op [x, y] -> do
-- alright, so we have an operation, and two expressions. And we want to essentially do
-- ensure we get float regs (TODO(Ben): What?)
......@@ -956,7 +956,44 @@ getRegister' config plat expr
-- TODO
op -> pprPanic "getRegister' (unhandled dyadic CmmMachOp): " $ (pprMachOp op) <+> text "in" <+> (pdoc plat expr)
op -> pprPanic "getRegister' (unhandled dyadic CmmMachOp): " $
(pprMachOp op) <+> text "in" <+> (pdoc plat expr)
-- Generic ternary case.
CmmMachOp op [x, y, z] ->
case op of
-- Floating-point fused multiply-add operations
-- x86 fmadd x * y + z <=> AArch64 fmadd : d = r1 * r2 + r3
-- x86 fmsub x * y - z <=> AArch64 fnmsub: d = r1 * r2 - r3
-- x86 fnmadd - x * y + z <=> AArch64 fmsub : d = - r1 * r2 + r3
-- x86 fnmsub - x * y - z <=> AArch64 fnmadd: d = - r1 * r2 - r3
MO_FMA var w -> case var of
FMAdd -> float3Op w (\d n m a -> unitOL $ FMA FMAdd d n m a)
FMSub -> float3Op w (\d n m a -> unitOL $ FMA FNMSub d n m a)
FNMAdd -> float3Op w (\d n m a -> unitOL $ FMA FMSub d n m a)
FNMSub -> float3Op w (\d n m a -> unitOL $ FMA FNMAdd d n m a)
_ -> pprPanic "getRegister' (unhandled ternary CmmMachOp): " $
(pprMachOp op) <+> text "in" <+> (pdoc plat expr)
where
float3Op w op = do
(reg_fx, format_x, code_fx) <- getFloatReg x
(reg_fy, format_y, code_fy) <- getFloatReg y
(reg_fz, format_z, code_fz) <- getFloatReg z
massertPpr (isFloatFormat format_x && isFloatFormat format_y && isFloatFormat format_z) $
text "float3Op: non-float"
return $
Any (floatFormat w) $ \ dst ->
code_fx `appOL`
code_fy `appOL`
code_fz `appOL`
op (OpReg w dst) (OpReg w reg_fx) (OpReg w reg_fy) (OpReg w reg_fz)
CmmMachOp _op _xs
-> pprPanic "getRegister' (variadic CmmMachOp): " (pdoc plat expr)
......
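The AArch64 hunk above keeps `FMAdd` and renames the other three variants, because the AArch64 mnemonics attach the signs differently from X86. The sketch below checks that renaming against the two tables quoted in the comments. All function names are hypothetical, and `Integer` arithmetic is used so that rounding does not enter the comparison.

```haskell
-- Illustrative model only; none of these functions exist in GHC.
data FMASign = FMAdd | FMSub | FNMAdd | FNMSub
  deriving (Eq, Show, Enum, Bounded)

-- | Meaning of each variant in the X86 (user-facing) convention.
x86Meaning :: FMASign -> Integer -> Integer -> Integer -> Integer
x86Meaning FMAdd  x y z = x * y + z
x86Meaning FMSub  x y z = x * y - z
x86Meaning FNMAdd x y z = negate (x * y) + z
x86Meaning FNMSub x y z = negate (x * y) - z

-- | Meaning of the AArch64 instruction with the corresponding mnemonic.
a64Meaning :: FMASign -> Integer -> Integer -> Integer -> Integer
a64Meaning FMAdd  n m a = n * m + a           -- fmadd
a64Meaning FNMSub n m a = n * m - a           -- fnmsub
a64Meaning FMSub  n m a = negate (n * m) + a  -- fmsub
a64Meaning FNMAdd n m a = negate (n * m) - a  -- fnmadd

-- | The renaming performed in the ternary case of getRegister'.
toA64 :: FMASign -> FMASign
toA64 FMAdd  = FMAdd
toA64 FMSub  = FNMSub
toA64 FNMAdd = FMSub
toA64 FNMSub = FNMAdd

-- | True when the renamed instruction computes the same value.
mappingPreservesMeaning :: Bool
mappingPreservesMeaning = and
  [ x86Meaning s x y z == a64Meaning (toA64 s) x y z
  | s <- [minBound .. maxBound]
  , x <- [-2 .. 2], y <- [-2 .. 2], z <- [-2 .. 2]
  ]
```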
......@@ -142,6 +142,8 @@ regUsageOfInstr platform instr = case instr of
SCVTF dst src -> usage (regOp src, regOp dst)
FCVTZS dst src -> usage (regOp src, regOp dst)
FABS dst src -> usage (regOp src, regOp dst)
FMA _ dst src1 src2 src3 ->
usage (regOp src1 ++ regOp src2 ++ regOp src3, regOp dst)
_ -> panic $ "regUsageOfInstr: " ++ instrCon instr
......@@ -280,6 +282,9 @@ patchRegsOfInstr instr env = case instr of
SCVTF o1 o2 -> SCVTF (patchOp o1) (patchOp o2)
FCVTZS o1 o2 -> FCVTZS (patchOp o1) (patchOp o2)
FABS o1 o2 -> FABS (patchOp o1) (patchOp o2)
FMA s o1 o2 o3 o4 ->
FMA s (patchOp o1) (patchOp o2) (patchOp o3) (patchOp o4)
_ -> panic $ "patchRegsOfInstr: " ++ instrCon instr
where
patchOp :: Operand -> Operand
......@@ -650,6 +655,14 @@ data Instr
-- Float ABSolute value
| FABS Operand Operand
-- | Floating-point fused multiply-add instructions
--
-- - fmadd : d = r1 * r2 + r3
-- - fnmsub: d = r1 * r2 - r3
-- - fmsub : d = - r1 * r2 + r3
-- - fnmadd: d = - r1 * r2 - r3
| FMA FMASign Operand Operand Operand Operand
instrCon :: Instr -> String
instrCon i =
case i of
......@@ -715,6 +728,12 @@ instrCon i =
SCVTF{} -> "SCVTF"
FCVTZS{} -> "FCVTZS"
FABS{} -> "FABS"
FMA variant _ _ _ _ ->
case variant of
FMAdd -> "FMADD"
FMSub -> "FMSUB"
FNMAdd -> "FNMADD"
FNMSub -> "FNMSUB"
data Target
= TBlock BlockId
......
......@@ -546,6 +546,13 @@ pprInstr platform instr = case instr of
SCVTF o1 o2 -> op2 (text "\tscvtf") o1 o2
FCVTZS o1 o2 -> op2 (text "\tfcvtzs") o1 o2
FABS o1 o2 -> op2 (text "\tfabs") o1 o2
FMA variant d r1 r2 r3 ->
let fma = case variant of
FMAdd -> text "\tfmadd"
FMSub -> text "\tfmsub"
FNMAdd -> text "\tfnmadd"
FNMSub -> text "\tfnmsub"
in op4 fma d r1 r2 r3
where op2 op o1 o2 = line $ op <+> pprOp platform o1 <> comma <+> pprOp platform o2
op3 op o1 o2 o3 = line $ op <+> pprOp platform o1 <> comma <+> pprOp platform o2 <> comma <+> pprOp platform o3
op4 op o1 o2 o3 o4 = line $ op <+> pprOp platform o1 <> comma <+> pprOp platform o2 <> comma <+> pprOp platform o3 <> comma <+> pprOp platform o4
......
......@@ -649,6 +649,21 @@ getRegister' _ _ (CmmMachOp mop [x, y]) -- dyadic PrimOps
code <- remainderCode rep sgn tmp x y
return (Any fmt code)
getRegister' _ _ (CmmMachOp mop [x, y, z]) -- ternary PrimOps
= case mop of
-- x86 fmadd x * y + z <> PPC fmadd rt = ra * rc + rb
-- x86 fmsub x * y - z <> PPC fmsub rt = ra * rc - rb
-- x86 fnmadd - x * y + z ~~ PPC fnmsub rt = -(ra * rc - rb)
-- x86 fnmsub - x * y - z ~~ PPC fnmadd rt = -(ra * rc + rb)
MO_FMA variant w ->
case variant of
FMAdd -> fma_code w (FMADD FMAdd) x y z
FMSub -> fma_code w (FMADD FMSub) x y z
FNMAdd -> fma_code w (FMADD FNMAdd) x y z
FNMSub -> fma_code w (FMADD FNMSub) x y z
_ -> panic "PPC.CodeGen.getRegister: no match"
getRegister' _ _ (CmmLit (CmmInt i rep))
| Just imm <- makeImmediate rep True i
......@@ -2358,10 +2373,28 @@ trivialUCode rep instr x = do
let code' dst = code `snocOL` instr dst src
return (Any rep code')
-- | Generate code for a 4-register FMA instruction,
-- e.g. @fmadd rt ra rc rb := rt <- ra * rc + rb@.
fma_code :: Width
-> (Format -> Reg -> Reg -> Reg -> Reg -> Instr)
-> CmmExpr
-> CmmExpr
-> CmmExpr
-> NatM Register
fma_code w instr ra rc rb = do
let rep = floatFormat w
(src1, code1) <- getSomeReg ra
(src2, code2) <- getSomeReg rc
(src3, code3) <- getSomeReg rb
let instrCode rt =
code1 `appOL`
code2 `appOL`
code3 `snocOL` instr rep rt src1 src2 src3
return $ Any rep instrCode
-- There is no "remainder" instruction on the PPC, so we have to do
-- it the hard way.
-- The "sgn" parameter is the signedness for the division instruction
remainderCode :: Width -> Bool -> Reg -> CmmExpr -> CmmExpr
-> NatM (Reg -> InstrBlock)
remainderCode rep sgn reg_q arg_x arg_y = do
......
......@@ -280,6 +280,14 @@ data Instr
| FABS Reg Reg -- abs is the same for single and double
| FNEG Reg Reg -- negate is the same for single and double prec.
-- | Fused multiply-add instructions.
--
-- - FMADD: @rt = ra * rc + rb@
-- - FMSUB: @rt = ra * rc - rb@
-- - FNMADD: @rt = -(ra * rc + rb)@
-- - FNMSUB: @rt = -(ra * rc - rb)@
| FMADD FMASign Format Reg Reg Reg Reg
| FCMP Reg Reg
| FCTIWZ Reg Reg -- convert to integer word
......@@ -380,6 +388,7 @@ regUsageOfInstr platform instr
MFCR reg -> usage ([], [reg])
MFLR reg -> usage ([], [reg])
FETCHPC reg -> usage ([], [reg])
FMADD _ _ rt ra rc rb -> usage ([ra, rc, rb], [rt])
_ -> noUsage
where
usage (src, dst) = RU (filter (interesting platform) src)
......@@ -467,6 +476,8 @@ patchRegsOfInstr instr env
FDIV fmt r1 r2 r3 -> FDIV fmt (env r1) (env r2) (env r3)
FABS r1 r2 -> FABS (env r1) (env r2)
FNEG r1 r2 -> FNEG (env r1) (env r2)
FMADD sgn fmt r1 r2 r3 r4
-> FMADD sgn fmt (env r1) (env r2) (env r3) (env r4)
FCMP r1 r2 -> FCMP (env r1) (env r2)
FCTIWZ r1 r2 -> FCTIWZ (env r1) (env r2)
FCTIDZ r1 r2 -> FCTIDZ (env r1) (env r2)
......
......@@ -934,6 +934,9 @@ pprInstr platform instr = case instr of
FNEG reg1 reg2
-> pprUnary (text "fneg") reg1 reg2
FMADD signs fmt dst ra rc rb
-> pprTernaryF (pprFMASign signs) fmt dst ra rc rb
FCMP reg1 reg2
-> line $ hcat [
char '\t',
......@@ -1083,6 +1086,21 @@ pprBinaryF op fmt reg1 reg2 reg3 = line $ hcat [
pprReg reg3
]
pprTernaryF :: IsDoc doc => Line doc -> Format -> Reg -> Reg -> Reg -> Reg -> doc
pprTernaryF op fmt rt ra rc rb = line $ hcat [
char '\t',
op,
pprFFormat fmt,
char '\t',
pprReg rt,
text ", ",
pprReg ra,
text ", ",
pprReg rc,
text ", ",
pprReg rb
]
pprRI :: IsLine doc => Platform -> RI -> doc
pprRI _ (RIReg r) = pprReg r
pprRI platform (RIImm r) = pprImm platform r
......
......@@ -816,7 +816,9 @@ lower_CmmMachOp lbl (MO_SS_Conv w0 w1) xs = lower_MO_SS_Conv lbl w0 w1 xs
lower_CmmMachOp lbl (MO_UU_Conv w0 w1) xs = lower_MO_UU_Conv lbl w0 w1 xs
lower_CmmMachOp lbl (MO_XX_Conv w0 w1) xs = lower_MO_UU_Conv lbl w0 w1 xs
lower_CmmMachOp lbl (MO_FF_Conv w0 w1) xs = lower_MO_FF_Conv lbl w0 w1 xs
lower_CmmMachOp _ _ _ = panic "lower_CmmMachOp: unreachable"
lower_CmmMachOp _ mop _ =
pprPanic "lower_CmmMachOp: unreachable" $
vcat [ text "offending MachOp:" <+> pprMachOp mop ]
-- | Lower a 'CmmLit'. Note that we don't emit 'f32.const' or
-- 'f64.const' for the time being, and instead emit their relative bit
......
......@@ -901,14 +901,10 @@ getRegister' _ is32Bit (CmmMachOp mop [x, y]) = -- dyadic MachOps
MO_U_Lt _ -> condIntReg LU x y
MO_U_Le _ -> condIntReg LEU x y
MO_F_Add w -> trivialFCode_sse2 w ADD x y
MO_F_Sub w -> trivialFCode_sse2 w SUB x y
MO_F_Quot w -> trivialFCode_sse2 w FDIV x y
MO_F_Mul w -> trivialFCode_sse2 w MUL x y
MO_F_Add w -> trivialFCode_sse2 w ADD x y
MO_F_Sub w -> trivialFCode_sse2 w SUB x y
MO_F_Quot w -> trivialFCode_sse2 w FDIV x y
MO_F_Mul w -> trivialFCode_sse2 w MUL x y
MO_Add rep -> add_code rep x y
MO_Sub rep -> sub_code rep x y
......@@ -1113,6 +1109,13 @@ getRegister' _ is32Bit (CmmMachOp mop [x, y]) = -- dyadic MachOps
return (Fixed format result code)
getRegister' _plat _is32Bit (CmmMachOp mop [x, y, z]) = -- ternary MachOps
case mop of
-- Floating point fused multiply-add operations @ ± x*y ± z@
MO_FMA var w -> genFMA3Code w var x y z
_other -> pprPanic "getRegister(x86) - ternary CmmMachOp (1)"
(pprMachOp mop)
getRegister' _ _ (CmmLoad mem pk _)
| isFloatType pk
......@@ -3151,12 +3154,12 @@ genTrivialCode rep instr a b = do
a_code <- getAnyReg a
tmp <- getNewRegNat rep
let
-- We want the value of b to stay alive across the computation of a.
-- But, we want to calculate a straight into the destination register,
-- We want the value of 'b' to stay alive across the computation of 'a'.
-- But, we want to calculate 'a' straight into the destination register,
-- because the instruction only has two operands (dst := dst `op` src).
-- The troublesome case is when the result of b is in the same register
-- as the destination reg. In this case, we have to save b in a
-- new temporary across the computation of a.
-- The troublesome case is when the result of 'b' is in the same register
-- as the destination 'reg'. In this case, we have to save 'b' in a
-- new temporary across the computation of 'a'.
code dst
| dst `regClashesWithOp` b_op =
b_code `appOL`
......@@ -3174,6 +3177,69 @@ reg `regClashesWithOp` OpReg reg2 = reg == reg2
reg `regClashesWithOp` OpAddr amode = any (==reg) (addrModeRegs amode)
_ `regClashesWithOp` _ = False
-- | Generate code for a fused multiply-add operation, of the form @± x * y ± z@,
-- with 3 operands (FMA3 instruction set).
genFMA3Code :: Width
-> FMASign
-> CmmExpr -> CmmExpr -> CmmExpr -> NatM Register
genFMA3Code w signs x y z = do
-- For the FMA instruction, we want to compute x * y + z
--
-- There are three possible instructions we could emit:
--
-- - fmadd213 z y x, result in x, z can be a memory address
-- - fmadd132 x z y, result in y, x can be a memory address
-- - fmadd231 y x z, result in z, y can be a memory address
--
-- This suggests two possible optimisations:
--
-- - OPTIMISATION 1
-- If one argument is an address, use the instruction that allows
-- a memory address in that position.
--
-- - OPTIMISATION 2
-- If one argument is in a fixed register, use the instruction that puts
-- the result in that same register.
--
-- Currently we follow neither of these optimisations,
-- opting to always use fmadd213 for simplicity.
let rep = floatFormat w
(y_reg, y_code) <- getNonClobberedReg y
(z_reg, z_code) <- getNonClobberedReg z
x_code <- getAnyReg x
y_tmp <- getNewRegNat rep
z_tmp <- getNewRegNat rep
let
fma213 = FMA3 rep signs FMA213
code dst
| dst == y_reg
, dst == z_reg
= y_code `appOL`
unitOL (MOV rep (OpReg y_reg) (OpReg y_tmp)) `appOL`
z_code `appOL`
unitOL (MOV rep (OpReg z_reg) (OpReg z_tmp)) `appOL`
x_code dst `snocOL`
fma213 (OpReg z_tmp) y_tmp dst
| dst == y_reg
= y_code `appOL`
unitOL (MOV rep (OpReg y_reg) (OpReg y_tmp)) `appOL`
z_code `appOL`
x_code dst `snocOL`
fma213 (OpReg z_reg) y_tmp dst
| dst == z_reg
= y_code `appOL`
z_code `appOL`
unitOL (MOV rep (OpReg z_reg) (OpReg z_tmp)) `appOL`
x_code dst `snocOL`
fma213 (OpReg z_tmp) y_reg dst
| otherwise
= y_code `appOL`
z_code `appOL`
x_code dst `snocOL`
fma213 (OpReg z_reg) y_reg dst
return (Any rep code)
-----------
trivialUCode :: Format -> (Operand -> Instr)
......
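The comment in `genFMA3Code` lists three FMA3 operand permutations that all compute `x * y + z`. The sketch below models them under the assumption that, in Intel operand order, the destination is `op1` and the three digits say which operands are multiplied and which is added. `fmaddPerm` and `viaPerm` are hypothetical names, and `Integer` arithmetic sidesteps rounding.

```haskell
-- Illustrative model only; 'fmaddPerm' and 'viaPerm' are not GHC functions.
data FMAPermutation = FMA132 | FMA213 | FMA231
  deriving (Eq, Show, Enum, Bounded)

-- | Value written to op1 by vfmaddNNN, given op1 op2 op3 in Intel order.
fmaddPerm :: FMAPermutation -> Integer -> Integer -> Integer -> Integer
fmaddPerm FMA132 op1 op2 op3 = op1 * op3 + op2
fmaddPerm FMA213 op1 op2 op3 = op2 * op1 + op3
fmaddPerm FMA231 op1 op2 op3 = op2 * op3 + op1

-- | Operand assignments from the genFMA3Code comment. The comment lists
-- operands in AT&T order (destination last), so they are reversed here.
viaPerm :: FMAPermutation -> Integer -> Integer -> Integer -> Integer
viaPerm FMA213 x y z = fmaddPerm FMA213 x y z  -- "fmadd213 z y x", result in x
viaPerm FMA132 x y z = fmaddPerm FMA132 y z x  -- "fmadd132 x z y", result in y
viaPerm FMA231 x y z = fmaddPerm FMA231 z x y  -- "fmadd231 y x z", result in z
```

Each permutation yields `x * y + z`. They differ only in which register is overwritten and which operand may be a memory address, which is what the two optimisations in the comment would exploit.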
......@@ -12,6 +12,7 @@ module GHC.CmmToAsm.X86.Instr
( Instr(..)
, Operand(..)
, PrefetchVariant(..)
, FMAPermutation(..)
, JumpDest(..)
, getJumpDestBlockId
, canShortcut
......@@ -272,6 +273,10 @@ data Instr
| CVTSI2SS Format Operand Reg -- I32/I64 to F32
| CVTSI2SD Format Operand Reg -- I32/I64 to F64
-- | FMA3 fused multiply-add operations.
| FMA3 Format FMASign FMAPermutation Operand Reg Reg
-- src1 (r/m), src2 (r), dst (r)
-- use ADD, SUB, and SQRT for arithmetic. In both cases, operands
-- are Operand Reg.
......@@ -351,7 +356,7 @@ data Operand
| OpImm Imm -- immediate value
| OpAddr AddrMode -- memory reference
data FMAPermutation = FMA132 | FMA213 | FMA231
-- | Returns which registers are read and written as a (read, written)
-- pair.
......@@ -438,6 +443,8 @@ regUsageOfInstr platform instr
PDEP _ src mask dst -> mkRU (use_R src $ use_R mask []) [dst]
PEXT _ src mask dst -> mkRU (use_R src $ use_R mask []) [dst]
FMA3 _ _ _ src1 src2 dst -> usageFMA src1 src2 dst
-- note: might be a better way to do this
PREFETCH _ _ src -> mkRU (use_R src []) []
LOCK i -> regUsageOfInstr platform i
......@@ -482,6 +489,15 @@ regUsageOfInstr platform instr
usageRMM (OpReg src) (OpAddr ea) (OpReg reg) = mkRU (use_EA ea [src, reg]) [reg]
usageRMM _ _ _ = panic "X86.RegInfo.usageRMM: no match"
-- 3 operand form of FMA instructions.
usageFMA :: Operand -> Reg -> Reg -> RegUsage
usageFMA (OpReg src1) src2 dst
= mkRU [src1, src2, dst] [dst]
usageFMA (OpAddr ea1) src2 dst
= mkRU (use_EA ea1 [src2, dst]) [dst]
usageFMA _ _ _
= panic "X86.RegInfo.usageFMA: no match"
-- 1 operand form; operand Modified
usageM :: Operand -> RegUsage
usageM (OpReg reg) = mkRU [reg] [reg]
......@@ -561,6 +577,8 @@ patchRegsOfInstr instr env
JMP op regs -> JMP (patchOp op) regs
JMP_TBL op ids s lbl -> JMP_TBL (patchOp op) ids s lbl
FMA3 fmt var perm x1 x2 x3 -> patch3 (FMA3 fmt var perm) x1 x2 x3
-- literally only support storing the top x87 stack value st(0)
X87Store fmt dst -> X87Store fmt (lookupAddr dst)
......@@ -612,6 +630,8 @@ patchRegsOfInstr instr env
patch1 insn op = insn $! patchOp op
patch2 :: (Operand -> Operand -> a) -> Operand -> Operand -> a
patch2 insn src dst = (insn $! patchOp src) $! patchOp dst
patch3 :: (Operand -> Reg -> Reg -> a) -> Operand -> Reg -> Reg -> a
patch3 insn src1 src2 dst = ((insn $! patchOp src1) $! env src2) $! env dst
patchOp (OpReg reg) = OpReg $! env reg
patchOp (OpImm imm) = OpImm imm
......
......@@ -838,6 +838,14 @@ pprInstr platform i = case i of
FDIV format op1 op2
-> pprFormatOpOp (text "div") format op1 op2
FMA3 format var perm op1 op2 op3
-> let mnemo = case var of
FMAdd -> text "vfmadd"
FMSub -> text "vfmsub"
FNMAdd -> text "vfnmadd"
FNMSub -> text "vfnmsub"
in pprFormatOpRegReg (mnemo <> pprFMAPermutation perm) format op1 op2 op3
SQRT format op1 op2
-> pprFormatOpReg (text "sqrt") format op1 op2
......@@ -968,6 +976,21 @@ pprInstr platform i = case i of
pprOperand platform format op2
]
pprFormatOpRegReg :: Line doc -> Format -> Operand -> Reg -> Reg -> doc
pprFormatOpRegReg name format op1 op2 op3
= line $ hcat [
pprMnemonic name format,
pprOperand platform format op1,
comma,
pprReg platform format op2,
comma,
pprReg platform format op3
]
pprFMAPermutation :: FMAPermutation -> Line doc
pprFMAPermutation FMA132 = text "132"
pprFMAPermutation FMA213 = text "213"
pprFMAPermutation FMA231 = text "231"
pprOpOp :: Line doc -> Format -> Operand -> Operand -> doc
pprOpOp name format op1 op2
......
......@@ -529,6 +529,11 @@ machOpNeedsCast platform mop args
pprMachOpApp' :: Platform -> MachOp -> [CmmExpr] -> SDoc
pprMachOpApp' platform mop args
= case args of
-- ternary
args@[_,_,_] ->
pprMachOp_for_C platform mop <> parens (pprWithCommas pprArg args)
-- dyadic
[x,y] -> pprArg x <+> pprMachOp_for_C platform mop <+> pprArg y
......@@ -711,13 +716,28 @@ pprMachOp_for_C platform mop = case mop of
MO_U_Quot _ -> char '/'
MO_U_Rem _ -> char '%'
-- & Floating-point operations
-- Floating-point operations
MO_F_Add _ -> char '+'
MO_F_Sub _ -> char '-'
MO_F_Neg _ -> char '-'
MO_F_Mul _ -> char '*'
MO_F_Quot _ -> char '/'
-- Floating-point fused multiply-add operations
MO_FMA FMAdd w ->
case w of
W32 -> text "fmaf"
W64 -> text "fma"
_ ->
pprTrace "offending mop:"
(text "FMAdd")
(panic $ "PprC.pprMachOp_for_C: FMAdd unsupported"
++ " at width " ++ show w)
MO_FMA var _width ->
pprTrace "offending mop:"
(text $ "FMA " ++ show var)
(panic $ "PprC.pprMachOp_for_C: should have been handled earlier!")
-- Signed comparisons
MO_S_Ge _ -> text ">="
MO_S_Le _ -> text "<="
......
......@@ -1469,6 +1469,9 @@ genMachOp _ op [x] = case op of
MO_F_Sub _ -> panicOp
MO_F_Mul _ -> panicOp
MO_F_Quot _ -> panicOp
MO_FMA _ _ -> panicOp
MO_F_Eq _ -> panicOp
MO_F_Ne _ -> panicOp
MO_F_Ge _ -> panicOp
......@@ -1652,6 +1655,8 @@ genMachOp_slow opt op [x, y] = case op of
MO_F_Mul _ -> genBinMach LM_MO_FMul
MO_F_Quot _ -> genBinMach LM_MO_FDiv
MO_FMA _ _ -> panicOp
MO_And _ -> genBinMach LM_MO_And
MO_Or _ -> genBinMach LM_MO_Or
MO_Xor _ -> genBinMach LM_MO_Xor
......@@ -1785,8 +1790,27 @@ genMachOp_slow opt op [x, y] = case op of
panicOp = panic $ "LLVM.CodeGen.genMachOp_slow: unary op encountered"
++ " with two arguments! (" ++ show op ++ ")"
-- More than two expression, invalid!
genMachOp_slow _ _ _ = panic "genMachOp: More than 2 expressions in MachOp!"
genMachOp_slow _opt op [x, y, z] = case op of
MO_FMA var _ -> triLlvmOp getVarType (FMAOp var)
_ -> panicOp
where
triLlvmOp ty op = do
platform <- getPlatform
runExprData $ do
vx <- exprToVarW x
vy <- exprToVarW y
vz <- exprToVarW z
if | getVarType vx == getVarType vy
, getVarType vx == getVarType vz
-> doExprW (ty vx) $ op vx vy vz
| otherwise
-> pprPanic "triLlvmOp types" (pdoc platform x $$ pdoc platform y $$ pdoc platform z)
panicOp = panic $ "LLVM.CodeGen.genMachOp_slow: non-ternary op encountered"
++ " with three arguments! (" ++ show op ++ ")"
-- More than three expressions, invalid!
genMachOp_slow _ _ _ = panic "genMachOp_slow: More than 3 expressions in MachOp!"
-- | Handle CmmLoad expression.
......
......@@ -39,8 +39,8 @@ module GHC.Core.Coercion (
mkSymCo, mkTransCo,
mkSelCo, getNthFun, getNthFromType, mkLRCo,
mkInstCo, mkAppCo, mkAppCos, mkTyConAppCo,
mkFunCo1, mkFunCo2, mkFunCoNoFTF, mkFunResCo,
mkNakedFunCo1, mkNakedFunCo2,
mkFunCo, mkFunCo2, mkFunCoNoFTF, mkFunResCo,
mkNakedFunCo,
mkForAllCo, mkForAllCos, mkHomoForAllCos,
mkPhantomCo,
mkHoleCo, mkUnivCo, mkSubCo,
......@@ -51,7 +51,7 @@ module GHC.Core.Coercion (
castCoercionKind, castCoercionKind1, castCoercionKind2,
mkPrimEqPred, mkReprPrimEqPred, mkPrimEqPredRole,
mkHeteroPrimEqPred, mkHeteroReprPrimEqPred,
mkNomPrimEqPred,
-- ** Decomposition
instNewTyCon_maybe,
......@@ -811,29 +811,20 @@ mkFunCoNoFTF r w arg_co res_co
-- or @(a => x) ~ (b => y)@, depending on the kind of @a@/@b@.
-- This (most common) version takes a single FunTyFlag, which is used
-- for both fco_afl and fco_afr of the FunCo
mkFunCo1 :: HasDebugCallStack => Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkFunCo1 r af w arg_co res_co
mkFunCo :: Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkFunCo r af w arg_co res_co
= mkFunCo2 r af af w arg_co res_co
mkNakedFunCo1 :: Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
-- This version of mkFunCo1 does not check FunCo invariants (checkFunCo)
-- It is called during typechecking on un-zonked types;
-- in particular there may be un-zonked coercion variables.
mkNakedFunCo1 r af w arg_co res_co
= mkNakedFunCo2 r af af w arg_co res_co
mkNakedFunCo :: Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
-- This version of mkFunCo does not check FunCo invariants (checkFunCo)
-- It's a historical vestige; See Note [No assertion check on mkFunCo]
mkNakedFunCo = mkFunCo
mkFunCo2 :: HasDebugCallStack => Role -> FunTyFlag -> FunTyFlag
-> CoercionN -> Coercion -> Coercion -> Coercion
mkFunCo2 :: Role -> FunTyFlag -> FunTyFlag
-> CoercionN -> Coercion -> Coercion -> Coercion
-- This is the smart constructor for FunCo; it checks invariants
mkFunCo2 r afl afr w arg_co res_co
= assertPprMaybe (checkFunCo r afl afr w arg_co res_co) $
mkNakedFunCo2 r afl afr w arg_co res_co
mkNakedFunCo2 :: Role -> FunTyFlag -> FunTyFlag
-> CoercionN -> Coercion -> Coercion -> Coercion
-- This is the smart constructor for FunCo
-- "Naked"; it does not check invariants
mkNakedFunCo2 r afl afr w arg_co res_co
-- See Note [No assertion check on mkFunCo]
| Just (ty1, _) <- isReflCo_maybe arg_co
, Just (ty2, _) <- isReflCo_maybe res_co
, Just (w, _) <- isReflCo_maybe w
......@@ -844,6 +835,19 @@ mkNakedFunCo2 r afl afr w arg_co res_co
, fco_mult = w, fco_arg = arg_co, fco_res = res_co }
{- Note [No assertion check on mkFunCo]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We used to have a checkFunCo assertion on mkFunCo, but during typechecking
we can (legitimately) have not-fully-zonked types or coercion variables, so
the assertion spuriously fails (test T11480b is a case in point). Lint
checks all these things anyway.
We used to get around the problem by calling mkNakedFunCo from within the
typechecker, which dodged the assertion check. But then mkAppCo calls
mkTyConAppCo, which calls tyConAppFunCo_maybe, which calls mkFunCo.
Duplicating this stack of calls with "naked" versions of each seems too much.
-- Commented out: see Note [No assertion check on mkFunCo]
checkFunCo :: Role -> FunTyFlag -> FunTyFlag
-> CoercionN -> Coercion -> Coercion
-> Maybe SDoc
......@@ -875,6 +879,7 @@ checkFunCo _r afl afr _w arg_co res_co
ok ty = isTYPEorCONSTRAINT (typeKind ty)
pp_ty str ty = text str <> colon <+> hang (ppr ty)
2 (dcolon <+> ppr (typeKind ty))
-}
-- | Apply a 'Coercion' to another 'Coercion'.
-- The second coercion must be Nominal, unless the first is Phantom.
......@@ -1355,7 +1360,7 @@ mkProofIrrelCo r kco g1 g2 = mkUnivCo (ProofIrrelProv kco) r
-- | Converts a coercion to be nominal, if possible.
-- See Note [Role twiddling functions]
setNominalRole_maybe :: Role -- of input coercion
-> Coercion -> Maybe Coercion
-> Coercion -> Maybe CoercionN
setNominalRole_maybe r co
| r == Nominal = Just co
| otherwise = setNominalRole_maybe_helper co
......@@ -1380,10 +1385,19 @@ setNominalRole_maybe r co
= AppCo <$> setNominalRole_maybe_helper co1 <*> pure co2
setNominalRole_maybe_helper (ForAllCo tv kind_co co)
= ForAllCo tv kind_co <$> setNominalRole_maybe_helper co
setNominalRole_maybe_helper (SelCo n co)
setNominalRole_maybe_helper (SelCo cs co) =
-- NB, this case recurses via setNominalRole_maybe, not
-- setNominalRole_maybe_helper!
= SelCo n <$> setNominalRole_maybe (coercionRole co) co
case cs of
SelTyCon n _r ->
-- Remember to update the role in SelTyCon to nominal;
-- not doing this caused #23362.
-- See the typing rule in Note [SelCo] in GHC.Core.TyCo.Rep.
SelCo (SelTyCon n Nominal) <$> setNominalRole_maybe (coercionRole co) co
SelFun fs ->
SelCo (SelFun fs) <$> setNominalRole_maybe (coercionRole co) co
SelForAll ->
pprPanic "setNominalRole_maybe: the coercion should already be nominal" (ppr co)
setNominalRole_maybe_helper (InstCo co arg)
= InstCo <$> setNominalRole_maybe_helper co <*> pure arg
setNominalRole_maybe_helper (UnivCo prov _ co1 co2)
......@@ -2068,7 +2082,7 @@ ty_co_subst !lc role ty
liftCoSubstTyVar lc r tv
go r (AppTy ty1 ty2) = mkAppCo (go r ty1) (go Nominal ty2)
go r (TyConApp tc tys) = mkTyConAppCo r tc (zipWith go (tyConRoleListX r tc) tys)
go r (FunTy af w t1 t2) = mkFunCo1 r af (go Nominal w) (go r t1) (go r t2)
go r (FunTy af w t1 t2) = mkFunCo r af (go Nominal w) (go r t1) (go r t2)
go r t@(ForAllTy (Bndr v _) ty)
= let (lc', v', h) = liftCoSubstVarBndr lc v
body_co = ty_co_subst lc' r ty in
......@@ -2597,7 +2611,8 @@ mkCoercionType Phantom = \ty1 ty2 ->
in
TyConApp eqPhantPrimTyCon [ki1, ki2, ty1, ty2]
-- | Creates a primitive type equality predicate.
-- | Creates a primitive nominal type equality predicate.
-- t1 ~# t2
-- Invariant: the types are not Coercions
mkPrimEqPred :: Type -> Type -> Type
mkPrimEqPred ty1 ty2
......@@ -2606,22 +2621,9 @@ mkPrimEqPred ty1 ty2
k1 = typeKind ty1
k2 = typeKind ty2
-- | Makes a lifted equality predicate at the given role
mkPrimEqPredRole :: Role -> Type -> Type -> PredType
mkPrimEqPredRole Nominal = mkPrimEqPred
mkPrimEqPredRole Representational = mkReprPrimEqPred
mkPrimEqPredRole Phantom = panic "mkPrimEqPredRole phantom"
-- | Creates a primitive type equality predicate with explicit kinds
mkHeteroPrimEqPred :: Kind -> Kind -> Type -> Type -> Type
mkHeteroPrimEqPred k1 k2 ty1 ty2 = mkTyConApp eqPrimTyCon [k1, k2, ty1, ty2]
-- | Creates a primitive representational type equality predicate
-- with explicit kinds
mkHeteroReprPrimEqPred :: Kind -> Kind -> Type -> Type -> Type
mkHeteroReprPrimEqPred k1 k2 ty1 ty2
= mkTyConApp eqReprPrimTyCon [k1, k2, ty1, ty2]
-- | Creates a primitive representational type equality predicate.
-- t1 ~R# t2
-- Invariant: the types are not Coercions
mkReprPrimEqPred :: Type -> Type -> Type
mkReprPrimEqPred ty1 ty2
= mkTyConApp eqReprPrimTyCon [k1, k2, ty1, ty2]
......@@ -2629,6 +2631,17 @@ mkReprPrimEqPred ty1 ty2
k1 = typeKind ty1
k2 = typeKind ty2
-- | Makes a primitive equality predicate at the given role
mkPrimEqPredRole :: Role -> Type -> Type -> PredType
mkPrimEqPredRole Nominal = mkPrimEqPred
mkPrimEqPredRole Representational = mkReprPrimEqPred
mkPrimEqPredRole Phantom = panic "mkPrimEqPredRole phantom"
-- | Creates a primitive nominal type equality predicate with an explicit
-- (but homogeneous) kind: (~#) k k ty1 ty2
mkNomPrimEqPred :: Kind -> Type -> Type -> Type
mkNomPrimEqPred k ty1 ty2 = mkTyConApp eqPrimTyCon [k, k, ty1, ty2]
-- | Assuming that two types are the same, ignoring coercions, find
-- a nominal coercion between the types. This is useful when optimizing
-- transitivity over coercion applications, where splitting two
......@@ -2659,7 +2672,7 @@ buildCoercion orig_ty1 orig_ty2 = go orig_ty1 orig_ty2
go (FunTy { ft_af = af1, ft_mult = w1, ft_arg = arg1, ft_res = res1 })
(FunTy { ft_af = af2, ft_mult = w2, ft_arg = arg2, ft_res = res2 })
= assert (af1 == af2) $
mkFunCo1 Nominal af1 (go w1 w2) (go arg1 arg2) (go res1 res2)
mkFunCo Nominal af1 (go w1 w2) (go arg1 arg2) (go res1 res2)
go (TyConApp tc1 args1) (TyConApp tc2 args2)
= assert (tc1 == tc2) $
......@@ -2740,15 +2753,17 @@ has_co_hole_co :: Coercion -> Monoid.Any
folder = TyCoFolder { tcf_view = noView
, tcf_tyvar = const2 (Monoid.Any False)
, tcf_covar = const2 (Monoid.Any False)
, tcf_hole = const2 (Monoid.Any True)
, tcf_hole = \_ hole -> Monoid.Any (isHeteroKindCoHole hole)
, tcf_tycobinder = const2
}
-- | Is there a coercion hole in this type?
-- | Is there a hetero-kind coercion hole in this type?
-- (That is, a coercion hole with ch_hetero_kind=True.)
-- See wrinkle (EIK2) of Note [Equalities with incompatible kinds] in GHC.Tc.Solver.Equality
hasCoercionHoleTy :: Type -> Bool
hasCoercionHoleTy = Monoid.getAny . has_co_hole_ty
-- | Is there a coercion hole in this coercion?
-- | Is there a hetero-kind coercion hole in this coercion?
hasCoercionHoleCo :: Coercion -> Bool
hasCoercionHoleCo = Monoid.getAny . has_co_hole_co
......
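The `tcf_hole` change above makes the fold report only hetero-kind holes, combining results with the `Any` monoid. Below is a small sketch of that pattern on a toy coercion type. `Co`, `Hole`, `holeIsHeteroKind` and `hasHeteroHole` are hypothetical stand-ins, not GHC's `TyCoFolder` machinery.

```haskell
import Data.Monoid (Any(..))

-- Toy stand-ins for coercion holes and coercions (assumptions, not GHC types).
data Hole = Hole { holeIsHeteroKind :: Bool }
data Co   = Refl | HoleCo Hole | Trans Co Co

-- | Is there a hetero-kind hole anywhere in the coercion?
-- Ordinary holes contribute 'Any False', just like leaves without holes.
hasHeteroHole :: Co -> Bool
hasHeteroHole = getAny . go
  where
    go Refl        = Any False
    go (HoleCo h)  = Any (holeIsHeteroKind h)
    go (Trans a b) = go a <> go b
```

Before the change, the analogue of the `HoleCo` case would have been `Any True` for every hole.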
......@@ -17,9 +17,9 @@ mkReflCo :: Role -> Type -> Coercion
mkTyConAppCo :: HasDebugCallStack => Role -> TyCon -> [Coercion] -> Coercion
mkAppCo :: Coercion -> Coercion -> Coercion
mkForAllCo :: TyCoVar -> Coercion -> Coercion -> Coercion
mkFunCo1 :: HasDebugCallStack => Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkNakedFunCo1 :: Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkFunCo2 :: HasDebugCallStack => Role -> FunTyFlag -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkFunCo :: Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkNakedFunCo :: Role -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkFunCo2 :: Role -> FunTyFlag -> FunTyFlag -> CoercionN -> Coercion -> Coercion -> Coercion
mkCoVarCo :: CoVar -> Coercion
mkAxiomInstCo :: CoAxiom Branched -> BranchIndex -> [Coercion] -> Coercion
mkPhantomCo :: Coercion -> Type -> Type -> Coercion
......@@ -36,6 +36,8 @@ mkSubCo :: HasDebugCallStack => Coercion -> Coercion
mkProofIrrelCo :: Role -> Coercion -> Coercion -> Coercion -> Coercion
mkAxiomRuleCo :: CoAxiomRule -> [Coercion] -> Coercion
funRole :: Role -> FunSel -> Role
isGReflCo :: Coercion -> Bool
isReflCo :: Coercion -> Bool
isReflexiveCo :: Coercion -> Bool
......
......@@ -4,7 +4,6 @@
module GHC.Core.Coercion.Opt
( optCoercion
, checkAxInstCo
, OptCoercionOpts (..)
)
where
......@@ -804,37 +803,38 @@ opt_trans_rule is co1 co2
-- Push transitivity inside axioms
opt_trans_rule is co1 co2
-- See Note [Why call checkAxInstCo during optimisation]
-- See Note [Push transitivity inside axioms] and
-- Note [Push transitivity inside newtype axioms only]
-- TrPushSymAxR
| Just (sym, con, ind, cos1) <- co1_is_axiom_maybe
, isNewTyCon (coAxiomTyCon con)
, True <- sym
, Just cos2 <- matchAxiom sym con ind co2
, let newAxInst = AxiomInstCo con ind (opt_transList is (map mkSymCo cos2) cos1)
, Nothing <- checkAxInstCo newAxInst
= fireTransRule "TrPushSymAxR" co1 co2 $ SymCo newAxInst
-- TrPushAxR
| Just (sym, con, ind, cos1) <- co1_is_axiom_maybe
, isNewTyCon (coAxiomTyCon con)
, False <- sym
, Just cos2 <- matchAxiom sym con ind co2
, let newAxInst = AxiomInstCo con ind (opt_transList is cos1 cos2)
, Nothing <- checkAxInstCo newAxInst
= fireTransRule "TrPushAxR" co1 co2 newAxInst
-- TrPushSymAxL
| Just (sym, con, ind, cos2) <- co2_is_axiom_maybe
, isNewTyCon (coAxiomTyCon con)
, True <- sym
, Just cos1 <- matchAxiom (not sym) con ind co1
, let newAxInst = AxiomInstCo con ind (opt_transList is cos2 (map mkSymCo cos1))
, Nothing <- checkAxInstCo newAxInst
= fireTransRule "TrPushSymAxL" co1 co2 $ SymCo newAxInst
-- TrPushAxL
| Just (sym, con, ind, cos2) <- co2_is_axiom_maybe
, isNewTyCon (coAxiomTyCon con)
, False <- sym
, Just cos1 <- matchAxiom (not sym) con ind co1
, let newAxInst = AxiomInstCo con ind (opt_transList is cos1 cos2)
, Nothing <- checkAxInstCo newAxInst
= fireTransRule "TrPushAxL" co1 co2 newAxInst
-- TrPushAxSym/TrPushSymAx
......@@ -915,30 +915,87 @@ fireTransRule _rule _co1 _co2 res
= Just res
{-
Note [Conflict checking with AxiomInstCo]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following type family and axiom:
Note [Push transitivity inside axioms]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
opt_trans_rule tries to push transitivity inside axioms to deal with cases like
the following:
newtype N a = MkN a
axN :: N a ~R# a
covar :: a ~R# b
co1 = axN <a> :: N a ~R# a
co2 = axN <b> :: N b ~R# b
co :: a ~R# b
co = sym co1 ; N covar ; co2
When we are optimising co, we want to notice that the two axiom instantiations
cancel out. This is implemented by rules such as TrPushSymAxR, which transforms
sym (axN <a>) ; N covar
into
sym (axN covar)
so that TrPushSymAx can subsequently transform
sym (axN covar) ; axN <b>
into
covar
which is much more compact. In some perf test cases this kind of pattern can be
generated repeatedly during simplification, so it is very important we squash it
to stop coercions growing exponentially. For more details see the paper:
Evidence normalisation in System FC
Dimitrios Vytiniotis and Simon Peyton Jones
RTA'13, 2013
https://www.microsoft.com/en-us/research/publication/evidence-normalization-system-fc-2/
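The cancellation described above can be sketched with a toy model. This is
only an illustration, not GHC's implementation: the types Ty and Co and the
functions coKind, mkSym, mkTrans, optTrans and optChain are hypothetical
names invented for this sketch, and the model supports a single newtype N
with a one-argument axiom. Because the toy model tracks symmetry explicitly,
its intermediate form is sym (axN (sym covar)).

```haskell
-- Toy model of pushing transitivity inside a newtype axiom.
-- All names are hypothetical; this is not GHC's Coercion type.

data Ty = TyVar String | N Ty
  deriving (Eq, Show)

data Co
  = Refl Ty              -- <t>     :: t ~ t
  | CoVar String Ty Ty   -- assumed :: l ~ r
  | Sym Co               -- sym g   :: r ~ l      for g :: l ~ r
  | Trans Co Co          -- g ; h
  | NCo Co               -- N g     :: N l ~ N r  for g :: l ~ r
  | AxN Co               -- axN g   :: N l ~ r    for g :: l ~ r
  deriving (Eq, Show)

-- The two types related by a coercion (assumes the coercion is well formed).
coKind :: Co -> (Ty, Ty)
coKind (Refl t)      = (t, t)
coKind (CoVar _ l r) = (l, r)
coKind (Sym g)       = let (l, r) = coKind g in (r, l)
coKind (Trans g h)   = (fst (coKind g), snd (coKind h))
coKind (NCo g)       = let (l, r) = coKind g in (N l, N r)
coKind (AxN g)       = let (l, r) = coKind g in (N l, r)

-- Smart constructors that drop trivial structure.
mkSym :: Co -> Co
mkSym (Refl t) = Refl t
mkSym (Sym g)  = g
mkSym g        = Sym g

mkTrans :: Co -> Co -> Co
mkTrans (Refl _) h = h
mkTrans g (Refl _) = g
mkTrans g h        = Trans g h

-- One transitivity step, with two axiom rules.
optTrans :: Co -> Co -> Co
-- Analogue of TrPushSymAxR: push (N h) inside the axiom instantiation.
optTrans (Sym (AxN g)) (NCo h) = Sym (AxN (mkTrans (mkSym h) g))
-- Analogue of TrPushSymAx: the two axiom instantiations cancel.
optTrans (Sym (AxN g)) (AxN h) = mkTrans (mkSym g) h
optTrans g h                   = mkTrans g h

-- Optimise a left-nested chain  c0 ; c1 ; ... ; cn.
optChain :: Co -> [Co] -> Co
optChain = foldl optTrans

a, b :: Ty
a = TyVar "a"
b = TyVar "b"

covar :: Co
covar = CoVar "covar" a b

-- sym (axN <a>) ; N covar ; axN <b>
example :: Co
example = optChain (Sym (AxN (Refl a))) [NCo covar, AxN (Refl b)]
```

In this sketch, example evaluates to covar, and coKind example is (a, b),
matching the type a ~R# b of co in the Note.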
Note [Push transitivity inside newtype axioms only]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The optimization described in Note [Push transitivity inside axioms] is possible
for both newtype and type family axioms. However, for type family axioms it is
relatively common to have transitive sequences of axiom instantiations, for
example:
data Nat = Zero | Suc Nat
type family Index (n :: Nat) (xs :: [Type]) :: Type where
Index Zero (x : xs) = x
Index (Suc n) (x : xs) = Index n xs
axIndex :: { forall x::Type. forall xs::[Type]. Index Zero (x : xs) ~ x
; forall n::Nat. forall x::Type. forall xs::[Type]. Index (Suc n) (x : xs) ~ Index n xs }
co :: Index (Suc (Suc Zero)) [a, b, c] ~ c
co = axIndex[1] <Suc Zero> <a> <[b, c]>
; axIndex[1] <Zero> <b> <[c]>
; axIndex[0] <c> <[]>
Not only are there no cancellation opportunities here, but calling matchAxiom
repeatedly down the transitive chain is very expensive. Hence we do not attempt
to push transitivity inside type family axioms. See #8095, !9210 and related tickets.
This is implemented by opt_trans_rule checking that the axiom is for a newtype
constructor (i.e. not a type family). Adding these guards substantially
improved performance (reduced bytes allocated by more than 10%) for the tests
CoOpt_Singletons, LargeRecord, T12227, T12545, T13386, T15703, T5030, T8095.
A side benefit is that we do not risk accidentally creating an ill-typed
coercion; see Note [Why call checkAxInstCo during optimisation].
There may exist programs that previously relied on pushing transitivity inside
type family axioms to avoid creating huge coercions, which will regress in
compile time performance as a result of this change. We do not currently know
of any examples, but if any come to light we may need to reconsider this
behaviour.
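For reference, the Index example can be written as a small standalone
program. This is a sketch for experimentation only; the name proof and the
choice of Int, Bool and Char as list elements are assumptions made for this
illustration. Typechecking proof requires GHC to reduce Index three times,
which corresponds to the chain axIndex[1] ; axIndex[1] ; axIndex[0] above.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}

import Data.Kind (Type)
import Data.Type.Equality ((:~:) (Refl))

data Nat = Zero | Suc Nat

-- Closed type family from the Note, with promoted constructors ticked.
type family Index (n :: Nat) (xs :: [Type]) :: Type where
  Index 'Zero    (x ': xs) = x
  Index ('Suc n) (x ': xs) = Index n xs

-- Accepted only if Index ('Suc ('Suc 'Zero)) '[Int, Bool, Char]
-- reduces to Char, i.e. after two uses of the second equation
-- followed by one use of the first.
proof :: Index ('Suc ('Suc 'Zero)) '[Int, Bool, Char] :~: Char
proof = Refl
```

Each reduction step here uses a different instantiation of the axiom, so
there is nothing to cancel between adjacent steps.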
type family Equal (a :: k) (b :: k) :: Bool
type instance where
Equal a a = True
Equal a b = False
--
Equal :: forall k::*. k -> k -> Bool
axEqual :: { forall k::*. forall a::k. Equal k a a ~ True
; forall k::*. forall a::k. forall b::k. Equal k a b ~ False }
We wish to disallow (axEqual[1] <*> <Int> <Int>). (Recall that the index is
0-based, so this is the second branch of the axiom.) The problem is that, on
the surface, it seems that (axEqual[1] <*> <Int> <Int>) :: (Equal * Int Int ~
False) and that all is OK. But, all is not OK: we want to use the first branch
of the axiom in this case, not the second. The problem is that the parameters
of the first branch can unify with the supplied coercions, thus meaning that
the first branch should be taken. See also Note [Apartness] in
"GHC.Core.FamInstEnv".
Note [Why call checkAxInstCo during optimisation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NB: The following is no longer relevant, because we no longer push transitivity
into type family axioms (Note [Push transitivity inside newtype axioms only]).
It is retained for reference in case we change this behaviour in the future.
It is possible that otherwise-good-looking optimisations meet with disaster
in the presence of axioms with multiple equations. Consider
......@@ -1029,39 +1086,6 @@ The problem described here was first found in dependent/should_compile/dynamic-p
-}
-- | Check to make sure that an AxInstCo is internally consistent.
-- Returns the conflicting branch, if it exists
-- See Note [Conflict checking with AxiomInstCo]
checkAxInstCo :: Coercion -> Maybe CoAxBranch
-- defined here to avoid dependencies in GHC.Core.Coercion
-- If you edit this function, you may need to update the GHC formalism
-- See Note [GHC Formalism] in GHC.Core.Lint
checkAxInstCo (AxiomInstCo ax ind cos)
= let branch = coAxiomNthBranch ax ind
tvs = coAxBranchTyVars branch
cvs = coAxBranchCoVars branch
incomps = coAxBranchIncomps branch
(tys, cotys) = splitAtList tvs (map coercionLKind cos)
co_args = map stripCoercionTy cotys
subst = zipTvSubst tvs tys `composeTCvSubst`
zipCvSubst cvs co_args
target = Type.substTys subst (coAxBranchLHS branch)
in_scope = mkInScopeSet $
unionVarSets (map (tyCoVarsOfTypes . coAxBranchLHS) incomps)
flattened_target = flattenTys in_scope target in
check_no_conflict flattened_target incomps
where
check_no_conflict :: [Type] -> [CoAxBranch] -> Maybe CoAxBranch
check_no_conflict _ [] = Nothing
check_no_conflict flat (b@CoAxBranch { cab_lhs = lhs_incomp } : rest)
-- See Note [Apartness] in GHC.Core.FamInstEnv
| SurelyApart <- tcUnifyTysFG alwaysBindFun flat lhs_incomp
= check_no_conflict flat rest
| otherwise
= Just b
checkAxInstCo _ = Nothing
-----------
wrapSym :: SymFlag -> Coercion -> Coercion
wrapSym sym co | sym = mkSymCo co
......