Compare revisions

Changes are shown as if the source revision was being merged into the target revision.
Commits on Source (28)
  • testsuite: extend size performance tests with gzip (fixes #25046) · eb1cb536
    Serge S. Gulin authored
    The main purpose is to create tests for the distribution-size metric of a minimal app (hello world and its variations, e.g. with unicode).
    
    Many platforms support distribution in compressed form via gzip, so it is useful to collect data on how large the executable bundle is on each platform in this minimal edge case.
    
    Two groups of tests are added:
    1. The JavaScript backend size tests are extended with gzip-enabled versions for all cases where an optimizing compiler is used (currently the Google Closure Compiler).
    2. Trivial hello-world tests with gzip-enabled versions are added for all other platforms in the CI pipeline, where no external optimizing compiler is used.
    eb1cb536
  • ghc-internal: @since for backtraceDesired · d94410f8
    Rodrigo Mesquita authored and Marge Bot committed
    Fixes point 1 in #25052
    d94410f8
  • ghc-internal: No trailing whitespace in exceptions · bfe600f5
    Rodrigo Mesquita authored and Marge Bot committed
    Fixes #25052
    bfe600f5
  • Add since annotation for -fkeep-auto-rules. · 62650d9f
    Andreas Klebinger authored and Marge Bot committed
    This partially addresses #25082.
    62650d9f
  • Mention `-fkeep-auto-rules` in release notes. · 5f0e23fd
    Andreas Klebinger authored and Marge Bot committed
    It was added earlier but hadn't appeared in any release notes yet.
    Partially addresses #25082.
    5f0e23fd
  • Cmm: don't perform unsound optimizations on 32-bit compiler hosts · 7446a09a
    Sylvain Henry authored and Marge Bot committed
    
    - beef6135 enabled the use of
      MO_Add/MO_Sub for 64-bit operations in the C and LLVM backends
    - 6755d833 did the same for the x86 NCG
      backend
    
    However, we store some literal values as `Int` in the compiler. As a
    result, some Cmm optimizations transformed target 64-bit literals into
    the compiler's `Int`. If the compiler is 32-bit, this leads to computing
    with wrong literals (see #24893 and #24700).
    
    This patch disables these Cmm optimizations for 32-bit compilers. This
    is unsatisfying (optimizations shouldn't be compiler-word-size
    dependent) but it fixes the bug and it makes the patch easy to backport.
    A proper fix would be much more invasive but it shall be implemented in
    the future.
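    For illustration, this is the guard the patch introduces (mirroring `validOffsetRep` in the diff below): the offset-folding rewrites are only applied when the literal width fits in the host compiler's `Int`. `Width` and `widthInBits` come from `GHC.Cmm.Type`; `finiteBitSize` is from `Data.Bits`.
    ```
    -- Sketch of the new guard: reject literals wider than the host Int,
    -- since register offsets are currently stored as Int.
    validOffsetRep :: Width -> Bool
    validOffsetRep rep = widthInBits rep <= finiteBitSize (undefined :: Int)
    ```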
    
    Co-authored-by: amesgen <amesgen@amesgen.de>
    7446a09a
  • docs: Update info on RequiredTypeArguments · d59faaf2
    Vladislav Zavialov authored and Marge Bot committed
    Add a section on "types in terms" that were implemented in 8b2f70a2
    and remove the now outdated suggestion of using `type` for them.
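    For context, a hedged sketch of the documented feature in the style of the user's guide example (names here are illustrative, not taken from the patch):
    ```
    {-# LANGUAGE RequiredTypeArguments #-}

    -- 'forall a ->' makes the type argument visible and required at call sites.
    vshow :: forall a -> Show a => a -> String
    vshow t x = show (x :: t)

    -- With "types in terms", the type is written directly in term syntax,
    -- without the previously suggested 'type' herald:
    s1 = vshow Int 42
    s2 = vshow (Maybe Bool) (Just True)
    ```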
    d59faaf2
  • JS: fix minor typo in base's jsbits · 39fd6714
    Sylvain Henry authored and Marge Bot committed
    39fd6714
  • RTS: remove hack to force old cabal to build a library with only JS sources · e7764575
    Sylvain Henry authored and Marge Bot committed
    We need to extend the JSC externs with Emscripten RTS definitions to avoid
    JSC_UNDEFINED_VARIABLE errors when linking without the emcc RTS.
    
    Fix #25138
    
    Some recompilation avoidance tests now fail. This is tracked with the
    other instances of this failure in #23013. My hunch is that they were
    working by chance when we used the emcc linker.
    
    Metric Decrease:
        T24602_perf_size
    e7764575
  • Support multiline strings in type literals (#25132) · d1a40233
    Brandon Chinn authored and Marge Bot committed
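    An illustrative sketch of the syntax involved, assuming the term-level `MultilineStrings` layout rules carry over unchanged to type-level `Symbol` literals (which is what this patch enables); the names are made up for the example:
    ```
    {-# LANGUAGE MultilineStrings, DataKinds #-}

    -- Term-level multiline string literal...
    usage :: String
    usage =
      """
      usage: prog [OPTIONS]
        -h  show this help
      """

    -- ...and, with this patch, the same syntax in a type-level Symbol literal.
    type Usage = """
      usage: prog [OPTIONS]
        -h  show this help
      """
    ```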
    d1a40233
  • JS: fix callback documentation (#24377) · 610840eb
    Sylvain Henry authored and Marge Bot committed
    Fix #24377
    610840eb
  • haddock: Build haddock-api and haddock-library using hadrian · 6ae4b76a
    Zubin authored and Marge Bot committed
    We build these two packages as regular boot library dependencies rather
    than using the `in-ghc-tree` flag to include the source files into the haddock
    executable.
    
    The `in-ghc-tree` flag is moved into haddock-api to ensure that haddock built
    from hackage can still find the location of the GHC bindist using `ghc-paths`.
    
    Addresses #24834
    
    This causes a metric decrease under non-release flavours because under these
    flavours libraries are compiled with optimisation but executables are not.
    
    Since we move the bulk of the code from the haddock executable to the
    haddock-api library, we see a metric decrease on the validate flavours.
    
    Metric Decrease:
        haddock.Cabal
        haddock.base
        haddock.compiler
    6ae4b76a
  • Add an extension field to HsRecFields · 51ffba5d
    Arnaud Spiwack authored and Marge Bot committed
    This is the Right Thing to Do™. And it prepares for storing a
    multiplicity coercion there.
    
    First step of the plan outlined here and below
    !12947 (comment 573091)
    51ffba5d
  • Add test for #24961 · 4d2faeeb
    Arnaud Spiwack authored and Marge Bot committed
    4d2faeeb
  • Ensures that omitted record fields in pattern have multiplicity Many · 623b4337
    Arnaud Spiwack authored and Marge Bot committed
    Omitted fields were simply ignored in the type checker and produced
    incorrect Core code.
    
    Fixes #24961
    
    Metric Increase:
        RecordUpdPerf
    623b4337
  • AARCH64 linker: skip NONE relocations · c749bdfd
    Sylvain Henry authored and Marge Bot committed
    This patch is part of the patches upstreamed from haskell.nix.
    See https://github.com/input-output-hk/haskell.nix/pull/1960 for the
    original report/patch.
    c749bdfd
  • Support multiline strings in TH · 682a6a41
    Brandon Chinn authored and Marge Bot committed
    682a6a41
  • The X86 SIMD patch. · f046a759
    sheaf authored
    This commit adds support for 128 bit wide SIMD vectors and vector
    operations to GHC's X86 native code generator.
    
    Main changes:
    
      - Introduction of vector formats (`GHC.CmmToAsm.Format`)
      - Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
        and removal of unused Float virtual register.
      - Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
        two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
        (for registers that can be used for scalar floating point values as well
        as vectors).
      - Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
        of which format each register is used at, so that the register
        allocator can know if it needs to spill the entire vector register
        or just the lower 64 bits.
      - Modify spill/load/reg-2-reg code to account for vector registers
        (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
      - Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
        the format we are storing in any given register, for instance changing
        `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
      - Add logic to lower vector `MachOp`s to X86 assembly
        (see `GHC.CmmToAsm.X86.CodeGen`)
      - Minor cleanups to genprimopcode, to remove the llvm_only attribute
        which is no longer applicable.
    
    Tests for this feature are provided in the "testsuite/tests/simd" directory.
    
    Fixes #7741
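    By way of example, a fragment like the following sketch (using the long-standing 128-bit SIMD primops exposed by GHC.Exts) can now be compiled by the X86 native code generator instead of requiring -fllvm:
    ```
    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts

    -- Pack four floats into a FloatX4# and add a broadcast vector element-wise.
    addAll :: (# Float#, Float#, Float#, Float# #) -> Float# -> FloatX4#
    addAll quad y = plusFloatX4# (packFloatX4# quad) (broadcastFloatX4# y)
    ```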
    
    Keeping track of register formats adds a small memory overhead to the
    register allocator (in particular, regUsageOfInstr now allocates more
    to keep track of the `Format` each register is used at). This explains
    the following metric increases.
    
    -------------------------
    Metric Increase:
        T12707
        T13035
        T13379
        T3294
        T4801
        T5321FD
        T5321Fun
        T783
    -------------------------
    f046a759
  • Use xmm registers in genapply · fae71b33
    sheaf authored
    This commit updates genapply to use xmm, ymm and zmm registers, for
    stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.
    
    It also updates the Cmm lexer and parser to produce Cmm vectors rather
    than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
    bits256 and bits512 in favour of vectors.
    
    The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
    it is okay to use a single vector register to hold multiple different
    types of data, and we don't know just from seeing e.g. "XMM1" how to
    interpret the 128 bits of data within.
    
    Fixes #25062
    fae71b33
  • Add vector fused multiply-add operations · 92b728cf
    sheaf authored
    This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
    These are handled both in the X86 NCG and the LLVM backends.
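    A small sketch of what the new operations look like at the source level (the exact set of instantiations follows FLOAT_VECTOR_TYPES in the primop definitions below):
    ```
    {-# LANGUAGE MagicHash #-}
    import GHC.Exts

    -- x*y + z, element-wise, with a single rounding step per lane.
    fma2 :: DoubleX2# -> DoubleX2# -> DoubleX2# -> DoubleX2#
    fma2 x y z = fmaddDoubleX2# x y z
    ```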
    92b728cf
  • Add vector shuffle primops · f3386a59
    sheaf authored
    This adds vector shuffle primops, such as
    
    ```
    shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
    ```
    
    which shuffle the components of the two input vectors into the output vector.
    
    NB: the indices must be compile time literals, to match the X86 SHUFPD
    instruction immediate and the LLVM shufflevector instruction.
    
    These are handled in the X86 NCG and the LLVM backend.
    
    Tested in simd009.
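    For example (a sketch; the indices select from the 8-element concatenation of the two inputs and must be compile-time literals, as noted above):
    ```
    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts

    -- Interleave the low halves of two vectors: lanes 0, 4, 1, 5 of (u ++ v).
    interleaveLo :: FloatX4# -> FloatX4# -> FloatX4#
    interleaveLo u v = shuffleFloatX4# u v (# 0#, 4#, 1#, 5# #)
    ```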
    f3386a59
  • Add Broadcast MachOps · 3b3dfb92
    sheaf authored
    This adds proper MachOps for broadcast instructions, allowing us to
    produce better code for broadcasting a value than simply packing that
    value (doing many vector insertions in a row).
    
    These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
    it uses the previously introduced shuffle instructions.
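    As an illustration (a sketch, not part of the patch): both definitions below fill every lane with the same value, but the broadcast form can now be lowered to a single broadcast instruction rather than a pack built from repeated inserts.
    ```
    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts

    allLanes, allLanesViaPack :: Float# -> FloatX4#
    allLanes        x = broadcastFloatX4# x
    allLanesViaPack x = packFloatX4# (# x, x, x, x #)
    ```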
    3b3dfb92
  • Fix treatment of signed zero in vector negation · aa5820fd
    sheaf authored
    This commit fixes the handling of signed zero in floating-point vector
    negation.
    
    A slight hack was introduced to work around the fact that Cmm doesn't
    currently have a notion of signed floating point literals
    (see get_float_broadcast_value_reg). This can be removed once CmmFloat
    can express the value -0.0.
    
    The simd006 test has been updated to use a stricter notion of equality
    of floating-point values, which ensures the validity of this change.
    aa5820fd
  • Add min/max primops · cec5908c
    sheaf authored
    This commit adds min/max primops, such as
    
      minDouble# :: Double# -> Double# -> Double#
      minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
      minWord16X8# :: Word16X8# -> Word16X8# -> Word16X8#
    
    These are supported in:
      - the X86, AArch64 and PowerPC NCGs,
      - the LLVM backend,
      - the WebAssembly and JavaScript backends.
    
    Fixes #25120
    cec5908c
  • Modularise RegClass · 22176f8b
    sheaf authored
    This commit modularises the RegClass datatype, allowing it to be
    used with architectures that have different register architectures, e.g.
    RISC-V which has separate floating-point and vector registers.
    
    The two modules GHC.Platform.Reg.Class.Unified and
    GHC.Platform.Reg.Class.Separate implement the two register architectures
    we currently support (corresponding to the two constructors of the
    GHC.Platform.Reg.Class.RegArch datatype).
    22176f8b
  • Add test for C calls & SIMD vectors · 941add96
    sheaf authored
    941add96
  • Fix C calls with SIMD vectors · 606c72e4
    sheaf authored
    This commit fixes the code generation for C calls, to take into account
    the calling convention.
    
    This is particularly tricky on Windows, where all vectors are expected
    to be passed by reference. See Note [The Windows X64 C calling convention]
    in GHC.CmmToAsm.X86.CodeGen.
    606c72e4
  • GHC calling convention: clarifications · 7d1a3cc5
    sheaf authored
    This commit clarifies that the GHC calling convention, on X86_64, uses
    xmm1, ..., xmm6 for argument passing. It does not use xmm0, because
    that's the convention we asked the LLVM compiler authors to define for
    usage with GHC.
    
    This unfortunately means a discrepancy with the C calling convention
    (which does use xmm0, for the first argument and for the result).
    
    Fixes #25156
    7d1a3cc5
Showing 490 additions and 218 deletions
......@@ -145,7 +145,6 @@ defaults
cheap = { primOpOkForSpeculation _thisOp }
strictness = { \ arity -> mkClosedDmdSig (replicate arity topDmd) topDiv }
fixity = Nothing
llvm_only = False
vector = []
deprecated_msg = {} -- A non-empty message indicates deprecation
......@@ -1094,6 +1093,14 @@ primop DoubleLtOp "<##" Compare Double# -> Double# -> Int#
primop DoubleLeOp "<=##" Compare Double# -> Double# -> Int#
with fixity = infix 4
primop DoubleMinOp "minDouble#" GenPrimOp
Double# -> Double# -> Double#
with commutable = True
primop DoubleMaxOp "maxDouble#" GenPrimOp
Double# -> Double# -> Double#
with commutable = True
primop DoubleAddOp "+##" GenPrimOp
Double# -> Double# -> Double#
with commutable = True
......@@ -1260,6 +1267,14 @@ primop FloatNeOp "neFloat#" Compare
primop FloatLtOp "ltFloat#" Compare Float# -> Float# -> Int#
primop FloatLeOp "leFloat#" Compare Float# -> Float# -> Int#
primop FloatMinOp "minFloat#" GenPrimOp
Float# -> Float# -> Float#
with commutable = True
primop FloatMaxOp "maxFloat#" GenPrimOp
Float# -> Float# -> Float#
with commutable = True
primop FloatAddOp "plusFloat#" GenPrimOp
Float# -> Float# -> Float#
with commutable = True
......@@ -4032,86 +4047,73 @@ section "SIMD Vectors"
,<Word8,Word8#,64>,<Word16,Word16#,32>,<Word32,Word32#,16>,<Word64,Word64#,8>]
primtype VECTOR
with llvm_only = True
vector = ALL_VECTOR_TYPES
with vector = ALL_VECTOR_TYPES
primop VecBroadcastOp "broadcast#" GenPrimOp
SCALAR -> VECTOR
{ Broadcast a scalar to all elements of a vector. }
with llvm_only = True
vector = ALL_VECTOR_TYPES
with vector = ALL_VECTOR_TYPES
primop VecPackOp "pack#" GenPrimOp
VECTUPLE -> VECTOR
{ Pack the elements of an unboxed tuple into a vector. }
with llvm_only = True
vector = ALL_VECTOR_TYPES
with vector = ALL_VECTOR_TYPES
primop VecUnpackOp "unpack#" GenPrimOp
VECTOR -> VECTUPLE
{ Unpack the elements of a vector into an unboxed tuple. #}
with llvm_only = True
vector = ALL_VECTOR_TYPES
with vector = ALL_VECTOR_TYPES
primop VecInsertOp "insert#" GenPrimOp
VECTOR -> SCALAR -> Int# -> VECTOR
{ Insert a scalar at the given position in a vector. }
with effect = CanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecAddOp "plus#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{ Add two vectors element-wise. }
with commutable = True
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecSubOp "minus#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{ Subtract two vectors element-wise. }
with llvm_only = True
vector = ALL_VECTOR_TYPES
with vector = ALL_VECTOR_TYPES
primop VecMulOp "times#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{ Multiply two vectors element-wise. }
with commutable = True
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecDivOp "divide#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{ Divide two vectors element-wise. }
with effect = CanFail
llvm_only = True
vector = FLOAT_VECTOR_TYPES
primop VecQuotOp "quot#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{ Rounds towards zero element-wise. }
with effect = CanFail
llvm_only = True
vector = INT_VECTOR_TYPES
primop VecRemOp "rem#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{ Satisfies @('quot#' x y) 'times#' y 'plus#' ('rem#' x y) == x@. }
with effect = CanFail
llvm_only = True
vector = INT_VECTOR_TYPES
primop VecNegOp "negate#" GenPrimOp
VECTOR -> VECTOR
{ Negate element-wise. }
with llvm_only = True
vector = SIGNED_VECTOR_TYPES
with vector = SIGNED_VECTOR_TYPES
primop VecIndexByteArrayOp "indexArray#" GenPrimOp
ByteArray# -> Int# -> VECTOR
{ Read a vector from specified index of immutable array. }
with effect = CanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecReadByteArrayOp "readArray#" GenPrimOp
......@@ -4119,7 +4121,6 @@ primop VecReadByteArrayOp "readArray#" GenPrimOp
{ Read a vector from specified index of mutable array. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecWriteByteArrayOp "writeArray#" GenPrimOp
......@@ -4127,14 +4128,12 @@ primop VecWriteByteArrayOp "writeArray#" GenPrimOp
{ Write a vector to specified index of mutable array. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecIndexOffAddrOp "indexOffAddr#" GenPrimOp
Addr# -> Int# -> VECTOR
{ Reads vector; offset in bytes. }
with effect = CanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecReadOffAddrOp "readOffAddr#" GenPrimOp
......@@ -4142,7 +4141,6 @@ primop VecReadOffAddrOp "readOffAddr#" GenPrimOp
{ Reads vector; offset in bytes. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecWriteOffAddrOp "writeOffAddr#" GenPrimOp
......@@ -4150,7 +4148,6 @@ primop VecWriteOffAddrOp "writeOffAddr#" GenPrimOp
{ Write vector; offset in bytes. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
......@@ -4158,7 +4155,6 @@ primop VecIndexScalarByteArrayOp "indexArrayAs#" GenPrimOp
ByteArray# -> Int# -> VECTOR
{ Read a vector from specified index of immutable array of scalars; offset is in scalar elements. }
with effect = CanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecReadScalarByteArrayOp "readArrayAs#" GenPrimOp
......@@ -4166,7 +4162,6 @@ primop VecReadScalarByteArrayOp "readArrayAs#" GenPrimOp
{ Read a vector from specified index of mutable array of scalars; offset is in scalar elements. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecWriteScalarByteArrayOp "writeArrayAs#" GenPrimOp
......@@ -4174,14 +4169,12 @@ primop VecWriteScalarByteArrayOp "writeArrayAs#" GenPrimOp
{ Write a vector to specified index of mutable array of scalars; offset is in scalar elements. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecIndexScalarOffAddrOp "indexOffAddrAs#" GenPrimOp
Addr# -> Int# -> VECTOR
{ Reads vector; offset in scalar elements. }
with effect = CanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecReadScalarOffAddrOp "readOffAddrAs#" GenPrimOp
......@@ -4189,7 +4182,6 @@ primop VecReadScalarOffAddrOp "readOffAddrAs#" GenPrimOp
{ Reads vector; offset in scalar elements. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecWriteScalarOffAddrOp "writeOffAddrAs#" GenPrimOp
......@@ -4197,9 +4189,47 @@ primop VecWriteScalarOffAddrOp "writeOffAddrAs#" GenPrimOp
{ Write vector; offset in scalar elements. }
with effect = ReadWriteEffect
can_fail_warning = YesWarnCanFail
llvm_only = True
vector = ALL_VECTOR_TYPES
primop VecFMAdd "fmadd#" GenPrimOp
VECTOR -> VECTOR -> VECTOR -> VECTOR
{Fused multiply-add operation @x*y+z@. See "GHC.Prim#fma".}
with
vector = FLOAT_VECTOR_TYPES
primop VecFMSub "fmsub#" GenPrimOp
VECTOR -> VECTOR -> VECTOR -> VECTOR
{Fused multiply-subtract operation @x*y-z@. See "GHC.Prim#fma".}
with
vector = FLOAT_VECTOR_TYPES
primop VecFNMAdd "fnmadd#" GenPrimOp
VECTOR -> VECTOR -> VECTOR -> VECTOR
{Fused negate-multiply-add operation @-x*y+z@. See "GHC.Prim#fma".}
with
vector = FLOAT_VECTOR_TYPES
primop VecFNMSub "fnmsub#" GenPrimOp
VECTOR -> VECTOR -> VECTOR -> VECTOR
{Fused negate-multiply-subtract operation @-x*y-z@. See "GHC.Prim#fma".}
with
vector = FLOAT_VECTOR_TYPES
primop VecShuffleOp "shuffle#" GenPrimOp
VECTOR -> VECTOR -> INTVECTUPLE -> VECTOR
{Shuffle elements of the concatenation of the input two vectors
into the result vector.}
with vector = ALL_VECTOR_TYPES
primop VecMinOp "min#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{Component-wise minimum of two vectors.}
with
vector = ALL_VECTOR_TYPES
primop VecMaxOp "max#" GenPrimOp
VECTOR -> VECTOR -> VECTOR
{Component-wise maximum of two vectors.}
with
vector = ALL_VECTOR_TYPES
------------------------------------------------------------------------
section "Prefetch"
......
......@@ -664,7 +664,8 @@ mkNativeCallInfoSig platform NativeCallInfo{..}
| otherwise
= assertPpr (length regs <= 24) (text "too many registers for bitmap:" <+> ppr (length regs)) {- 24 bits for register bitmap -}
assertPpr (cont_offset < 255) (text "continuation offset too large:" <+> ppr cont_offset) {- 8 bits for continuation offset (only for NativeTupleReturn) -}
assertPpr (all (`elem` regs) (regSetToList nativeCallRegs)) (text "not all registers accounted for") {- all regs accounted for -}
assertPpr (all (`elem` (map globalRegUseGlobalReg regs)) (regSetToList nativeCallRegs)) (text "not all registers accounted for") {- all regs accounted for -}
-- SLD: the above assertion seems wrong, because it doesn't account for register overlap
foldl' reg_bit 0 (zip regs [0..]) .|. (cont_offset `shiftL` 24)
where
cont_offset :: Word32
......@@ -672,8 +673,8 @@ mkNativeCallInfoSig platform NativeCallInfo{..}
| nativeCallType == NativeTupleReturn = fromIntegral nativeCallStackSpillSize
| otherwise = 0 -- there is no continuation for primcalls
reg_bit :: Word32 -> (GlobalReg, Int) -> Word32
reg_bit x (r, n)
reg_bit :: Word32 -> (GlobalRegUse, Int) -> Word32
reg_bit x (GlobalRegUse r _, n)
| r `elemRegSet` nativeCallRegs = x .|. 1 `shiftL` n
| otherwise = x
regs = allArgRegsCover platform
......
......@@ -100,7 +100,7 @@ data GenCmmDecl d h g
= CmmProc -- A procedure
h -- Extra header such as the info table
CLabel -- Entry label
[GlobalReg] -- Registers live on entry. Note that the set of live
[GlobalRegUse] -- Registers live on entry. Note that the set of live
-- registers will be correct in generated C-- code, but
-- not in hand-written C-- code. However,
-- splitAtProcPoints calculates correct liveness
......
......@@ -7,7 +7,6 @@ module GHC.Cmm.CallConv (
) where
import GHC.Prelude
import Data.List (nub)
import GHC.Cmm.Expr
import GHC.Runtime.Heap.Layout
......@@ -17,6 +16,8 @@ import GHC.Platform
import GHC.Platform.Profile
import GHC.Utils.Outputable
import GHC.Utils.Panic
import GHC.Data.List.SetOps (nubOrdBy)
import Data.Ord (comparing)
-- Calculate the 'GlobalReg' or stack locations for function call
-- parameters as used by the Cmm calling convention.
......@@ -67,14 +68,16 @@ assignArgumentsPos profile off conv arg_ty reps = (stk_off, assignments)
assign_regs assts (r:rs) regs | isVecType ty = vec
| isFloatType ty = float
| otherwise = int
where vec = case (w, regs) of
(W128, AvailRegs vs fs ds ls (s:ss))
| passVectorInReg W128 profile -> k (RegisterParam (XmmReg s), AvailRegs vs fs ds ls ss)
(W256, AvailRegs vs fs ds ls (s:ss))
| passVectorInReg W256 profile -> k (RegisterParam (YmmReg s), AvailRegs vs fs ds ls ss)
(W512, AvailRegs vs fs ds ls (s:ss))
| passVectorInReg W512 profile -> k (RegisterParam (ZmmReg s), AvailRegs vs fs ds ls ss)
_ -> (assts, (r:rs))
where vec = case regs of
AvailRegs vs fs ds ls (s:ss)
| passVectorInReg w profile
-> let reg_class = case w of
W128 -> XmmReg
W256 -> YmmReg
W512 -> ZmmReg
_ -> panic "CmmCallConv.assignArgumentsPos: Invalid vector width"
in k (RegisterParam (reg_class s), AvailRegs vs fs ds ls ss)
_ -> (assts, r:rs)
float = case (w, regs) of
(W32, AvailRegs vs fs ds ls (s:ss))
| passFloatInXmm -> k (RegisterParam (FloatReg s), AvailRegs vs fs ds ls ss)
......@@ -213,28 +216,26 @@ allRegs platform =
nodeOnly :: AvailRegs
nodeOnly = noAvailRegs { availVanillaRegs = [VanillaReg 1] }
-- This returns the set of global registers that *cover* the machine registers
-- used for argument passing. On platforms where registers can overlap---right
-- now just x86-64, where Float and Double registers overlap---passing this set
-- of registers is guaranteed to preserve the contents of all live registers. We
-- only use this functionality in hand-written C-- code in the RTS.
realArgRegsCover :: Platform -> [GlobalReg]
-- | This returns the set of global registers that *cover* the machine registers
-- used for argument passing. On platforms where registers can overlap, passing
-- this set of registers is guaranteed to preserve the contents of all live
-- registers. We only use this functionality in hand-written C-- code in the RTS.
realArgRegsCover :: Platform -> [GlobalRegUse]
realArgRegsCover platform
| passFloatArgsInXmm platform
= realVanillaRegs platform ++
realLongRegs platform ++
realDoubleRegs platform
-- we only need to save the low Double part of XMM registers.
-- Moreover, the NCG can't load/store full XMM
-- registers for now...
= [ GlobalRegUse r (globalRegSpillType platform r) | r <- realVanillaRegs platform ]
++ [ GlobalRegUse r (globalRegSpillType platform r) | r <- realLongRegs platform ]
++ [ GlobalRegUse r (globalRegSpillType platform r) | r <- realDoubleRegs platform ]
-- The above seems wrong, as it means we only save the low 64 bits
-- of XMM/YMM/ZMM registers on X86_64, which is probably wrong.
--
-- Challenge: change the realDoubleRegs line to use ZmmReg instead,
-- and fix the resulting compiler errors.
| otherwise
= realVanillaRegs platform ++
realFloatRegs platform ++
realDoubleRegs platform ++
realLongRegs platform
-- we don't save XMM registers if they are not used for parameter passing
= [ GlobalRegUse r (globalRegSpillType platform r)
| r <- realVanillaRegs platform ++ realFloatRegs platform ++ realDoubleRegs platform ++ realLongRegs platform
] -- we don't save XMM registers if they are not used for parameter passing
{-
......@@ -335,9 +336,11 @@ realArgRegsCover platform
make sure to also update GHC.StgToByteCode.layoutNativeCall
-}
-- Like realArgRegsCover but always includes the node. This covers all real
-- | Like 'realArgRegsCover' but always includes the node. This covers all real
-- and virtual registers actually used for passing arguments.
allArgRegsCover :: Platform -> [GlobalReg]
allArgRegsCover :: Platform -> [GlobalRegUse]
allArgRegsCover platform =
nub (VanillaReg 1 : realArgRegsCover platform)
nubOrdBy (comparing globalRegUseGlobalReg)
(GlobalRegUse node (globalRegSpillType platform node) : realArgRegsCover platform)
where
node = VanillaReg 1
......@@ -208,7 +208,7 @@ mkJump profile conv e actuals updfr_off =
-- | A jump where the caller says what the live GlobalRegs are. Used
-- for low-level hand-written Cmm.
mkRawJump :: Profile -> CmmExpr -> UpdFrameOffset -> [GlobalReg]
mkRawJump :: Profile -> CmmExpr -> UpdFrameOffset -> [GlobalRegUse]
-> CmmAGraph
mkRawJump profile e updfr_off vols =
lastWithArgs profile Jump Old NativeNodeCall [] updfr_off $
......@@ -297,7 +297,7 @@ stackStubExpr w = CmmLit (CmmInt 0 w)
copyInOflow :: Profile -> Convention -> Area
-> [CmmFormal]
-> [CmmFormal]
-> (Int, [GlobalReg], CmmAGraph)
-> (Int, [GlobalRegUse], CmmAGraph)
copyInOflow profile conv area formals extra_stk
= (offset, gregs, catAGraphs $ map mkMiddle nodes)
......@@ -308,9 +308,9 @@ copyInOflow profile conv area formals extra_stk
copyIn :: Profile -> Convention -> Area
-> [CmmFormal]
-> [CmmFormal]
-> (ByteOff, [GlobalReg], [CmmNode O O])
-> (ByteOff, [GlobalRegUse], [CmmNode O O])
copyIn profile conv area formals extra_stk
= (stk_size, [r | (_, RegisterParam r) <- args], map ci (stk_args ++ args))
= (stk_size, [GlobalRegUse r (localRegType lr)| (lr, RegisterParam r) <- args], map ci (stk_args ++ args))
where
platform = profilePlatform profile
......@@ -365,7 +365,7 @@ data Transfer = Call | JumpRet | Jump | Ret deriving Eq
copyOutOflow :: Profile -> Convention -> Transfer -> Area -> [CmmExpr]
-> UpdFrameOffset
-> [CmmExpr] -- extra stack args
-> (Int, [GlobalReg], CmmAGraph)
-> (Int, [GlobalRegUse], CmmAGraph)
-- Generate code to move the actual parameters into the locations
-- required by the calling convention. This includes a store for the
......@@ -383,8 +383,8 @@ copyOutOflow profile conv transfer area actuals updfr_off extra_stack_stuff
(regs, graph) = foldr co ([], mkNop) (setRA ++ args ++ stack_params)
co :: (CmmExpr, ParamLocation)
-> ([GlobalReg], CmmAGraph)
-> ([GlobalReg], CmmAGraph)
-> ([GlobalRegUse], CmmAGraph)
-> ([GlobalRegUse], CmmAGraph)
co (v, RegisterParam r@(VanillaReg {})) (rs, ms) =
let width = cmmExprWidth platform v
value
......@@ -393,12 +393,14 @@ copyOutOflow profile conv transfer area actuals updfr_off extra_stack_stuff
| width < wordWidth platform =
CmmMachOp (MO_XX_Conv width (wordWidth platform)) [v]
| otherwise = panic "Parameter width greater than word width"
ru = GlobalRegUse r (cmmExprType platform value)
in (r:rs, mkAssign (CmmGlobal $ GlobalRegUse r (cmmExprType platform value)) value <*> ms)
in (ru:rs, mkAssign (CmmGlobal ru) value <*> ms)
-- Non VanillaRegs
co (v, RegisterParam r) (rs, ms) =
(r:rs, mkAssign (CmmGlobal $ GlobalRegUse r (cmmExprType platform v)) v <*> ms)
let ru = GlobalRegUse r (cmmExprType platform v)
in (ru:rs, mkAssign (CmmGlobal ru) v <*> ms)
co (v, StackParam off) (rs, ms)
= (rs, mkStore (CmmStackSlot area off) (value v) <*> ms)
......@@ -461,13 +463,13 @@ copyOutOflow profile conv transfer area actuals updfr_off extra_stack_stuff
mkCallEntry :: Profile -> Convention -> [CmmFormal] -> [CmmFormal]
-> (Int, [GlobalReg], CmmAGraph)
-> (Int, [GlobalRegUse], CmmAGraph)
mkCallEntry profile conv formals extra_stk
= copyInOflow profile conv Old formals extra_stk
lastWithArgs :: Profile -> Transfer -> Area -> Convention -> [CmmExpr]
-> UpdFrameOffset
-> (ByteOff -> [GlobalReg] -> CmmAGraph)
-> (ByteOff -> [GlobalRegUse] -> CmmAGraph)
-> CmmAGraph
lastWithArgs profile transfer area conv actuals updfr_off last =
lastWithArgsAndExtraStack profile transfer area conv actuals
......@@ -476,7 +478,7 @@ lastWithArgs profile transfer area conv actuals updfr_off last =
lastWithArgsAndExtraStack :: Profile
-> Transfer -> Area -> Convention -> [CmmExpr]
-> UpdFrameOffset -> [CmmExpr]
-> (ByteOff -> [GlobalReg] -> CmmAGraph)
-> (ByteOff -> [GlobalRegUse] -> CmmAGraph)
-> CmmAGraph
lastWithArgsAndExtraStack profile transfer area conv actuals updfr_off
extra_stack last =
......@@ -490,7 +492,7 @@ noExtraStack :: [CmmExpr]
noExtraStack = []
toCall :: CmmExpr -> Maybe BlockId -> UpdFrameOffset -> ByteOff
-> ByteOff -> [GlobalReg]
-> ByteOff -> [GlobalRegUse]
-> CmmAGraph
toCall e cont updfr_off res_space arg_space regs =
mkLast $ CmmCall e cont regs arg_space res_space updfr_off
......@@ -104,11 +104,14 @@ $white_no_nl+ ;
"False" { kw CmmT_False }
"likely" { kw CmmT_likely}
P@decimal { global_regN VanillaReg gcWord }
R@decimal { global_regN VanillaReg bWord }
F@decimal { global_regN FloatReg (const $ cmmFloat W32) }
D@decimal { global_regN DoubleReg (const $ cmmFloat W64) }
L@decimal { global_regN LongReg (const $ cmmBits W64) }
P@decimal { global_regN 1 VanillaReg gcWord }
R@decimal { global_regN 1 VanillaReg bWord }
F@decimal { global_regN 1 FloatReg (const $ cmmFloat W32) }
D@decimal { global_regN 1 DoubleReg (const $ cmmFloat W64) }
L@decimal { global_regN 1 LongReg (const $ cmmBits W64) }
XMM@decimal { global_regN 3 XmmReg (const $ cmmVec 2 (cmmFloat W64)) }
YMM@decimal { global_regN 3 YmmReg (const $ cmmVec 4 (cmmFloat W64)) }
ZMM@decimal { global_regN 3 ZmmReg (const $ cmmVec 8 (cmmFloat W64)) }
Sp { global_reg Sp bWord }
SpLim { global_reg SpLim bWord }
Hp { global_reg Hp gcWord }
......@@ -173,9 +176,9 @@ data CmmToken
| CmmT_bits16
| CmmT_bits32
| CmmT_bits64
| CmmT_bits128
| CmmT_bits256
| CmmT_bits512
| CmmT_vec128
| CmmT_vec256
| CmmT_vec512
| CmmT_float32
| CmmT_float64
| CmmT_gcptr
......@@ -211,14 +214,16 @@ special_char span buf _len = return (L span (CmmT_SpecChar (currentChar buf)))
kw :: CmmToken -> Action
kw tok span _buf _len = return (L span tok)
global_regN :: (Int -> GlobalReg) -> (Platform -> CmmType) -> Action
global_regN con ty_fn span buf len
global_regN :: Int -> (Int -> GlobalReg) -> (Platform -> CmmType) -> Action
global_regN ident_nb_chars con ty_fn span buf len
= do { platform <- getPlatform
; let reg = con (fromIntegral n)
ty = ty_fn platform
; return (L span (CmmT_GlobalReg (GlobalRegUse reg ty))) }
where buf' = stepOn buf
n = parseUnsignedInteger buf' (len-1) 10 octDecDigit
where buf' = go ident_nb_chars buf
where go 0 b = b
go i b = go (i-1) (stepOn b)
n = parseUnsignedInteger buf' (len-ident_nb_chars) 10 octDecDigit
global_reg :: GlobalReg -> (Platform -> CmmType) -> Action
global_reg reg ty_fn span _buf _len
......@@ -269,9 +274,9 @@ reservedWordsFM = listToUFM $
( "bits16", CmmT_bits16 ),
( "bits32", CmmT_bits32 ),
( "bits64", CmmT_bits64 ),
( "bits128", CmmT_bits128 ),
( "bits256", CmmT_bits256 ),
( "bits512", CmmT_bits512 ),
( "vec128", CmmT_vec128 ),
( "vec256", CmmT_vec256 ),
( "vec512", CmmT_vec512 ),
( "float32", CmmT_float32 ),
( "float64", CmmT_float64 ),
-- New forms
......@@ -279,9 +284,6 @@ reservedWordsFM = listToUFM $
( "b16", CmmT_bits16 ),
( "b32", CmmT_bits32 ),
( "b64", CmmT_bits64 ),
( "b128", CmmT_bits128 ),
( "b256", CmmT_bits256 ),
( "b512", CmmT_bits512 ),
( "f32", CmmT_float32 ),
( "f64", CmmT_float64 ),
( "gcptr", CmmT_gcptr ),
......
......@@ -171,7 +171,7 @@ lintCmmMiddle node = case node of
CmmAssign reg expr -> do
erep <- lintCmmExpr expr
let reg_ty = cmmRegType reg
unless (erep `cmmEqType_ignoring_ptrhood` reg_ty) $
unless (erep `cmmCompatType` reg_ty) $
cmmLintAssignErr (CmmAssign reg expr) erep reg_ty
CmmStore l r _alignment -> do
......
......@@ -59,7 +59,7 @@ cmmLocalLiveness platform graph =
check facts =
noLiveOnEntry entry (expectJust "check" $ mapLookup entry facts) facts
cmmGlobalLiveness :: Platform -> CmmGraph -> BlockEntryLiveness GlobalReg
cmmGlobalLiveness :: Platform -> CmmGraph -> BlockEntryLiveness GlobalRegUse
cmmGlobalLiveness platform graph =
analyzeCmmBwd liveLattice (xferLive platform) graph mapEmpty
......@@ -92,7 +92,7 @@ xferLive platform (BlockCC eNode middle xNode) fBase =
!result = foldNodesBwdOO (gen_kill platform) middle joined
in mapSingleton (entryLabel eNode) result
{-# SPECIALIZE xferLive :: Platform -> TransferFun (CmmLive LocalReg) #-}
{-# SPECIALIZE xferLive :: Platform -> TransferFun (CmmLive GlobalReg) #-}
{-# SPECIALIZE xferLive :: Platform -> TransferFun (CmmLive GlobalRegUse) #-}
-----------------------------------------------------------------------------
-- | Specialization that only retains the keys for local variables.
......
......@@ -116,7 +116,7 @@ data MachOp
-- Floating-point fused multiply-add operations
-- | Fused multiply-add, see 'FMASign'.
| MO_FMA FMASign Width
| MO_FMA FMASign Length Width
-- Floating point comparison
| MO_F_Eq Width
......@@ -126,6 +126,9 @@ data MachOp
| MO_F_Gt Width
| MO_F_Lt Width
| MO_F_Min Width
| MO_F_Max Width
-- Bitwise operations. Not all of these may be supported
-- at all sizes, and only integral Widths are valid.
| MO_And Width
......@@ -158,8 +161,9 @@ data MachOp
| MO_FW_Bitcast Width -- Float/Double -> Word32/Word64
-- Vector element insertion and extraction operations
| MO_V_Insert Length Width -- Insert scalar into vector
| MO_V_Extract Length Width -- Extract scalar from vector
| MO_V_Broadcast Length Width -- Broadcast a scalar into a vector
| MO_V_Insert Length Width -- Insert scalar into vector
| MO_V_Extract Length Width -- Extract scalar from vector
-- Integer vector operations
| MO_V_Add Length Width
......@@ -175,9 +179,14 @@ data MachOp
| MO_VU_Quot Length Width
| MO_VU_Rem Length Width
-- Vector shuffles
| MO_V_Shuffle Length Width [Int]
| MO_VF_Shuffle Length Width [Int]
-- Floating point vector element insertion and extraction operations
| MO_VF_Insert Length Width -- Insert scalar into vector
| MO_VF_Extract Length Width -- Extract scalar from vector
| MO_VF_Broadcast Length Width -- Broadcast a scalar into a vector
| MO_VF_Insert Length Width -- Insert scalar into vector
| MO_VF_Extract Length Width -- Extract scalar from vector
-- Floating point vector operations
| MO_VF_Add Length Width
......@@ -186,6 +195,14 @@ data MachOp
| MO_VF_Mul Length Width
| MO_VF_Quot Length Width
-- Min/max operations
| MO_VS_Min Length Width
| MO_VS_Max Length Width
| MO_VU_Min Length Width
| MO_VU_Max Length Width
| MO_VF_Min Length Width
| MO_VF_Max Length Width
-- | An atomic read with no memory ordering. Address msut
-- be naturally aligned.
| MO_RelaxedRead Width
......@@ -316,6 +333,8 @@ isCommutableMachOp mop =
MO_Xor _ -> True
MO_F_Add _ -> True
MO_F_Mul _ -> True
MO_F_Min {} -> True
MO_F_Max {} -> True
_other -> False
-- ----------------------------------------------------------------------------
......@@ -458,8 +477,10 @@ machOpResultType platform mop tys =
MO_F_Mul r -> cmmFloat r
MO_F_Quot r -> cmmFloat r
MO_F_Neg r -> cmmFloat r
MO_F_Min r -> cmmFloat r
MO_F_Max r -> cmmFloat r
MO_FMA _ r -> cmmFloat r
MO_FMA _ l r -> if l == 1 then cmmFloat r else cmmVec l (cmmFloat r)
MO_F_Eq {} -> comparisonResultRep platform
MO_F_Ne {} -> comparisonResultRep platform
......@@ -485,6 +506,7 @@ machOpResultType platform mop tys =
MO_WF_Bitcast w -> cmmFloat w
MO_FW_Bitcast w -> cmmBits w
MO_V_Broadcast l w -> cmmVec l (cmmBits w)
MO_V_Insert l w -> cmmVec l (cmmBits w)
MO_V_Extract _ w -> cmmBits w
......@@ -495,10 +517,18 @@ machOpResultType platform mop tys =
MO_VS_Quot l w -> cmmVec l (cmmBits w)
MO_VS_Rem l w -> cmmVec l (cmmBits w)
MO_VS_Neg l w -> cmmVec l (cmmBits w)
MO_VS_Min l w -> cmmVec l (cmmBits w)
MO_VS_Max l w -> cmmVec l (cmmBits w)
MO_VU_Quot l w -> cmmVec l (cmmBits w)
MO_VU_Rem l w -> cmmVec l (cmmBits w)
MO_VU_Min l w -> cmmVec l (cmmBits w)
MO_VU_Max l w -> cmmVec l (cmmBits w)
MO_V_Shuffle l w _ -> cmmVec l (cmmBits w)
MO_VF_Shuffle l w _ -> cmmVec l (cmmFloat w)
MO_VF_Broadcast l w -> cmmVec l (cmmFloat w)
MO_VF_Insert l w -> cmmVec l (cmmFloat w)
MO_VF_Extract _ w -> cmmFloat w
......@@ -507,6 +537,8 @@ machOpResultType platform mop tys =
MO_VF_Mul l w -> cmmVec l (cmmFloat w)
MO_VF_Quot l w -> cmmVec l (cmmFloat w)
MO_VF_Neg l w -> cmmVec l (cmmFloat w)
MO_VF_Min l w -> cmmVec l (cmmFloat w)
MO_VF_Max l w -> cmmVec l (cmmFloat w)
MO_RelaxedRead r -> cmmBits r
MO_AlignmentCheck _ _ -> ty1
......@@ -555,8 +587,10 @@ machOpArgReps platform op =
MO_F_Mul r -> [r,r]
MO_F_Quot r -> [r,r]
MO_F_Neg r -> [r]
MO_F_Min r -> [r,r]
MO_F_Max r -> [r,r]
MO_FMA _ r -> [r,r,r]
MO_FMA _ l r -> [vecwidth l r, vecwidth l r, vecwidth l r]
MO_F_Eq r -> [r,r]
MO_F_Ne r -> [r,r]
......@@ -582,31 +616,45 @@ machOpArgReps platform op =
MO_WF_Bitcast w -> [w]
MO_FW_Bitcast w -> [w]
MO_V_Insert l r -> [typeWidth (vec l (cmmBits r)),r, W32]
MO_V_Extract l r -> [typeWidth (vec l (cmmBits r)), W32]
MO_VF_Insert l r -> [typeWidth (vec l (cmmFloat r)),r,W32]
MO_VF_Extract l r -> [typeWidth (vec l (cmmFloat r)),W32]
-- SIMD vector indices are always 32 bit
MO_V_Shuffle l r _ -> [vecwidth l r, vecwidth l r]
MO_VF_Shuffle l r _ -> [vecwidth l r, vecwidth l r]
MO_V_Add _ r -> [r,r]
MO_V_Sub _ r -> [r,r]
MO_V_Mul _ r -> [r,r]
MO_VS_Quot _ r -> [r,r]
MO_VS_Rem _ r -> [r,r]
MO_VS_Neg _ r -> [r]
MO_VU_Quot _ r -> [r,r]
MO_VU_Rem _ r -> [r,r]
MO_V_Broadcast _ r -> [r]
MO_V_Insert l r -> [vecwidth l r, r, W32]
MO_V_Extract l r -> [vecwidth l r, W32]
MO_VF_Broadcast _ r -> [r]
MO_VF_Insert l r -> [vecwidth l r, r, W32]
MO_VF_Extract l r -> [vecwidth l r, W32]
-- SIMD vector indices are always 32 bit
MO_VF_Add _ r -> [r,r]
MO_VF_Sub _ r -> [r,r]
MO_VF_Mul _ r -> [r,r]
MO_VF_Quot _ r -> [r,r]
MO_VF_Neg _ r -> [r]
MO_V_Add l w -> [vecwidth l w, vecwidth l w]
MO_V_Sub l w -> [vecwidth l w, vecwidth l w]
MO_V_Mul l w -> [vecwidth l w, vecwidth l w]
MO_VS_Quot l w -> [vecwidth l w, vecwidth l w]
MO_VS_Rem l w -> [vecwidth l w, vecwidth l w]
MO_VS_Neg l w -> [vecwidth l w]
MO_VS_Min l w -> [vecwidth l w, vecwidth l w]
MO_VS_Max l w -> [vecwidth l w, vecwidth l w]
MO_VU_Quot l w -> [vecwidth l w, vecwidth l w]
MO_VU_Rem l w -> [vecwidth l w, vecwidth l w]
MO_VU_Min l w -> [vecwidth l w, vecwidth l w]
MO_VU_Max l w -> [vecwidth l w, vecwidth l w]
-- NOTE: The below is owing to the fact that floats use the SSE registers
MO_VF_Add l w -> [vecwidth l w, vecwidth l w]
MO_VF_Sub l w -> [vecwidth l w, vecwidth l w]
MO_VF_Mul l w -> [vecwidth l w, vecwidth l w]
MO_VF_Quot l w -> [vecwidth l w, vecwidth l w]
MO_VF_Neg l w -> [vecwidth l w]
MO_VF_Min l w -> [vecwidth l w, vecwidth l w]
MO_VF_Max l w -> [vecwidth l w, vecwidth l w]
MO_RelaxedRead _ -> [wordWidth platform]
MO_AlignmentCheck _ r -> [r]
where
vecwidth l w = widthFromBytes (l * widthInBytes w)
-----------------------------------------------------------------------------
-- CallishMachOp
......
......@@ -118,7 +118,7 @@ data CmmNode e x where
-- occur in CmmExprs, namely as (CmmLit (CmmBlock b)) or
-- (CmmStackSlot (Young b) _).
cml_args_regs :: [GlobalReg],
cml_args_regs :: [GlobalRegUse],
-- The argument GlobalRegs (Rx, Fx, Dx, Lx) that are passed
-- to the call. This is essential information for the
-- native code generator's register allocator; without
......@@ -544,7 +544,7 @@ instance UserOfRegs LocalReg (CmmNode e x) where
=> (b -> LocalReg -> b) -> b -> a -> b
fold f z n = foldRegsUsed platform f z n
instance UserOfRegs GlobalReg (CmmNode e x) where
instance UserOfRegs GlobalRegUse (CmmNode e x) where
{-# INLINEABLE foldRegsUsed #-}
foldRegsUsed platform f !z n = case n of
CmmAssign _ expr -> fold f z expr
......@@ -555,8 +555,8 @@ instance UserOfRegs GlobalReg (CmmNode e x) where
CmmCall {cml_target=tgt, cml_args_regs=args} -> fold f (fold f z args) tgt
CmmForeignCall {tgt=tgt, args=args} -> fold f (fold f z tgt) args
_ -> z
where fold :: forall a b. UserOfRegs GlobalReg a
=> (b -> GlobalReg -> b) -> b -> a -> b
where fold :: forall a b. UserOfRegs GlobalRegUse a
=> (b -> GlobalRegUse -> b) -> b -> a -> b
fold f z n = foldRegsUsed platform f z n
instance (Ord r, UserOfRegs r CmmReg) => UserOfRegs r ForeignTarget where
-- The (Ord r) in the context is necessary here
......@@ -576,7 +576,7 @@ instance DefinerOfRegs LocalReg (CmmNode e x) where
=> (b -> LocalReg -> b) -> b -> a -> b
fold f z n = foldRegsDefd platform f z n
instance DefinerOfRegs GlobalReg (CmmNode e x) where
instance DefinerOfRegs GlobalRegUse (CmmNode e x) where
{-# INLINEABLE foldRegsDefd #-}
foldRegsDefd platform f !z n = case n of
CmmAssign lhs _ -> fold f z lhs
......@@ -585,12 +585,13 @@ instance DefinerOfRegs GlobalReg (CmmNode e x) where
CmmForeignCall {} -> fold f z activeRegs
-- See Note [Safe foreign calls clobber STG registers]
_ -> z
where fold :: forall a b. DefinerOfRegs GlobalReg a
=> (b -> GlobalReg -> b) -> b -> a -> b
where fold :: forall a b. DefinerOfRegs GlobalRegUse a
=> (b -> GlobalRegUse -> b) -> b -> a -> b
fold f z n = foldRegsDefd platform f z n
activeRegs = activeStgRegs platform
activeCallerSavesRegs = filter (callerSaves platform) activeRegs
activeRegs :: [GlobalRegUse]
activeRegs = map (\ r -> GlobalRegUse r (globalRegSpillType platform r)) $ activeStgRegs platform
activeCallerSavesRegs = filter (callerSaves platform . globalRegUseGlobalReg) activeRegs
foreignTargetRegs (ForeignTarget _ (ForeignConvention _ _ _ CmmNeverReturns)) = []
foreignTargetRegs _ = activeCallerSavesRegs
......
......@@ -79,7 +79,11 @@ cmmMachOpFoldM
-> MachOp
-> [CmmExpr]
-> Maybe CmmExpr
cmmMachOpFoldM _ (MO_V_Broadcast {}) _ = Nothing
cmmMachOpFoldM _ (MO_VF_Broadcast {}) _ = Nothing
-- SIMD NCG TODO: supporting constant folding for vector operations
-- would require augmenting getRegister' to handle them.
-- See the code for "getRegister' platform _ (CmmLit lit)".
cmmMachOpFoldM _ op [CmmLit (CmmInt x rep)]
= Just $! case op of
MO_S_Neg _ -> CmmLit (CmmInt (-x) rep)
......@@ -93,7 +97,6 @@ cmmMachOpFoldM _ op [CmmLit (CmmInt x rep)]
MO_SS_Conv from to -> CmmLit (CmmInt (narrowS from x) to)
MO_UU_Conv from to -> CmmLit (CmmInt (narrowU from x) to)
MO_XX_Conv from to -> CmmLit (CmmInt (narrowS from x) to)
_ -> panic $ "cmmMachOpFoldM: unknown unary op: " ++ show op
-- Eliminate shifts that are wider than the shiftee
......@@ -237,23 +240,33 @@ cmmMachOpFoldM _ MO_Add{} [ CmmMachOp op@MO_Add{} [pic, CmmLit lit]
= Just $! CmmMachOp op [pic, CmmLit $ cmmOffsetLit lit off ]
where off = fromIntegral (narrowS rep n)
-- Make a RegOff if we can
-- Make a RegOff if we can. We don't perform this optimization if rep is greater
-- than the host word size because we use an Int to store the offset. See
-- #24893 and #24700. This should be fixed to ensure that optimizations don't
-- depend on the compiler host platform.
cmmMachOpFoldM _ (MO_Add _) [CmmReg reg, CmmLit (CmmInt n rep)]
| validOffsetRep rep
= Just $! cmmRegOff reg (fromIntegral (narrowS rep n))
cmmMachOpFoldM _ (MO_Add _) [CmmRegOff reg off, CmmLit (CmmInt n rep)]
| validOffsetRep rep
= Just $! cmmRegOff reg (off + fromIntegral (narrowS rep n))
cmmMachOpFoldM _ (MO_Sub _) [CmmReg reg, CmmLit (CmmInt n rep)]
| validOffsetRep rep
= Just $! cmmRegOff reg (- fromIntegral (narrowS rep n))
cmmMachOpFoldM _ (MO_Sub _) [CmmRegOff reg off, CmmLit (CmmInt n rep)]
| validOffsetRep rep
= Just $! cmmRegOff reg (off - fromIntegral (narrowS rep n))
-- Fold label(+/-)offset into a CmmLit where possible
cmmMachOpFoldM _ (MO_Add _) [CmmLit lit, CmmLit (CmmInt i rep)]
| validOffsetRep rep
= Just $! CmmLit (cmmOffsetLit lit (fromIntegral (narrowU rep i)))
cmmMachOpFoldM _ (MO_Add _) [CmmLit (CmmInt i rep), CmmLit lit]
| validOffsetRep rep
= Just $! CmmLit (cmmOffsetLit lit (fromIntegral (narrowU rep i)))
cmmMachOpFoldM _ (MO_Sub _) [CmmLit lit, CmmLit (CmmInt i rep)]
| validOffsetRep rep
= Just $! CmmLit (cmmOffsetLit lit (fromIntegral (negate (narrowU rep i))))
@@ -409,6 +422,13 @@ cmmMachOpFoldM platform mop [x, (CmmLit (CmmInt n _))]
cmmMachOpFoldM _ _ _ = Nothing
-- | Check that a literal width is compatible with the host word size used to
-- store offsets. This should be fixed properly (using larger types to store
-- literal offsets). See #24893
validOffsetRep :: Width -> Bool
validOffsetRep rep = widthInBits rep <= finiteBitSize (undefined :: Int)
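-- Illustrative examples (not part of the patch), assuming a 32-bit compiler
-- host where finiteBitSize (undefined :: Int) == 32:
--
--   validOffsetRep W32  ==  True    -- a 32-bit offset fits in an Int
--   validOffsetRep W64  ==  False   -- the fold is skipped, see #24893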
{- Note [Comparison operators]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If we have
......
@@ -381,9 +381,9 @@ import qualified Data.ByteString.Char8 as BS8
'bits16' { L _ (CmmT_bits16) }
'bits32' { L _ (CmmT_bits32) }
'bits64' { L _ (CmmT_bits64) }
'bits128' { L _ (CmmT_bits128) }
'bits256' { L _ (CmmT_bits256) }
'bits512' { L _ (CmmT_bits512) }
'vec128' { L _ (CmmT_vec128) }
'vec256' { L _ (CmmT_vec256) }
'vec512' { L _ (CmmT_vec512) }
'float32' { L _ (CmmT_float32) }
'float64' { L _ (CmmT_float64) }
'gcptr' { L _ (CmmT_gcptr) }
@@ -770,13 +770,13 @@ safety :: { Safety }
: {- empty -} { PlayRisky }
| STRING {% parseSafety $1 }
vols :: { [GlobalReg] }
vols :: { [GlobalRegUse] }
: '[' ']' { [] }
| '[' '*' ']' {% do platform <- PD.getPlatform
; return (realArgRegsCover platform) }
-- All of them. See comment attached
-- to realArgRegsCover
| '[' globals ']' { map globalRegUseGlobalReg $2 }
| '[' '*' ']' {% do platform <- PD.getPlatform;
return $ realArgRegsCover platform }
-- All of them. See comment attached
-- to realArgRegsCover
| '[' globals ']' { $2 }
globals :: { [GlobalRegUse] }
: GLOBALREG { [$1] }
@@ -942,9 +942,9 @@ typenot8 :: { CmmType }
: 'bits16' { b16 }
| 'bits32' { b32 }
| 'bits64' { b64 }
| 'bits128' { b128 }
| 'bits256' { b256 }
| 'bits512' { b512 }
| 'vec128' { cmmVec 2 f64 }
| 'vec256' { cmmVec 4 f64 }
| 'vec512' { cmmVec 8 f64 }
| 'float32' { f32 }
| 'float64' { f64 }
| 'gcptr' {% do platform <- PD.getPlatform; return $ gcWord platform }
@@ -1050,11 +1050,13 @@ machOps = listToUFM $
( "fneg", MO_F_Neg ),
( "fmul", MO_F_Mul ),
( "fquot", MO_F_Quot ),
( "fmin", MO_F_Min ),
( "fmax", MO_F_Max ),
( "fmadd" , MO_FMA FMAdd ),
( "fmsub" , MO_FMA FMSub ),
( "fnmadd", MO_FMA FNMAdd ),
( "fnmsub", MO_FMA FNMSub ),
( "fmadd" , MO_FMA FMAdd 1 ),
( "fmsub" , MO_FMA FMSub 1 ),
( "fnmadd", MO_FMA FNMAdd 1 ),
( "fnmsub", MO_FMA FNMSub 1 ),
( "feq", MO_F_Eq ),
( "fne", MO_F_Ne ),
@@ -1377,7 +1379,7 @@ mkReturnSimple profile actuals updfr_off =
where e = entryCode platform (cmmLoadGCWord platform (CmmStackSlot Old updfr_off))
platform = profilePlatform profile
doRawJump :: CmmParse CmmExpr -> [GlobalReg] -> CmmParse ()
doRawJump :: CmmParse CmmExpr -> [GlobalRegUse] -> CmmParse ()
doRawJump expr_code vols = do
profile <- getProfile
expr <- expr_code
@@ -262,7 +262,7 @@ splitAtProcPoints platform entry_label callPPs procPoints procMap cmmProc = do
let liveness = cmmGlobalLiveness platform g
let ppLiveness pp = filter isArgReg $ regSetToList $
let ppLiveness pp = filter (isArgReg . globalRegUseGlobalReg) $ regSetToList $
expectJust "ppLiveness" $ mapLookup pp liveness
graphEnv <- return $ foldlGraphBlocks add_block mapEmpty g
@@ -96,8 +96,8 @@ instance Outputable CmmReg where
pprReg :: CmmReg -> SDoc
pprReg r
= case r of
CmmLocal local -> pprLocalReg local
CmmGlobal (GlobalRegUse global _) -> pprGlobalReg global
CmmLocal local -> pprLocalReg local
CmmGlobal (GlobalRegUse global _ty) -> pprGlobalReg global
cmmRegType :: CmmReg -> CmmType
cmmRegType (CmmLocal reg) = localRegType reg
@@ -202,6 +202,13 @@ data GlobalReg
| LongReg -- long int registers (64-bit, really)
{-# UNPACK #-} !Int -- its number
-- I think we should redesign 'GlobalReg', for example instead of
-- FloatReg/DoubleReg/XmmReg/YmmReg/ZmmReg we could have a single VecReg
-- which also stores the type we are storing in it.
--
-- We might then be able to get rid of GlobalRegUse, as the type information
-- would already be contained in a 'GlobalReg'.
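-- A purely illustrative sketch of that idea (not part of this patch); the
-- constructor and its fields are hypothetical:
--
--   | VecReg {-# UNPACK #-} !Int   -- its number
--            !Length               -- number of elements
--            !Width                -- width of each element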
| XmmReg -- 128-bit SIMD vector register
{-# UNPACK #-} !Int -- its number
@@ -212,39 +219,40 @@ data GlobalReg
{-# UNPACK #-} !Int -- its number
-- STG registers
| Sp -- Stack ptr; points to last occupied stack location.
| SpLim -- Stack limit
| Hp -- Heap ptr; points to last occupied heap location.
| HpLim -- Heap limit register
| CCCS -- Current cost-centre stack
| CurrentTSO -- pointer to current thread's TSO
| CurrentNursery -- pointer to allocation area
| HpAlloc -- allocation count for heap check failure
| Sp -- ^ Stack ptr; points to last occupied stack location.
| SpLim -- ^ Stack limit
| Hp -- ^ Heap ptr; points to last occupied heap location.
| HpLim -- ^ Heap limit register
| CCCS -- ^ Current cost-centre stack
| CurrentTSO -- ^ pointer to current thread's TSO
| CurrentNursery -- ^ pointer to allocation area
| HpAlloc -- ^ allocation count for heap check failure
-- We keep the address of some commonly-called
-- functions in the register table, to keep code
-- size down:
| EagerBlackholeInfo -- stg_EAGER_BLACKHOLE_info
| GCEnter1 -- stg_gc_enter_1
| GCFun -- stg_gc_fun
| EagerBlackholeInfo -- ^ address of stg_EAGER_BLACKHOLE_info
| GCEnter1 -- ^ address of stg_gc_enter_1
| GCFun -- ^ address of stg_gc_fun
-- Base offset for the register table, used for accessing registers
-- | Base offset for the register table, used for accessing registers
-- which do not have real registers assigned to them. This register
-- will only appear after we have expanded GlobalReg into memory accesses
-- (where necessary) in the native code generator.
| BaseReg
-- The register used by the platform for the C stack pointer. This is
-- | The register used by the platform for the C stack pointer. This is
-- a break in the STG abstraction used exclusively to setup stack unwinding
-- information.
| MachSp
-- The is a dummy register used to indicate to the stack unwinder where
-- | A dummy register used to indicate to the stack unwinder where
-- a routine would return to.
| UnwindReturnReg
-- Base Register for PIC (position-independent code) calculations
-- Only used inside the native code generator. It's exact meaning differs
-- | Base Register for PIC (position-independent code) calculations.
--
-- Only used inside the native code generator. Its exact meaning differs
-- from platform to platform (see module PositionIndependentCode).
| PicBaseReg
@@ -709,7 +709,7 @@ conflicts platform (r, rhs, addr) node
globalRegistersConflict :: Platform -> CmmExpr -> CmmNode e x -> Bool
globalRegistersConflict platform expr node =
-- See Note [Inlining foldRegsDefd]
inline foldRegsDefd platform (\b r -> b || globalRegUsedIn platform r expr)
inline foldRegsDefd platform (\b r -> b || globalRegUsedIn platform (globalRegUseGlobalReg r) expr)
False node
-- Returns True if node defines any local registers that are used in the
@@ -4,7 +4,7 @@ module GHC.Cmm.Type
, cInt
, cmmBits, cmmFloat
, typeWidth, setCmmTypeWidth
, cmmEqType, cmmEqType_ignoring_ptrhood
, cmmEqType, cmmCompatType
, isFloatType, isGcPtrType, isBitsType
, isWordAny, isWord32, isWord64
, isFloat64, isFloat32
@@ -87,21 +87,27 @@ instance Outputable CmmCat where
cmmEqType :: CmmType -> CmmType -> Bool -- Exact equality
cmmEqType (CmmType c1 w1) (CmmType c2 w2) = c1==c2 && w1==w2
cmmEqType_ignoring_ptrhood :: CmmType -> CmmType -> Bool
-- This equality is temporary; used in CmmLint
-- but the RTS files are not yet well-typed wrt pointers
cmmEqType_ignoring_ptrhood (CmmType c1 w1) (CmmType c2 w2)
= c1 `weak_eq` c2 && w1==w2
-- | A weaker notion of equality of 'CmmType's than 'cmmEqType',
-- used (only) in Cmm Lint.
--
-- Why "weaker"? Because:
--
-- - we don't distinguish GcPtr vs NonGcPtr, because the RTS files
-- are not yet well-typed wrt pointers,
-- - for vectors, we only compare the widths, because in practice things like
-- X86 xmm registers support different types of data (e.g. 4xf32, 2xf64, 2xu64 etc).
cmmCompatType :: CmmType -> CmmType -> Bool
cmmCompatType (CmmType c1 w1) (CmmType c2 w2)
= c1 `weak_eq` c2 && w1 == w2
where
weak_eq :: CmmCat -> CmmCat -> Bool
FloatCat `weak_eq` FloatCat = True
FloatCat `weak_eq` _other = False
_other `weak_eq` FloatCat = False
(VecCat l1 cat1) `weak_eq` (VecCat l2 cat2) = l1 == l2
&& cat1 `weak_eq` cat2
(VecCat {}) `weak_eq` _other = False
_other `weak_eq` (VecCat {}) = False
_word1 `weak_eq` _word2 = True -- Ignores GcPtr
FloatCat `weak_eq` FloatCat = True
FloatCat `weak_eq` _other = False
_other `weak_eq` FloatCat = False
(VecCat {}) `weak_eq` (VecCat {}) = True -- only compare overall width
(VecCat {}) `weak_eq` _other = False
_other `weak_eq` (VecCat {}) = False
_word1 `weak_eq` _word2 = True -- Ignores GcPtr
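-- Illustrative examples of this weaker equality (not part of the patch):
--
--   cmmCompatType (cmmVec 4 f32) (cmmVec 2 f64)  ==  True    -- both are 128-bit vectors
--   cmmCompatType b64 f64                        ==  False   -- word vs. float
--   cmmCompatType b32 b64                        ==  False   -- widths differ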
--- Simple operations on CmmType -----
typeWidth :: CmmType -> Width
@@ -240,6 +240,7 @@ finishNativeGen logger config modLoc bufh us ngs
-- dump global NCG stats for graph coloring allocator
let stats = concat (ngs_colorStats ngs)
platform = ncgPlatform config
unless (null stats) $ do
-- build the global register conflict graph
@@ -250,7 +251,7 @@ finishNativeGen logger config modLoc bufh us ngs
dump_stats (Color.pprStats stats graphGlobal)
let platform = ncgPlatform config
putDumpFileMaybe logger
Opt_D_dump_asm_conflicts "Register conflict graph"
FormatText
@@ -265,7 +266,7 @@ finishNativeGen logger config modLoc bufh us ngs
-- dump global NCG stats for linear allocator
let linearStats = concat (ngs_linearStats ngs)
unless (null linearStats) $
dump_stats (Linear.pprStats (concat (ngs_natives ngs)) linearStats)
dump_stats (Linear.pprStats platform (concat (ngs_natives ngs)) linearStats)
-- write out the imports
let ctx = ncgAsmContext config
@@ -506,7 +507,7 @@ cmmNativeGen logger ncgImpl us fileIds dbgMap cmm count
if ( ncgRegsGraph config || ncgRegsIterative config )
then do
-- the regs usable for allocation
let (alloc_regs :: UniqFM RegClass (UniqSet RealReg))
let alloc_regs :: UniqFM RegClass (UniqSet RealReg)
= foldr (\r -> plusUFM_C unionUniqSets
$ unitUFM (targetClassOfRealReg platform r) (unitUniqSet r))
emptyUFM
@@ -44,7 +44,7 @@ ncgAArch64 config
-- | Instruction instance for aarch64
instance Instruction AArch64.Instr where
regUsageOfInstr = AArch64.regUsageOfInstr
patchRegsOfInstr = AArch64.patchRegsOfInstr
patchRegsOfInstr _ = AArch64.patchRegsOfInstr
isJumpishInstr = AArch64.isJumpishInstr
jumpDestsOfInstr = AArch64.jumpDestsOfInstr
canFallthroughTo = AArch64.canFallthroughTo
@@ -54,7 +54,7 @@ instance Instruction AArch64.Instr where
takeDeltaInstr = AArch64.takeDeltaInstr
isMetaInstr = AArch64.isMetaInstr
mkRegRegMoveInstr _ = AArch64.mkRegRegMoveInstr
takeRegRegMoveInstr = AArch64.takeRegRegMoveInstr
takeRegRegMoveInstr _ = AArch64.takeRegRegMoveInstr
mkJumpInstr = AArch64.mkJumpInstr
mkStackAllocInstr = AArch64.mkStackAllocInstr
mkStackDeallocInstr = AArch64.mkStackDeallocInstr
@@ -758,8 +758,79 @@ getRegister' config plat expr
-- Conversions
MO_XX_Conv _from to -> swizzleRegisterRep (intFormat to) <$> getRegister e
_ -> pprPanic "getRegister' (monadic CmmMachOp):" (pdoc plat expr)
MO_Eq {} -> notUnary
MO_Ne {} -> notUnary
MO_Mul {} -> notUnary
MO_S_MulMayOflo {} -> notUnary
MO_S_Quot {} -> notUnary
MO_S_Rem {} -> notUnary
MO_U_Quot {} -> notUnary
MO_U_Rem {} -> notUnary
MO_S_Ge {} -> notUnary
MO_S_Le {} -> notUnary
MO_S_Gt {} -> notUnary
MO_S_Lt {} -> notUnary
MO_U_Ge {} -> notUnary
MO_U_Le {} -> notUnary
MO_U_Gt {} -> notUnary
MO_U_Lt {} -> notUnary
MO_F_Add {} -> notUnary
MO_F_Sub {} -> notUnary
MO_F_Mul {} -> notUnary
MO_F_Quot {} -> notUnary
MO_FMA {} -> notUnary
MO_F_Eq {} -> notUnary
MO_F_Ne {} -> notUnary
MO_F_Ge {} -> notUnary
MO_F_Le {} -> notUnary
MO_F_Gt {} -> notUnary
MO_F_Lt {} -> notUnary
MO_And {} -> notUnary
MO_Or {} -> notUnary
MO_Xor {} -> notUnary
MO_Shl {} -> notUnary
MO_U_Shr {} -> notUnary
MO_S_Shr {} -> notUnary
MO_V_Insert {} -> notUnary
MO_V_Extract {} -> notUnary
MO_V_Add {} -> notUnary
MO_V_Sub {} -> notUnary
MO_V_Mul {} -> notUnary
MO_VS_Quot {} -> notUnary
MO_VS_Rem {} -> notUnary
MO_VS_Neg {} -> notUnary
MO_VU_Quot {} -> notUnary
MO_VU_Rem {} -> notUnary
MO_V_Shuffle {} -> notUnary
MO_VF_Shuffle {} -> notUnary
MO_VF_Insert {} -> notUnary
MO_VF_Extract {} -> notUnary
MO_VF_Add {} -> notUnary
MO_VF_Sub {} -> notUnary
MO_VF_Mul {} -> notUnary
MO_VF_Quot {} -> notUnary
MO_Add {} -> notUnary
MO_Sub {} -> notUnary
MO_F_Min {} -> notUnary
MO_F_Max {} -> notUnary
MO_VU_Min {} -> notUnary
MO_VU_Max {} -> notUnary
MO_VS_Min {} -> notUnary
MO_VS_Max {} -> notUnary
MO_VF_Min {} -> notUnary
MO_VF_Max {} -> notUnary
MO_AlignmentCheck {} ->
pprPanic "getRegister' (monadic CmmMachOp):" (pdoc plat expr)
MO_V_Broadcast {} -> vectorsNeedLlvm
MO_VF_Broadcast {} -> vectorsNeedLlvm
MO_VF_Neg {} -> vectorsNeedLlvm
where
notUnary = pprPanic "getRegister' (non-unary CmmMachOp with 1 argument):" (pdoc plat expr)
vectorsNeedLlvm =
sorry "SIMD operations on AArch64 currently require the LLVM backend"
toImm W8 = (OpImm (ImmInt 7))
toImm W16 = (OpImm (ImmInt 15))
toImm W32 = (OpImm (ImmInt 31))
@@ -1064,6 +1135,8 @@ getRegister' config plat expr
MO_F_Sub w -> floatOp w (\d x y -> unitOL $ SUB d x y)
MO_F_Mul w -> floatOp w (\d x y -> unitOL $ MUL d x y)
MO_F_Quot w -> floatOp w (\d x y -> unitOL $ SDIV d x y)
MO_F_Min w -> floatOp w (\d x y -> unitOL $ FMIN d x y)
MO_F_Max w -> floatOp w (\d x y -> unitOL $ FMAX d x y)
-- Floating point comparison
MO_F_Eq w -> floatCond w (\d x y -> toOL [ CMP x y, CSET d EQ ])
@@ -1087,10 +1160,56 @@ getRegister' config plat expr
MO_U_Shr w -> intOp False w (\d x y -> unitOL $ LSR d x y)
MO_S_Shr w -> intOp True w (\d x y -> unitOL $ ASR d x y)
-- TODO
op -> pprPanic "getRegister' (unhandled dyadic CmmMachOp): " $
(pprMachOp op) <+> text "in" <+> (pdoc plat expr)
-- Non-dyadic MachOp with 2 arguments
MO_S_Neg {} -> notDyadic
MO_F_Neg {} -> notDyadic
MO_FMA {} -> notDyadic
MO_Not {} -> notDyadic
MO_SF_Round {} -> notDyadic
MO_FS_Truncate {} -> notDyadic
MO_SS_Conv {} -> notDyadic
MO_UU_Conv {} -> notDyadic
MO_XX_Conv {} -> notDyadic
MO_FF_Conv {} -> notDyadic
MO_WF_Bitcast {} -> notDyadic
MO_FW_Bitcast {} -> notDyadic
MO_V_Broadcast {} -> notDyadic
MO_VF_Broadcast {} -> notDyadic
MO_V_Insert {} -> notDyadic
MO_VF_Insert {} -> notDyadic
MO_AlignmentCheck {} -> notDyadic
MO_RelaxedRead {} -> notDyadic
-- Vector operations: currently unsupported in the AArch64 NCG.
MO_V_Extract {} -> vectorsNeedLlvm
MO_V_Add {} -> vectorsNeedLlvm
MO_V_Sub {} -> vectorsNeedLlvm
MO_V_Mul {} -> vectorsNeedLlvm
MO_VS_Quot {} -> vectorsNeedLlvm
MO_VS_Rem {} -> vectorsNeedLlvm
MO_VS_Neg {} -> vectorsNeedLlvm
MO_VU_Quot {} -> vectorsNeedLlvm
MO_VU_Rem {} -> vectorsNeedLlvm
MO_VF_Extract {} -> vectorsNeedLlvm
MO_VF_Add {} -> vectorsNeedLlvm
MO_VF_Sub {} -> vectorsNeedLlvm
MO_VF_Neg {} -> vectorsNeedLlvm
MO_VF_Mul {} -> vectorsNeedLlvm
MO_VF_Quot {} -> vectorsNeedLlvm
MO_V_Shuffle {} -> vectorsNeedLlvm
MO_VF_Shuffle {} -> vectorsNeedLlvm
MO_VU_Min {} -> vectorsNeedLlvm
MO_VU_Max {} -> vectorsNeedLlvm
MO_VS_Min {} -> vectorsNeedLlvm
MO_VS_Max {} -> vectorsNeedLlvm
MO_VF_Min {} -> vectorsNeedLlvm
MO_VF_Max {} -> vectorsNeedLlvm
where
notDyadic =
pprPanic "getRegister' (non-dyadic CmmMachOp with 2 arguments): " $
(pprMachOp op) <+> text "in" <+> (pdoc plat expr)
vectorsNeedLlvm =
sorry "SIMD operations on AArch64 currently require the LLVM backend"
-- Generic ternary case.
CmmMachOp op [x, y, z] ->
@@ -1104,16 +1223,25 @@ getRegister' config plat expr
-- x86 fnmadd - x * y + z <=> AArch64 fmsub : d = - r1 * r2 + r3
-- x86 fnmsub - x * y - z <=> AArch64 fnmadd: d = - r1 * r2 - r3
MO_FMA var w -> case var of
FMAdd -> float3Op w (\d n m a -> unitOL $ FMA FMAdd d n m a)
FMSub -> float3Op w (\d n m a -> unitOL $ FMA FNMSub d n m a)
FNMAdd -> float3Op w (\d n m a -> unitOL $ FMA FMSub d n m a)
FNMSub -> float3Op w (\d n m a -> unitOL $ FMA FNMAdd d n m a)
MO_FMA var l w
| l == 1
-> case var of
FMAdd -> float3Op w (\d n m a -> unitOL $ FMA FMAdd d n m a)
FMSub -> float3Op w (\d n m a -> unitOL $ FMA FNMSub d n m a)
FNMAdd -> float3Op w (\d n m a -> unitOL $ FMA FMSub d n m a)
FNMSub -> float3Op w (\d n m a -> unitOL $ FMA FNMAdd d n m a)
| otherwise
-> vectorsNeedLlvm
MO_V_Insert {} -> vectorsNeedLlvm
MO_VF_Insert {} -> vectorsNeedLlvm
_ -> pprPanic "getRegister' (unhandled ternary CmmMachOp): " $
(pprMachOp op) <+> text "in" <+> (pdoc plat expr)
where
vectorsNeedLlvm =
sorry "SIMD operations on AArch64 currently require the LLVM backend"
float3Op w op = do
(reg_fx, format_x, code_fx) <- getFloatReg x
(reg_fy, format_y, code_fy) <- getFloatReg y
@@ -15,6 +15,7 @@ import GHC.CmmToAsm.Types
import GHC.CmmToAsm.Utils
import GHC.CmmToAsm.Config
import GHC.Platform.Reg
import GHC.Platform.Reg.Class.Unified
import GHC.Platform.Regs
import GHC.Cmm.BlockId
@@ -30,6 +31,7 @@ import GHC.Utils.Panic
import Data.Maybe (fromMaybe)
import GHC.Stack
import GHC.CmmToAsm.Reg.Target (targetClassOfReg)
-- | LR and FP (8 bytes each) are the prologue of each stack frame
stackFrameHeaderSize :: Int
@@ -143,6 +145,8 @@ regUsageOfInstr platform instr = case instr of
FCVTZS dst src -> usage (regOp src, regOp dst)
FABS dst src -> usage (regOp src, regOp dst)
FSQRT dst src -> usage (regOp src, regOp dst)
FMIN dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
FMAX dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
FMA _ dst src1 src2 src3 ->
usage (regOp src1 ++ regOp src2 ++ regOp src3, regOp dst)
@@ -153,8 +157,15 @@ regUsageOfInstr platform instr = case instr of
-- filtering the usage is necessary, otherwise the register
-- allocator will try to allocate pre-defined fixed stg
-- registers as well, as they show up.
usage (src, dst) = RU (filter (interesting platform) src)
(filter (interesting platform) dst)
usage (src, dst) = RU (map mkFmt $ filter (interesting platform) src)
(map mkFmt $ filter (interesting platform) dst)
-- SIMD NCG TODO: the format here is used for register spilling/unspilling.
-- As the AArch64 NCG does not currently support SIMD registers,
-- this simple logic is OK.
mkFmt r = RegFormat r fmt
where fmt = case targetClassOfReg platform r of
RcInteger -> II64
RcFloatOrVector -> FF64
regAddr :: AddrMode -> [Reg]
regAddr (AddrRegReg r1 r2) = [r1, r2]
@@ -290,6 +301,8 @@ patchRegsOfInstr instr env = case instr of
FCVTZS o1 o2 -> FCVTZS (patchOp o1) (patchOp o2)
FABS o1 o2 -> FABS (patchOp o1) (patchOp o2)
FSQRT o1 o2 -> FSQRT (patchOp o1) (patchOp o2)
FMIN o1 o2 o3 -> FMIN (patchOp o1) (patchOp o2) (patchOp o3)
FMAX o1 o2 o3 -> FMAX (patchOp o1) (patchOp o2) (patchOp o3)
FMA s o1 o2 o3 o4 ->
FMA s (patchOp o1) (patchOp o2) (patchOp o3) (patchOp o4)
@@ -378,12 +391,12 @@ patchJumpInstr instr patchF
mkSpillInstr
:: HasCallStack
=> NCGConfig
-> Reg -- register to spill
-> RegFormat -- register to spill
-> Int -- current stack delta
-> Int -- spill slot to use
-> [Instr]
mkSpillInstr config reg delta slot =
mkSpillInstr config (RegFormat reg fmt) delta slot =
case off - delta of
imm | -256 <= imm && imm <= 255 -> [ mkStrSp imm ]
imm | imm > 0 && imm .&. 0x7 == 0x0 && imm <= 0xfff -> [ mkStrSp imm ]
@@ -394,8 +407,8 @@ mkSpillInstr config reg delta slot =
where
a .&~. b = a .&. (complement b)
fmt = fmtOfRealReg (case reg of { RegReal r -> r; _ -> panic "Expected real reg"})
-- SIMD NCG TODO: emit the correct instructions to spill a vector register.
-- You can take inspiration from the X86_64 backend.
mkIp0SpillAddr imm = ANN (text "Spill: IP0 <- SP + " <> int imm) $ ADD ip0 sp (OpImm (ImmInt imm))
mkStrSp imm = ANN (text "Spill@" <> int (off - delta)) $ STR fmt (OpReg W64 reg) (OpAddr (AddrRegImm (regSingle 31) (ImmInt imm)))
mkStrIp0 imm = ANN (text "Spill@" <> int (off - delta)) $ STR fmt (OpReg W64 reg) (OpAddr (AddrRegImm (regSingle 16) (ImmInt imm)))
@@ -404,12 +417,11 @@ mkSpillInstr config reg delta slot =
mkLoadInstr
:: NCGConfig
-> Reg -- register to load
-> RegFormat
-> Int -- current stack delta
-> Int -- spill slot to use
-> [Instr]
mkLoadInstr config reg delta slot =
mkLoadInstr config (RegFormat reg fmt) delta slot =
case off - delta of
imm | -256 <= imm && imm <= 255 -> [ mkLdrSp imm ]
imm | imm > 0 && imm .&. 0x7 == 0x0 && imm <= 0xfff -> [ mkLdrSp imm ]
@@ -420,8 +432,8 @@ mkLoadInstr config reg delta slot =
where
a .&~. b = a .&. (complement b)
fmt = fmtOfRealReg (case reg of { RegReal r -> r; _ -> panic "Expected real reg"})
-- SIMD NCG TODO: emit the correct instructions to load a vector register.
-- You can take inspiration from the X86_64 backend.
mkIp0SpillAddr imm = ANN (text "Reload: IP0 <- SP + " <> int imm) $ ADD ip0 sp (OpImm (ImmInt imm))
mkLdrSp imm = ANN (text "Reload@" <> int (off - delta)) $ LDR fmt (OpReg W64 reg) (OpAddr (AddrRegImm (regSingle 31) (ImmInt imm)))
mkLdrIp0 imm = ANN (text "Reload@" <> int (off - delta)) $ LDR fmt (OpReg W64 reg) (OpAddr (AddrRegImm (regSingle 16) (ImmInt imm)))
......@@ -451,8 +463,10 @@ isMetaInstr instr
-- | Copy the value in a register to another one.
-- Must work for all register classes.
mkRegRegMoveInstr :: Reg -> Reg -> Instr
mkRegRegMoveInstr src dst = ANN (text "Reg->Reg Move: " <> ppr src <> text " -> " <> ppr dst) $ MOV (OpReg W64 dst) (OpReg W64 src)
mkRegRegMoveInstr :: Format -> Reg -> Reg -> Instr
mkRegRegMoveInstr _fmt src dst
= ANN (text "Reg->Reg Move: " <> ppr src <> text " -> " <> ppr dst) $ MOV (OpReg W64 dst) (OpReg W64 src)
-- SIMD NCG TODO: incorrect for vector formats
-- | Take the source and destination from this reg -> reg move instruction
-- or Nothing if it's not one
@@ -661,6 +675,10 @@ data Instr
| FCVTZS Operand Operand
-- Float ABSolute value
| FABS Operand Operand
-- Float minimum
| FMIN Operand Operand Operand
-- Float maximum
| FMAX Operand Operand Operand
-- Float SQuare RooT
| FSQRT Operand Operand
@@ -737,6 +755,8 @@ instrCon i =
FCVTZS{} -> "FCVTZS"
FABS{} -> "FABS"
FSQRT{} -> "FSQRT"
FMIN {} -> "FMIN"
FMAX {} -> "FMAX"
FMA variant _ _ _ _ ->
case variant of
FMAdd -> "FMADD"