1. 04 Feb, 2010 1 commit
    • Simon Marlow's avatar
      Implement SSE2 floating-point support in the x86 native code generator (#594) · 335b9f36
      Simon Marlow authored
      The new flag -msse2 enables code generation for SSE2 on x86.  It
      results in substantially faster floating-point performance; the main
      reason for doing this was that our x87 code generation is appallingly
      bad, and since we plan to drop -fvia-C soon, we need a way to generate
      half-decent floating-point code.
      
      The catch is that SSE2 is only available on CPUs that support it (P4+,
      AMD K8+).  We'll have to think hard about whether we should enable it
      by default for the libraries we ship.  In the meantime, at least
      -msse2 should be an acceptable replacement for "-fvia-C
      -optc-ffast-math -fexcess-precision".
      
      SSE2 also has the advantage of performing all operations at the
      correct precision, so floating-point results are consistent with other
      platforms.
      
      I also tweaked the x87 code generation a bit while I was here, now
      it's slighlty less bad than before.
      335b9f36
  2. 07 Jul, 2009 1 commit
  3. 19 May, 2009 2 commits
  4. 18 May, 2009 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      Split Reg into vreg/hreg and add register pairs · f9288086
      Ben.Lippmeier@anu.edu.au authored
       * The old Reg type is now split into VirtualReg and RealReg.
       * For the graph coloring allocator, the type of the register graph
         is now (Graph VirtualReg RegClass RealReg), which shows that it colors
         in nodes representing virtual regs with colors representing real regs.
         (as was intended)  
       * RealReg contains two contructors, RealRegSingle and RealRegPair,
         where RealRegPair is used to represent a SPARC double reg 
         constructed from two single precision FP regs. 
       * On SPARC we can now allocate double regs into an arbitrary register
         pair, instead of reserving some reg ranges to only hold float/double values. 
      f9288086
  5. 15 Feb, 2009 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      NCG: Split up the native code generator into arch specific modules · b04a210e
      Ben.Lippmeier@anu.edu.au authored
        - nativeGen/Instruction defines a type class for a generic
          instruction set. Each of the instruction sets we have, 
          X86, PPC and SPARC are instances of it.
        
        - The register alloctors use this type class when they need
          info about a certain register or instruction, such as
          regUsage, mkSpillInstr, mkJumpInstr, patchRegs..
        
        - nativeGen/Platform defines some data types enumerating
          the architectures and operating systems supported by the 
          native code generator.
        
        - DynFlags now keeps track of the current build platform, and 
          the PositionIndependentCode module uses this to decide what
          to do instead of relying of #ifdefs.
        
        - It's not totally retargetable yet. Some info info about the
          build target is still hardwired, but I've tried to contain
          most of it to a single module, TargetRegs.
        
        - Moved the SPILL and RELOAD instructions into LiveInstr.
        
        - Reg and RegClass now have their own modules, and are shared
          across all architectures.
      b04a210e
  6. 04 Feb, 2009 2 commits
  7. 03 Feb, 2009 1 commit
  8. 21 Jul, 2008 1 commit
  9. 20 Jul, 2008 1 commit
  10. 07 Feb, 2008 2 commits
  11. 17 Jan, 2008 1 commit
    • Isaac Dupree's avatar
      lots of portability changes (#1405) · 206b4dec
      Isaac Dupree authored
      re-recording to avoid new conflicts was too hard, so I just put it
      all in one big patch :-(  (besides, some of the changes depended on
      each other.)  Here are what the component patches were:
      
      Fri Dec 28 11:02:55 EST 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * document BreakArray better
      
      Fri Dec 28 11:39:22 EST 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * properly ifdef BreakArray for GHCI
      
      Fri Jan  4 13:50:41 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * change ifs on __GLASGOW_HASKELL__ to account for... (#1405)
        for it not being defined. I assume it being undefined implies
        a compiler with relatively modern libraries but without most
        unportable glasgow extensions.
      
      Fri Jan  4 14:21:21 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * MyEither-->EitherString to allow Haskell98 instance
      
      Fri Jan  4 16:13:29 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * re-portabilize Pretty, and corresponding changes
      
      Fri Jan  4 17:19:55 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * Augment FastTypes to be much more complete
      
      Fri Jan  4 20:14:19 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * use FastFunctions, cleanup FastString slightly
      
      Fri Jan  4 21:00:22 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * Massive de-"#", mostly Int# --> FastInt (#1405)
      
      Fri Jan  4 21:02:49 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * miscellaneous unnecessary-extension-removal
      
      Sat Jan  5 19:30:13 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
        * add FastFunctions
      206b4dec
  12. 21 Sep, 2007 1 commit
  13. 17 Sep, 2007 3 commits
    • Ben.Lippmeier@anu.edu.au's avatar
      Tune coalescing in non-iterative register allocator · fff4dee0
      Ben.Lippmeier@anu.edu.au authored
      If iterative coalescing isn't turned on, then do a single aggressive
      coalescing pass for the first build/color cycle and then back off to 
      conservative coalescing for subseqent passes.
      
      Aggressive coalescing is a cheap way to eliminate lots of the reg-reg
      moves, but it can make the graph less colorable - if we turn it on 
      for every pass then allocation for code with a large amount of register
      pressure (ie SHA1) doesn't converge in a sensible number of cycles.
      fff4dee0
    • Ben.Lippmeier@anu.edu.au's avatar
      Bugfix to iterative coalescer · 6a347ffc
      Ben.Lippmeier@anu.edu.au authored
      Coalescences must be applied to the unsimplified graph in the same order 
      they were found by the iterative coalescing algorithm - otherwise
      the vreg rewrite mapping will be wrong and badness will ensue.
      6a347ffc
    • Ben.Lippmeier@anu.edu.au's avatar
      Add -dasm-lint · 1116b874
      Ben.Lippmeier@anu.edu.au authored
      When -dasm-lint is turned on the register conflict graph is checked for 
      internal consistency after each build/color pass. Make sure that all 
      edges point to valid nodes, that nodes are colored differently to their
      neighbours, and if the graph is supposed to be colored, that all nodes
      actually have a color.
      1116b874
  14. 13 Sep, 2007 2 commits
    • Ben.Lippmeier@anu.edu.au's avatar
      warning police · d5938fb2
      Ben.Lippmeier@anu.edu.au authored
      d5938fb2
    • Ben.Lippmeier@anu.edu.au's avatar
      Better calculation of spill costs / selection of spill candidates. · 220a12e9
      Ben.Lippmeier@anu.edu.au authored
      Use Chaitin's formula for calculation of spill costs.
        Cost to spill some vreg = (num writes + num reads) / degree of node
      
      With 2 extra provisos:
        1) Don't spill vregs that live for only 1 instruction.
        2) Always prefer to spill vregs that live for a number of instructions
             more than 10 times the number of vregs in that class.
      
      Proviso 2 is there to help deal with basic blocks containing very long
      live ranges - SHA1 has live ranges > 1700 instructions. We don't ever
      try to keep these long lived ranges in regs at the expense of others.
      
      Because stack slots are allocated from a global pool, and there is no
      slot coalescing yet, without this condition the allocation of SHA1 dosn't
      converge fast enough and eventually runs out of stack slots.
      
      Prior to this patch we were just choosing to spill the range with the
      longest lifetime, so we didn't bump into this particular problem.
      220a12e9
  15. 12 Sep, 2007 1 commit
  16. 11 Sep, 2007 2 commits
  17. 07 Sep, 2007 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      Add iterative coalescing to graph coloring allocator · 12d0b388
      Ben.Lippmeier@anu.edu.au authored
      Iterative coalescing interleaves conservative coalesing with the regular
      simplify/scan passes. This increases the chance that nodes will be coalesced
      as they will have a lower degree than at the beginning of simplify. The end
      result is that more register to register moves will be eliminated in the
      output code, though the iterative nature of the algorithm makes it slower
      compared to non-iterative coloring.
      
      Use -fregs-iterative  for graph coloring allocation with iterative coalescing
          -fregs-graph      for non-iterative coalescing.
      
      The plan is for iterative coalescing to be enabled with -O2 and have a 
      quicker, non-iterative algorithm otherwise. The time/benefit tradeoff
      between iterative and not is still being tuned - optimal graph coloring
      is NP-hard, afterall..
      12d0b388
  18. 06 Sep, 2007 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      Cure space leak in coloring register allocator · b7f448a4
      Ben.Lippmeier@anu.edu.au authored
      We now do a deep seq on the graph after it is 'built', but before coloring.
      Without this, the colorer will just force bits of it and the heap will
      fill up with half evaluated pieces of graph from previous build/spill
      stages and zillions of apply thunks.
      b7f448a4
  19. 05 Sep, 2007 3 commits
    • Ben.Lippmeier@anu.edu.au's avatar
      Improve GraphColor.colorScan · 1dd44153
      Ben.Lippmeier@anu.edu.au authored
      Testing whether a node in the conflict graph is trivially 
      colorable (triv) is still a somewhat expensive operation.
      
      When we find a triv node during scanning, even though we remove
      it and its edges from the graph, this is unlikely to to make the
      nodes we've just scanned become triv - so there's not much point
      re-scanning them right away.
      
      Scanning now takes place in passes. We scan the whole graph for
      triv nodes and remove all the ones found in a batch before rescanning
      old nodes.
      
      Register allocation for SHA1.lhs now takes (just) 40% of total
      compile time with -O2 -fregs-graph on x86
      1dd44153
    • Ben.Lippmeier@anu.edu.au's avatar
      Refactor MachRegs.trivColorable to do unboxed accumulation · a8312580
      Ben.Lippmeier@anu.edu.au authored
      trivColorable was soaking up total 31% time, 41% alloc when
      compiling SHA1.lhs with -O2 -fregs-graph on x86.
      
      Refactoring to use unboxed accumulators and walk directly
      over the UniqFM holding the set of conflicts reduces this 
      to 17% time, 6% alloc.
      a8312580
    • Ben.Lippmeier@anu.edu.au's avatar
      warning police · 272f0ba8
      Ben.Lippmeier@anu.edu.au authored
      272f0ba8
  20. 04 Sep, 2007 1 commit
  21. 03 Sep, 2007 2 commits
  22. 28 Aug, 2007 1 commit
  23. 01 Sep, 2007 1 commit
  24. 24 Aug, 2007 4 commits
  25. 23 Aug, 2007 2 commits
  26. 22 Aug, 2007 1 commit