{-# LANGUAGE GADTs #-}
module GHC.Cmm.Sink (
     cmmSink
  ) where

import GhcPrelude

import GHC.Cmm
import GHC.Cmm.Opt
import GHC.Cmm.Liveness
import GHC.Cmm.Utils
import GHC.Cmm.Dataflow.Block
import GHC.Cmm.Dataflow.Label
import GHC.Cmm.Dataflow.Collections
import GHC.Cmm.Dataflow.Graph
import GHC.Platform.Regs
import GHC.Platform (isARM, platformArch)

import GHC.Driver.Session
import Unique
import UniqFM

import qualified Data.IntSet as IntSet
import Data.List (partition)
import qualified Data.Set as Set
import Data.Maybe

-- Compact sets for membership tests of local variables.

type LRegSet = IntSet.IntSet

emptyLRegSet :: LRegSet
emptyLRegSet = IntSet.empty

nullLRegSet :: LRegSet -> Bool
nullLRegSet = IntSet.null

insertLRegSet :: LocalReg -> LRegSet -> LRegSet
insertLRegSet l = IntSet.insert (getKey (getUnique l))

elemLRegSet :: LocalReg -> LRegSet -> Bool
elemLRegSet l = IntSet.member (getKey (getUnique l))

-- -----------------------------------------------------------------------------
-- Sinking and inlining

-- This is an optimisation pass that
--  (a) moves assignments closer to their uses, to reduce register pressure
--  (b) pushes assignments into a single branch of a conditional if possible
--  (c) inlines assignments to registers that are mentioned only once
--  (d) discards dead assignments
--
-- This tightens up lots of register-heavy code.  It is particularly
-- helpful in the Cmm generated by the Stg->Cmm code generator, in
-- which every function starts with a copyIn sequence like:
--
--    x1 = R1
--    x2 = Sp[8]
--    x3 = Sp[16]
--    if (Sp - 32 < SpLim) then L1 else L2
--
-- we really want to push the x1..x3 assignments into the L2 branch.
--
-- Algorithm:
--
--  * Start by doing liveness analysis.
--
--  * Keep a list of assignments A; earlier ones may refer to later ones.
--    Currently we only sink assignments to local registers, because we don't
--    have liveness information about global registers.
--
--  * Walk forwards through the graph, look at each node N:
--
--    * If it is a dead assignment, i.e. assignment to a register that is
--      not used after N, discard it.
--
--    * Try to inline based on current list of assignments
--      * If an assignment in A (1) occurs only once in N, and (2) is
--        not live after N, inline the assignment and remove it
--        from A.
--
--      * If an assignment in A is cheap (RHS is local register), then
--        inline the assignment and keep it in A in case it is used afterwards.
--
--      * Otherwise don't inline.
--
--    * If N is assignment to a local register pick up the assignment
--      and add it to A.
--
--    * If N is not an assignment to a local register:
--      * remove any assignments from A that conflict with N, and
--        place them before N in the current block.  We call this
--        "dropping" the assignments.
--
--      * An assignment conflicts with N if it:
--        - assigns to a register mentioned in N
--        - mentions a register assigned by N
--        - reads from memory written by N
--      * do this recursively, dropping dependent assignments
--
--    * At an exit node:
--      * drop any assignments that are live on more than one successor
--        and are not trivial
--      * if any successor has more than one predecessor (a join-point),
--        drop everything live in that successor. Since we only propagate
--        assignments that are not dead at the successor, we will therefore
--        eliminate all assignments dead at this point. Thus analysis of a
--        join-point will always begin with an empty list of assignments.
--
--
-- As a result of the above algorithm, sinking deletes some dead assignments
-- (transitively, even).  This isn't as good as removeDeadAssignments,
-- but it's much cheaper.

-- -----------------------------------------------------------------------------
-- things that we aren't optimising very well yet.
--
-- -----------
-- (1) From GHC's FastString.hashStr:
--
--  s2ay:
--      if ((_s2an::I64 == _s2ao::I64) >= 1) goto c2gn; else goto c2gp;
--  c2gn:
--      R1 = _s2au::I64;
--      call (I64[Sp])(R1) args: 8, res: 0, upd: 8;
--  c2gp:
--      _s2cO::I64 = %MO_S_Rem_W64(%MO_UU_Conv_W8_W64(I8[_s2aq::I64 + (_s2an::I64 << 0)]) + _s2au::I64 * 128,
--                                 4091);
--      _s2an::I64 = _s2an::I64 + 1;
--      _s2au::I64 = _s2cO::I64;
--      goto s2ay;
--
-- a nice loop, but we didn't eliminate the silly assignment at the end.
-- See Note [dependent assignments], which would probably fix this.
-- This is #8336.
--
-- -----------
-- (2) From stg_atomically_frame in PrimOps.cmm
--
-- We have a diamond control flow:
--
--     x = ...
--       |
--      / \
--     A   B
--      \ /
--       |
--    use of x
--
-- Now x won't be sunk down to its use, because we won't push it into
-- both branches of the conditional.  We certainly do have to check
-- that we can sink it past all the code in both A and B, but having
-- discovered that, we could sink it to its use.
--

-- -----------------------------------------------------------------------------
type Assignment = (LocalReg, CmmExpr, AbsMem)
  -- Assignment caches AbsMem, an abstraction of the memory read by
  -- the RHS of the assignment.

type Assignments = [Assignment]
  -- A sequence of assignments; kept in *reverse* order
  -- So the list [ x=e1, y=e2 ] means the sequence of assignments
  --     y = e2
  --     x = e1

cmmSink :: DynFlags -> CmmGraph -> CmmGraph
cmmSink dflags graph = ofBlockList (g_entry graph) $ sink mapEmpty $ blocks
  where
  liveness = cmmLocalLiveness dflags graph
  getLive l = mapFindWithDefault Set.empty l liveness
  blocks = revPostorder graph
  join_pts = findJoinPoints blocks

  sink :: LabelMap Assignments -> [CmmBlock] -> [CmmBlock]
  sink _ [] = []
  sink sunk (b:bs) =
    -- pprTrace "sink" (ppr lbl) $
    blockJoin first final_middle final_last : sink sunk' bs
    where
      lbl = entryLabel b
      (first, middle, last) = blockSplit b

      succs = successors last
      -- Annotate the middle nodes with the registers live *after*
      -- the node.  This will help us decide whether we can inline
      -- an assignment in the current node or not.
      live = Set.unions (map getLive succs)
      live_middle = gen_kill dflags last live
      ann_middles = annotate dflags live_middle (blockToList middle)
      -- Now sink and inline in this block
      (middle', assigs) = walk dflags ann_middles (mapFindWithDefault [] lbl sunk)
      fold_last = constantFoldNode dflags last
      (final_last, assigs') = tryToInline dflags live fold_last assigs
      -- We cannot sink into join points (successors with more than
      -- one predecessor), so identify the join points and the set
      -- of registers live in them.
      (joins, nonjoins) = partition (`mapMember` join_pts) succs
      live_in_joins = Set.unions (map getLive joins)

      -- We do not want to sink an assignment into multiple branches,
      -- so identify the set of registers live in multiple successors.
      -- This is made more complicated because when we sink an assignment
      -- into one branch, this might change the set of registers that are
      -- now live in multiple branches.
      init_live_sets = map getLive nonjoins
      live_in_multi live_sets r =
         case filter (Set.member r) live_sets of
           (_one:_two:_) -> True
           _ -> False

      -- Now, drop any assignments that we will not sink any further.
      (dropped_last, assigs'') = dropAssignments dflags drop_if init_live_sets assigs'

      drop_if a@(r,rhs,_) live_sets = (should_drop, live_sets')
          where
            should_drop =  conflicts dflags a final_last
                        || not (isTrivial dflags rhs) && live_in_multi live_sets r
                        || r `Set.member` live_in_joins
            live_sets' | should_drop = live_sets
                       | otherwise   = map upd live_sets
            upd set | r `Set.member` set = set `Set.union` live_rhs
                    | otherwise          = set
            live_rhs = foldRegsUsed dflags extendRegSet emptyRegSet rhs
      final_middle = foldl' blockSnoc middle' dropped_last

      sunk' = mapUnion sunk $
                 mapFromList [ (l, filterAssignments dflags (getLive l) assigs'')
                             | l <- succs ]

{- TODO: enable this later, when we have some good tests in place to
   measure the effect and tune it.

-- small: an expression we don't mind duplicating
isSmall :: CmmExpr -> Bool
isSmall (CmmReg (CmmLocal _)) = True  --
isSmall (CmmLit _) = True
isSmall (CmmMachOp (MO_Add _) [x,y]) = isTrivial x && isTrivial y
isSmall (CmmRegOff (CmmLocal _) _) = True
isSmall _ = False
-}

--
-- We allow duplication of trivial expressions: registers (both local and
-- global) and literals.
--
isTrivial :: DynFlags -> CmmExpr -> Bool
isTrivial _ (CmmReg (CmmLocal _)) = True
isTrivial dflags (CmmReg (CmmGlobal r)) = -- see Note [Inline GlobalRegs?]
  if isARM (platformArch (targetPlatform dflags))
  then True -- CodeGen.Platform.ARM does not have globalRegMaybe
  else isJust (globalRegMaybe (targetPlatform dflags) r)
  -- GlobalRegs that are loads from BaseReg are not trivial
isTrivial _ (CmmLit _) = True
isTrivial _ _          = False
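-- For example (an illustrative sketch, not exhaustive):
--
--    _x::I64          -- trivial: local register
--    R1               -- trivial iff R1 is a real machine register on the
--                     --   target platform (always treated as such on ARM)
--    17               -- trivial: literal
--    I64[Sp + 8]      -- not trivial: memory load
--    _x::I64 + 1      -- not trivial: machine op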

--
-- annotate each node with the set of registers live *after* the node
--
annotate :: DynFlags -> LocalRegSet -> [CmmNode O O] -> [(LocalRegSet, CmmNode O O)]
annotate dflags live nodes = snd $ foldr ann (live,[]) nodes
  where ann n (live,nodes) = (gen_kill dflags n live, (live,n) : nodes)

--
-- Find the blocks that have multiple predecessors (join points)
--
findJoinPoints :: [CmmBlock] -> LabelMap Int
findJoinPoints blocks = mapFilter (>1) succ_counts
 where
  all_succs = concatMap successors blocks

  succ_counts :: LabelMap Int
  succ_counts = foldr (\l -> mapInsertWith (+) l 1) mapEmpty all_succs

--
-- filter the list of assignments to remove any assignments that
-- are not live in a continuation.
--
filterAssignments :: DynFlags -> LocalRegSet -> Assignments -> Assignments
filterAssignments dflags live assigs = reverse (go assigs [])
  where go []             kept = kept
        go (a@(r,_,_):as) kept | needed    = go as (a:kept)
                               | otherwise = go as kept
           where
              needed = r `Set.member` live
                       || any (conflicts dflags a) (map toNode kept)
                       --  Note that we must keep assignments that are
                       -- referred to by other assignments we have
                       -- already kept.
-- -----------------------------------------------------------------------------
-- Walk through the nodes of a block, sinking and inlining assignments
-- as we go.
--
-- On input we pass in a:
--    * list of nodes in the block
--    * a list of assignments that appeared *before* this block and
--      that are being sunk.
--
-- On output we get:
--    * a new block
--    * a list of assignments that will be placed *after* that block.
--
walk :: DynFlags
     -> [(LocalRegSet, CmmNode O O)]    -- nodes of the block, annotated with
                                        -- the set of registers live *after*
                                        -- this node.
     -> Assignments                     -- The current list of
                                        -- assignments we are sinking.
                                        -- Earlier assignments may refer
                                        -- to later ones.
     -> ( Block CmmNode O O             -- The new block
        , Assignments                   -- Assignments to sink further
        )
walk dflags nodes assigs = go nodes emptyBlock assigs
 where
   go []               block as = (block, as)
   go ((live,node):ns) block as
    | shouldDiscard node live           = go ns block as
       -- discard dead assignment
    | Just a <- shouldSink dflags node2 = go ns block (a : as1)
    | otherwise                         = go ns block' as'
    where
      node1 = constantFoldNode dflags node

      (node2, as1) = tryToInline dflags live node1 as

      (dropped, as') = dropAssignmentsSimple dflags
                          (\a -> conflicts dflags a node2) as1

      block' = foldl' blockSnoc block dropped `blockSnoc` node2

--
-- Heuristic to decide whether to pick up and sink an assignment
-- Currently we pick up all assignments to local registers.  It might
-- be profitable to sink assignments to global regs too, but the
-- liveness analysis doesn't track those (yet) so we can't.
--
shouldSink :: DynFlags -> CmmNode e x -> Maybe Assignment
shouldSink dflags (CmmAssign (CmmLocal r) e) | no_local_regs = Just (r, e, exprMem dflags e)
  where no_local_regs = True -- foldRegsUsed (\_ _ -> False) True e
shouldSink _ _other = Nothing
--
-- discard dead assignments.  This doesn't do as good a job as
-- removeDeadAssignments, because it would need multiple passes
-- to get all the dead code, but it catches the common case of
-- superfluous reloads from the stack that the stack allocator
-- leaves behind.
--
-- Also we catch "r = r" here.  You might think it would fall
-- out of inlining, but the inliner will see that r is live
-- after the instruction and choose not to inline r in the rhs.
--
shouldDiscard :: CmmNode e x -> LocalRegSet -> Bool
shouldDiscard node live
   = case node of
       CmmAssign r (CmmReg r') | r == r' -> True
       CmmAssign (CmmLocal r) _ -> not (r `Set.member` live)
       _otherwise -> False
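-- For example (illustrative):
--
--    _x::I64 = _x::I64;    -- discarded: r = r
--    _y::I64 = I64[Sp+8];  -- discarded when _y is not in the live set
--    R1 = _y::I64;         -- never discarded here: assigns a global reg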

toNode :: Assignment -> CmmNode O O
toNode (r,rhs,_) = CmmAssign (CmmLocal r) rhs

dropAssignmentsSimple :: DynFlags -> (Assignment -> Bool) -> Assignments
                      -> ([CmmNode O O], Assignments)
dropAssignmentsSimple dflags f = dropAssignments dflags (\a _ -> (f a, ())) ()

dropAssignments :: DynFlags -> (Assignment -> s -> (Bool, s)) -> s -> Assignments
                -> ([CmmNode O O], Assignments)
dropAssignments dflags should_drop state assigs
 = (dropped, reverse kept)
 where
   (dropped,kept) = go state assigs [] []
   go _ []             dropped kept = (dropped, kept)
   go state (assig : rest) dropped kept
      | conflict  = go state' rest (toNode assig : dropped) kept
      | otherwise = go state' rest dropped (assig:kept)
      where
        (dropit, state') = should_drop assig state
        conflict = dropit || any (conflicts dflags assig) dropped

-- -----------------------------------------------------------------------------
-- Try to inline assignments into a node.
-- This also does constant folding for primpops, since
-- inlining opens up opportunities for doing so.

tryToInline
   :: DynFlags
   -> LocalRegSet               -- set of registers live after this
                                -- node.  We cannot inline anything
                                -- that is live after the node, unless
                                -- it is small enough to duplicate.
   -> CmmNode O x               -- The node to inline into
   -> Assignments               -- Assignments to inline
   -> (
        CmmNode O x             -- New node
      , Assignments             -- Remaining assignments
      )

tryToInline dflags live node assigs = go usages node emptyLRegSet assigs
 where
  usages :: UniqFM Int -- Maps each LocalReg to a count of how often it is used
  usages = foldLocalRegsUsed dflags addUsage emptyUFM node

  go _usages node _skipped [] = (node, [])

  go usages node skipped (a@(l,rhs,_) : rest)
   | cannot_inline           = dont_inline
   | occurs_none             = discard  -- Note [discard during inlining]
   | occurs_once             = inline_and_discard
   | isTrivial dflags rhs    = inline_and_keep
   | otherwise               = dont_inline
   where
        inline_and_discard = go usages' inl_node skipped rest
          where usages' = foldLocalRegsUsed dflags addUsage usages rhs

        discard = go usages node skipped rest

        dont_inline        = keep node  -- don't inline the assignment, keep it
        inline_and_keep    = keep inl_node -- inline the assignment, keep it

        keep node' = (final_node, a : rest')
          where (final_node, rest') = go usages' node' (insertLRegSet l skipped) rest
                usages' = foldLocalRegsUsed dflags (\m r -> addToUFM m r 2)
                                            usages rhs
                -- we must not inline anything that is mentioned in the RHS
                -- of a binding that we have already skipped, so we set the
                -- usages of the regs on the RHS to 2.

        cannot_inline = skipped `regsUsedIn` rhs -- Note [dependent assignments]
                        || l `elemLRegSet` skipped
                        || not (okToInline dflags rhs node)

        l_usages = lookupUFM usages l
        l_live   = l `elemRegSet` live

        occurs_once = not l_live && l_usages == Just 1
        occurs_none = not l_live && l_usages == Nothing

        inl_node = improveConditional (mapExpDeep inl_exp node)

        inl_exp :: CmmExpr -> CmmExpr
        -- inl_exp is where the inlining actually takes place!
        inl_exp (CmmReg    (CmmLocal l'))     | l == l' = rhs
        inl_exp (CmmRegOff (CmmLocal l') off) | l == l'
                    = cmmOffset dflags rhs off
                    -- re-constant fold after inlining
        inl_exp (CmmMachOp op args) = cmmMachOpFold dflags op args
        inl_exp other = other


{- Note [improveConditional]

cmmMachOpFold tries to simplify conditionals to turn things like
  (a == b) != 1
into
  (a != b)
but there's one case it can't handle: when the comparison is over
floating-point values, we can't invert it, because floating-point
comparisons aren't invertible (because of NaNs).

But we *can* optimise this conditional by swapping the true and false
branches. Given
  CmmCondBranch ((a >## b) != 1) t f
we can turn it into
  CmmCondBranch (a >## b) f t

So here we catch conditionals that weren't optimised by cmmMachOpFold,
and apply the above transformation to eliminate the comparison against 1.

It's tempting to just turn every != into == and then let cmmMachOpFold
do its thing, but that risks changing a nice fall-through conditional
into one that requires two jumps. (see swapcond_last in
GHC.Cmm.ContFlowOpt), so instead we carefully look for just the cases where
we can eliminate a comparison.
-}
improveConditional :: CmmNode O x -> CmmNode O x
improveConditional
  (CmmCondBranch (CmmMachOp mop [x, CmmLit (CmmInt 1 _)]) t f l)
  | neLike mop, isComparisonExpr x
  = CmmCondBranch x f t (fmap not l)
  where
    neLike (MO_Ne _) = True
    neLike (MO_U_Lt _) = True   -- (x<y) < 1 behaves like (x<y) != 1
    neLike (MO_S_Lt _) = True   -- (x<y) < 1 behaves like (x<y) != 1
    neLike _ = False
improveConditional other = other

-- Note [dependent assignments]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
-- If our assignment list looks like
--
--    [ y = e,  x = ... y ... ]
--
-- We cannot inline x.  Remember this list is really in reverse order,
-- so it means  x = ... y ...; y = e
--
-- Hence if we inline x, the outer assignment to y will capture the
-- reference in x's right hand side.
--
-- In this case we should rename the y in x's right-hand side,
-- i.e. change the list to [ y = e, x = ... y1 ..., y1 = y ]
-- Now we can go ahead and inline x.
--
-- For now we do nothing, because this would require putting
-- everything inside UniqSM.
--
-- One more variant of this (#7366):
--
--   [ y = e, y = z ]
--
-- If we don't want to inline y = e, because y is used many times, we
-- might still be tempted to inline y = z (because we always inline
-- trivial rhs's).  But of course we can't, because y is equal to e,
-- not z.

-- Note [discard during inlining]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Opportunities to discard assignments sometimes appear after we've
-- done some inlining.  Here's an example:
--
--      x = R1;
--      y = P64[x + 7];
--      z = P64[x + 15];
--      /* z is dead */
--      R1 = y & (-8);
--
-- The x assignment is trivial, so we inline it in the RHS of y, and
-- keep both x and y.  z gets dropped because it is dead, then we
-- inline y, and we have a dead assignment to x.  If we don't notice
-- that x is dead in tryToInline, we end up retaining it.

addUsage :: UniqFM Int -> LocalReg -> UniqFM Int
addUsage m r = addToUFM_C (+) m r 1

regsUsedIn :: LRegSet -> CmmExpr -> Bool
regsUsedIn ls _ | nullLRegSet ls = False
regsUsedIn ls e = wrapRecExpf f e False
  where f (CmmReg (CmmLocal l))      _ | l `elemLRegSet` ls = True
        f (CmmRegOff (CmmLocal l) _) _ | l `elemLRegSet` ls = True
        f _ z = z

-- we don't inline into CmmUnsafeForeignCall if the expression refers
-- to global registers.  This is a HACK to avoid global registers
-- clashing with C argument-passing registers, really the back-end
-- ought to be able to handle it properly, but currently neither PprC
-- nor the NCG can do it.  See Note [Register parameter passing]
-- See also GHC.StgToCmm.Foreign.load_args_into_temps.
okToInline :: DynFlags -> CmmExpr -> CmmNode e x -> Bool
okToInline dflags expr node@(CmmUnsafeForeignCall{}) =
    not (globalRegistersConflict dflags expr node)
okToInline _ _ _ = True

-- -----------------------------------------------------------------------------
-- | @conflicts (r,e) node@ is @False@ if and only if the assignment
-- @r = e@ can be safely commuted past statement @node@.
conflicts :: DynFlags -> Assignment -> CmmNode O x -> Bool
conflicts dflags (r, rhs, addr) node
  -- (1) node defines registers used by rhs of assignment. This catches
  -- assignments and all three kinds of calls. See Note [Sinking and calls]
  | globalRegistersConflict dflags rhs node                       = True
  | localRegistersConflict  dflags rhs node                       = True

  -- (2) node uses register defined by assignment
  | foldRegsUsed dflags (\b r' -> r == r' || b) False node        = True
  -- (3) a store to an address conflicts with a read of the same memory
  | CmmStore addr' e <- node
  , memConflicts addr (loadAddr dflags addr' (cmmExprWidth dflags e)) = True
  -- (4) an assignment to Hp/Sp conflicts with a heap/stack read respectively
  | HeapMem    <- addr, CmmAssign (CmmGlobal Hp) _ <- node        = True
  | StackMem   <- addr, CmmAssign (CmmGlobal Sp) _ <- node        = True
  | SpMem{}    <- addr, CmmAssign (CmmGlobal Sp) _ <- node        = True

  -- (5) foreign calls clobber heap: see Note [Foreign calls clobber heap]
  | CmmUnsafeForeignCall{} <- node, memConflicts addr AnyMem      = True
  -- (6) native calls clobber any memory
  | CmmCall{} <- node, memConflicts addr AnyMem                   = True

  -- (7) otherwise, no conflict
605 606
  | otherwise = False
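-- For example (an illustrative sketch), sinking the assignment
-- { _x = _y + 1 } past each of the following nodes:
--
--    _z::I64 = 17;          -- no conflict: the assignment commutes past it
--    _y::I64 = 0;           -- (1) defines _y, which the rhs reads
--    _z::I64 = _x::I64;     -- (2) uses _x, which the assignment defines
--    I64[Sp + 8] = 0;       -- (3) would conflict if the rhs read Sp[8]
--    Sp = Sp + 16;          -- (4) would conflict if the rhs read stack memory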

-- Returns True if node defines any global registers that are used in the
-- Cmm expression
globalRegistersConflict :: DynFlags -> CmmExpr -> CmmNode e x -> Bool
globalRegistersConflict dflags expr node =
    foldRegsDefd dflags (\b r -> b || regUsedIn dflags (CmmGlobal r) expr)
                 False node

-- Returns True if node defines any local registers that are used in the
-- Cmm expression
localRegistersConflict :: DynFlags -> CmmExpr -> CmmNode e x -> Bool
localRegistersConflict dflags expr node =
    foldRegsDefd dflags (\b r -> b || regUsedIn dflags (CmmLocal  r) expr)
                 False node

-- Note [Sinking and calls]
-- ~~~~~~~~~~~~~~~~~~~~~~~~
--
-- We have three kinds of calls: normal (CmmCall), safe foreign (CmmForeignCall)
-- and unsafe foreign (CmmUnsafeForeignCall). We perform sinking pass after
-- stack layout (see Note [Sinking after stack layout]) which leads to two
-- invariants related to calls:
--
--   a) during stack layout phase all safe foreign calls are turned into
--      unsafe foreign calls (see Note [Lower safe foreign calls]). This
--      means that we will never encounter CmmForeignCall node when running
--      sinking after stack layout
--
--   b) stack layout saves all variables live across a call on the stack
--      just before making a call (remember we are not sinking assignments to
--      stack):
--
--       L1:
--          x = R1
--          P64[Sp - 16] = L2
--          P64[Sp - 8]  = x
--          Sp = Sp - 16
--          call f() returns L2
--       L2:
--
--      We will attempt to sink { x = R1 } but we will detect conflict with
--      { P64[Sp - 8]  = x } and hence we will drop { x = R1 } without even
--      checking whether it conflicts with { call f() }. In this way we will
--      never need to check any assignment conflicts with CmmCall. Remember
--      that we still need to check for potential memory conflicts.
--
-- So the result is that we only need to worry about CmmUnsafeForeignCall nodes
-- when checking conflicts (see Note [Unsafe foreign calls clobber caller-save registers]).
-- This assumption holds only when we do sinking after stack layout. If we run
-- it before stack layout we need to check for possible conflicts with all three
-- kinds of calls. Our `conflicts` function does that by using the generic
-- foldRegsDefd and foldRegsUsed functions defined in the DefinerOfRegs and
-- UserOfRegs typeclasses.
--

-- An abstraction of memory read or written.
data AbsMem
  = NoMem            -- no memory accessed
  | AnyMem           -- arbitrary memory
  | HeapMem          -- definitely heap memory
  | StackMem         -- definitely stack memory
  | SpMem            -- <size>[Sp+n]
       {-# UNPACK #-} !Int
       {-# UNPACK #-} !Int

-- Having SpMem is important because it lets us float loads from Sp
-- past stores to Sp as long as they don't overlap, and this helps to
-- unravel some long sequences of
--    x1 = [Sp + 8]
--    x2 = [Sp + 16]
--    ...
--    [Sp + 8]  = xi
--    [Sp + 16] = xj
--
-- Note that SpMem is invalidated if Sp is changed, but the definition
-- of 'conflicts' above handles that.

-- ToDo: this won't currently fix the following commonly occurring code:
--    x1 = [R1 + 8]
--    x2 = [R1 + 16]
--    ..
--    [Hp - 8] = x1
--    [Hp - 16] = x2
--    ..

-- because [R1 + 8] and [Hp - 8] are both HeapMem.  We know that
-- assignments to [Hp + n] do not conflict with any other heap memory,
-- but this is tricky to nail down.  What if we had
--
--   x = Hp + n
--   [x] = ...
--
--  the store to [x] should be "new heap", not "old heap".
--  Furthermore, you could imagine that if we started inlining
--  functions in Cmm then there might well be reads of heap memory
--  that was written in the same basic block.  To take advantage of
--  non-aliasing of heap memory we will have to be more clever.

-- Note [Foreign calls clobber heap]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
-- It is tempting to say that foreign calls clobber only
-- non-heap/stack memory, but unfortunately we break this invariant in
-- the RTS.  For example, in stg_catch_retry_frame we call
-- stmCommitNestedTransaction() which modifies the contents of the
-- TRec it is passed (this actually caused incorrect code to be
-- generated).
--
-- Since the invariant is true for the majority of foreign calls,
-- perhaps we ought to have a special annotation for calls that can
-- modify heap/stack memory.  For now we just use the conservative
-- definition here.
--
-- Some CallishMachOps imply a memory barrier (e.g. AtomicRMW), and
-- therefore we should never float any memory operations across one of
-- these calls.


bothMems :: AbsMem -> AbsMem -> AbsMem
bothMems NoMem    x         = x
bothMems x        NoMem     = x
bothMems HeapMem  HeapMem   = HeapMem
bothMems StackMem StackMem  = StackMem
bothMems (SpMem o1 w1) (SpMem o2 w2)
  | o1 == o2  = SpMem o1 (max w1 w2)
  | otherwise = StackMem
bothMems SpMem{}  StackMem  = StackMem
bothMems StackMem SpMem{}   = StackMem
bothMems _         _        = AnyMem

memConflicts :: AbsMem -> AbsMem -> Bool
memConflicts NoMem      _          = False
memConflicts _          NoMem      = False
memConflicts HeapMem    StackMem   = False
memConflicts StackMem   HeapMem    = False
memConflicts SpMem{}    HeapMem    = False
memConflicts HeapMem    SpMem{}    = False
memConflicts (SpMem o1 w1) (SpMem o2 w2)
  | o1 < o2   = o1 + w1 > o2
  | otherwise = o2 + w2 > o1
memConflicts _         _         = True
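
As a sanity check, the join and the conflict test can be exercised outside the compiler. The following is a standalone sketch that mirrors the two definitions above, with a derived Show in place of the real Cmm pretty-printer:

```haskell
-- Standalone mirror of the AbsMem lattice: bothMems joins two abstract
-- memory locations, memConflicts tests whether two accesses may alias.
data AbsMem
  = NoMem | AnyMem | HeapMem | StackMem
  | SpMem !Int !Int              -- offset and width in bytes: <width>[Sp+offset]
  deriving (Eq, Show)

bothMems :: AbsMem -> AbsMem -> AbsMem
bothMems NoMem    x         = x
bothMems x        NoMem     = x
bothMems HeapMem  HeapMem   = HeapMem
bothMems StackMem StackMem  = StackMem
bothMems (SpMem o1 w1) (SpMem o2 w2)
  | o1 == o2  = SpMem o1 (max w1 w2)
  | otherwise = StackMem
bothMems SpMem{}  StackMem  = StackMem
bothMems StackMem SpMem{}   = StackMem
bothMems _        _         = AnyMem

memConflicts :: AbsMem -> AbsMem -> Bool
memConflicts NoMem      _        = False
memConflicts _          NoMem    = False
memConflicts HeapMem    StackMem = False
memConflicts StackMem   HeapMem  = False
memConflicts SpMem{}    HeapMem  = False
memConflicts HeapMem    SpMem{}  = False
memConflicts (SpMem o1 w1) (SpMem o2 w2)
  | o1 < o2   = o1 + w1 > o2     -- interval-overlap test on Sp slots
  | otherwise = o2 + w2 > o1
memConflicts _          _        = True

main :: IO ()
main = do
  print (memConflicts (SpMem 8 8) (SpMem 16 8))  -- disjoint Sp slots
  print (memConflicts (SpMem 8 8) (SpMem 12 8))  -- overlapping Sp slots
  print (memConflicts HeapMem StackMem)          -- heap never aliases stack
```

This is exactly why a load from [Sp + 8] may float past a store to [Sp + 16]: the two SpMem intervals are disjoint, so memConflicts returns False.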

exprMem :: DynFlags -> CmmExpr -> AbsMem
exprMem dflags (CmmLoad addr w)  = bothMems (loadAddr dflags addr (typeWidth w)) (exprMem dflags addr)
exprMem dflags (CmmMachOp _ es)  = foldr bothMems NoMem (map (exprMem dflags) es)
exprMem _      _                 = NoMem

loadAddr :: DynFlags -> CmmExpr -> Width -> AbsMem
loadAddr dflags e w =
  case e of
   CmmReg r       -> regAddr dflags r 0 w
   CmmRegOff r i  -> regAddr dflags r i w
   _other | regUsedIn dflags spReg e -> StackMem
          | otherwise -> AnyMem

regAddr :: DynFlags -> CmmReg -> Int -> Width -> AbsMem
regAddr _      (CmmGlobal Sp) i w = SpMem i (widthInBytes w)
regAddr _      (CmmGlobal Hp) _ _ = HeapMem
regAddr _      (CmmGlobal CurrentTSO) _ _ = HeapMem -- important for PrimOps
regAddr dflags r _ _ | isGcPtrType (cmmRegType dflags r) = HeapMem -- yay! GCPtr pays for itself
regAddr _      _ _ _ = AnyMem
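
To illustrate the classification that loadAddr/regAddr perform, here is a toy standalone version. Register names as strings, no DynFlags, and no GC-pointer-type check are all simplifying assumptions; only the shape of the dispatch mirrors the real code above:

```haskell
-- Toy mirror of loadAddr/regAddr: classify the memory a load may touch.
data Mem = MNone | MAny | MHeap | MStack | MSp Int Int
  deriving (Eq, Show)

data Addr
  = AReg String        -- stands in for CmmReg
  | ARegOff String Int -- stands in for CmmRegOff
  | AOther Bool        -- any other address; True iff Sp occurs in it

loadAddr :: Addr -> Int -> Mem
loadAddr a w = case a of
  AReg r        -> regAddr r 0 w
  ARegOff r i   -> regAddr r i w
  AOther usesSp -> if usesSp then MStack else MAny

regAddr :: String -> Int -> Int -> Mem
regAddr "Sp" i w = MSp i w   -- a known stack slot: <w>[Sp+i]
regAddr "Hp" _ _ = MHeap
regAddr _    _ _ = MAny      -- the real code also maps GC-pointer regs to HeapMem

main :: IO ()
main = do
  print (loadAddr (ARegOff "Sp" 8) 8)  -- MSp 8 8
  print (loadAddr (AReg "Hp") 8)       -- MHeap
  print (loadAddr (AOther True) 8)     -- MStack
```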

{-
Note [Inline GlobalRegs?]

Should we freely inline GlobalRegs?

Actually it doesn't make a huge amount of difference either way, so we
*do* currently treat GlobalRegs as "trivial" and inline them
everywhere, but for what it's worth, here is what I discovered when I
(SimonM) looked into this:

Common sense says we should not inline GlobalRegs, because when we
have

  x = R1

the register allocator will coalesce this assignment, generating no
code, and simply record the fact that x is bound to $rbx (or
whatever).  Furthermore, if we were to sink this assignment, then the
range of code over which R1 is live increases, and the range of code
over which x is live decreases.  All things being equal, it is better
for x to be live than R1, because R1 is a fixed register whereas x can
live in any register.  So we should neither sink nor inline 'x = R1'.

However, not inlining GlobalRegs can have surprising
consequences. e.g. (cgrun020)

  c3EN:
      _s3DB::P64 = R1;
      _c3ES::P64 = _s3DB::P64 & 7;
      if (_c3ES::P64 >= 2) goto c3EU; else goto c3EV;
  c3EU:
      _s3DD::P64 = P64[_s3DB::P64 + 6];
      _s3DE::P64 = P64[_s3DB::P64 + 14];
      I64[Sp - 8] = c3F0;
      R1 = _s3DE::P64;
      P64[Sp] = _s3DD::P64;

inlining the GlobalReg gives:

  c3EN:
      if (R1 & 7 >= 2) goto c3EU; else goto c3EV;
  c3EU:
      I64[Sp - 8] = c3F0;
      _s3DD::P64 = P64[R1 + 6];
      R1 = P64[R1 + 14];
      P64[Sp] = _s3DD::P64;

but if we don't inline the GlobalReg, instead we get:

      _s3DB::P64 = R1;
      if (_s3DB::P64 & 7 >= 2) goto c3EU; else goto c3EV;
  c3EU:
      I64[Sp - 8] = c3F0;
      R1 = P64[_s3DB::P64 + 14];
      P64[Sp] = P64[_s3DB::P64 + 6];

This looks better - we managed to inline _s3DD - but in fact it
generates an extra reg-reg move:

.Lc3EU:
        movq $c3F0_info,-8(%rbp)
        movq %rbx,%rax
        movq 14(%rbx),%rbx
        movq 6(%rax),%rax
        movq %rax,(%rbp)

because _s3DB is now live across the R1 assignment, we lost the
benefit of coalescing.

Who is at fault here?  Perhaps if we knew that _s3DB was an alias for
R1, then we would not sink a reference to _s3DB past the R1
assignment.  Or perhaps we *should* do that - we might gain by sinking
it, despite losing the coalescing opportunity.

Sometimes not inlining global registers wins by virtue of the rule
about not inlining into arguments of a foreign call, e.g. (T7163) this
is what happens when we inlined F1:

      _s3L2::F32 = F1;
      _c3O3::F32 = %MO_F_Mul_W32(F1, 10.0 :: W32);
      (_s3L7::F32) = call "ccall" arg hints:  []  result hints:  [] rintFloat(_c3O3::F32);

but if we don't inline F1:

      (_s3L7::F32) = call "ccall" arg hints:  []  result hints:  [] rintFloat(%MO_F_Mul_W32(_s3L2::F32,
                                                                                            10.0 :: W32));
-}