Add two bangs to improve perf of flattening

This tiny patch improves the compile time of flatten-heavy
programs by 1-2%, by adding two bangs.
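
As a rough illustration of what such a bang does (a standalone sketch, not GHC code; `Ctx`, `resolveAll` and `resolve` are hypothetical names standing in for a two-field context like `LiftingContext` and a function that only consults it on some paths): the bang on `ctx` makes the function strict in that argument even though not every path uses it, which, when optimizing, may let GHC's worker/wrapper transformation pass the two fields separately instead of a boxed record. The strict `let` likewise evaluates its binding immediately rather than allocating a thunk.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical two-field context, loosely analogous to a lifting context.
data Ctx = Ctx { ctxDefault :: Int, ctxTable :: [(Int, Int)] }

-- The bang on ctx forces it to weak head normal form on entry.
-- Without it, strictness analysis cannot assume ctx is always needed,
-- because the (k < 0) path below never looks at it.
resolveAll :: Ctx -> [Int] -> Int
resolveAll !ctx keys = go keys 0
  where
    go []       !acc = acc
    go (k : ks) !acc
      | k < 0     = go ks acc          -- this path never touches ctx
      | otherwise =
          let !v = resolve k           -- strict let: evaluate now, no thunk
          in go ks (acc + v)

    -- Look the key up in the table, falling back to the default.
    resolve k = case lookup k (ctxTable ctx) of
      Just v  -> v
      Nothing -> ctxDefault ctx

main :: IO ()
main = print (resolveAll (Ctx 0 [(1, 10), (2, 20)]) [1, -5, 2, 3])  -- prints 30
```

Whether the unboxing actually happens depends on the optimization level and on how the function is inlined, so any gain should be confirmed by measurement, as the allocation figures below do for this patch.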

Addresses (somewhat) #18502

This changes allocation by (negative is a reduction)
   T9872b   -1.1%
   T9872d   -3.3%

   T5321Fun -0.2%
   T5631    -0.2%
   T5837    +0.1%
   T6048    +0.1%

Metric Decrease:
@@ -1891,7 +1891,9 @@ substForAllCoBndrUsingLC sym sco (LC subst lc_env) tv co
 -- For the inverse operation, see 'liftCoMatch'
 ty_co_subst :: LiftingContext -> Role -> Type -> Coercion
-ty_co_subst lc role ty
+ty_co_subst !lc role ty
+    -- !lc: making this function strict in lc allows callers to
+    -- pass its two components separately, rather than boxing them
   = go role ty
     go :: Role -> Type -> Coercion
@@ -2864,9 +2866,9 @@ simplifyArgsWorker orig_ki_binders orig_inner_ki orig_fvs
     -- need a coercion (kind_co :: old_kind ~ new_kind).
     -- The bangs here have been observed to improve performance
-    -- significantly in optimized builds.
-    let kind_co = mkSymCo $
-          liftCoSubst Nominal lc (tyCoBinderType binder)
+    -- significantly in optimized builds; see #18502
+    let !kind_co = mkSymCo $
+                   liftCoSubst Nominal lc (tyCoBinderType binder)
         !casted_xi = xi `mkCastTy` kind_co
         casted_co = mkCoherenceLeftCo role xi kind_co co
