Commit 4070b105 authored by sewardj

[project @ 2000-02-28 12:02:31 by sewardj]

Many changes to improve the quality and correctness of generated code,
both x86-specific and platform-independent.  The intent is that the x86
NCG will now be good enough for general use.

-- Add an almost-trivial Stix (generic) peephole optimiser, whose sole
   purpose is to elide assignments to temporaries used only once, in the
   very next tree.  This generates substantially better code for
   conditionals on all platforms.  Enhance Stix constant folding to
   take advantage of the inlining.

   The inlining presents subsequent insn selection phases with more
   complex trees than they were previously used to.  This has
   shown up several bugs in the x86 insn selectors, now fixed
   (assumptions that the data size is Word when it could be Byte,
    assumptions that an operand will always be in a temp reg, etc).
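
   The transformation itself is simple: when a tree assigns to a temp
   that is used exactly once, in the immediately following tree, the
   rhs is substituted into that tree and the assignment is dropped.
   Here is a minimal sketch of the idea on a toy statement type --
   Expr, Stmt, peep and friends are illustrative names only, not the
   real Stix types (the real pass is stixPeep):

\begin{code}
-- Toy stand-ins for Stix trees: temps are named by Ints.
data Expr = Temp Int | Lit Int | Add Expr Expr | Gt Expr Expr
data Stmt = Assign Int Expr | CondJump Expr | Use Expr

countUses :: Int -> Expr -> Int
countUses u (Temp v)  = if u == v then 1 else 0
countUses _ (Lit _)   = 0
countUses u (Add a b) = countUses u a + countUses u b
countUses u (Gt a b)  = countUses u a + countUses u b

subst :: Int -> Expr -> Expr -> Expr
subst u rhs (Temp v) | u == v = rhs
subst _ _   (Temp v)          = Temp v
subst _ _   (Lit n)           = Lit n
subst u rhs (Add a b)         = Add (subst u rhs a) (subst u rhs b)
subst u rhs (Gt a b)          = Gt  (subst u rhs a) (subst u rhs b)

usesIn :: Int -> Stmt -> Int
usesIn u (Assign _ e) = countUses u e
usesIn u (CondJump e) = countUses u e
usesIn u (Use e)      = countUses u e

substIn :: Int -> Expr -> Stmt -> Stmt
substIn u rhs (Assign d e) = Assign d (subst u rhs e)
substIn u rhs (CondJump e) = CondJump (subst u rhs e)
substIn u rhs (Use e)      = Use (subst u rhs e)

-- Elide an assignment to a temp that is used once, in the very next tree,
-- and nowhere later.  (Assumes, as stixPeep does, that temps are only
-- ever assigned to once.)
peep :: [Stmt] -> [Stmt]
peep (Assign u rhs : t2 : ts)
  | usesIn u t2 == 1 && sum (map (usesIn u) ts) == 0
  = peep (substIn u rhs t2 : ts)
peep (t : ts) = t : peep ts
peep []       = []
\end{code}

   The payoff for conditionals is that a comparison formerly parked in a
   temporary now sits directly under the conditional jump, where the
   enhanced constant folder can spot shapes like (x > y) /= 0 and drop
   the redundant test.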

-- x86: Use the FLDZ and FLD1 insns.

-- x86: spill FP registers with 80-bit loads/stores so that
   Intel's extra 16 bits of accuracy are not lost.  If this isn't
   done, FP spills are not suitably transparent.  Increase the
   number of spill words available to 2048.
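
   On x86 this boils down to choosing the new F80 size whenever a
   floating-point virtual register is moved to or from a spill slot.
   A hedged sketch of the size choice only -- spillSize is an
   illustrative helper, not the actual spill-code generator in
   RegAllocInfo:

\begin{code}
-- Spill/reload floating-point fake regs at full x87 width so that a
-- spill/reload round trip does not silently round the value to 64 bits.
spillSize :: PrimRep -> Size
spillSize pk | isFloatingRep pk = F80
             | otherwise        = L
\end{code}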

-- x86: give the register allocator more flexibility in choosing
   spill temporaries.

-- x86, RegAllocInfo.regUsage: fix error for GST, and rewrite to
   make it clearer.

-- Correctly track movements in the C stack pointer, and generate
   correct spill code for archs which spill against the stack pointer
   even when the stack pointer moves.  Redo the x86 ccall mechanism
   to push args on the C stack in the normal way.  Rather than having
   the spiller analyse code sequences to determine the current stack
   offset, the insn selectors communicate the current offset, whenever
   it changes, by inserting a DELTA pseudo-insn.  The spiller then
   only has to spot DELTAs.

   This means having a new native-code-generator monad (Stix.NatM)
   which carries both a UniqSupply and the current stack offset.
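
   To give the flavour: the spiller threads the current delta through
   the instruction stream, updating it at each DELTA and handing it to
   whatever emits the actual spill/reload addressing.  A simplified,
   self-contained sketch -- the Insn constructors and the slot
   arithmetic are illustrative, not the real patchMem'/spillReg code:

\begin{code}
-- Toy instruction stream; the real NCG uses Instr with a DELTA pseudo-insn.
data Insn
  = Delta Int        -- C stack ptr is now at this offset from block entry
  | SpillTo Int      -- spill some register to spill slot n
  | Plain String     -- anything else

-- Rewrite spills into %esp-relative stores, compensating for the delta.
-- Stack pointer movements must not escape the block, so the final delta
-- has to be zero.
applyDeltas :: [Insn] -> [String]
applyDeltas insns = go 0 insns
  where
    go 0     []                  = []
    go d     []                  = error ("non-zero final delta: " ++ show d)
    go _     (Delta d      : is) = go d is
    go delta (SpillTo slot : is) = store delta slot : go delta is
    go delta (Plain s      : is) = s : go delta is

    -- A spill slot lives at a fixed offset from the block-entry %esp, so
    -- the current (zero or negative) delta is subtracted back out.
    store delta slot = "movl %fake, " ++ show (4*slot - delta) ++ "(%esp)"
\end{code}

   For instance, a spill emitted after 8 bytes of call args have been
   pushed (delta = -8) lands 8 bytes further from the current %esp than
   the same spill would at delta = 0.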

-- Remove the asmPar/asmSeq ways of grouping insns together.
   In the presence of fixed registers, it is hard to demonstrate
   that insn selectors using asmPar always give correct code, and
   the extra complication doesn't help any.

   Also, directly construct code sequences as tree-based ordered
   lists (utils/OrdList.lhs), which give linear-time appends, rather
   than via the bizarrely complex previous scheme of building and
   composing functions.
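
   The tree representation makes append constant-time by just building
   a node and deferring all the work to one linear flatten at the end.
   Roughly like this -- a sketch only; the constructor names are made
   up, though unitOL, appOL, concatOL and fromOL follow the interface
   actually used:

\begin{code}
data OrdList a = None
               | One a
               | Two (OrdList a) (OrdList a)   -- left before right

unitOL :: a -> OrdList a
unitOL = One

appOL :: OrdList a -> OrdList a -> OrdList a   -- O(1) append
appOL = Two

concatOL :: [OrdList a] -> OrdList a
concatOL = foldr appOL None

-- A single linear pass recovers the ordinary list, in order.
fromOL :: OrdList a -> [a]
fromOL ol = go ol []
  where go None      acc = acc
        go (One x)   acc = x : acc
        go (Two a b) acc = go a (go b acc)
\end{code}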

-- Inline some hcats in printing of x86 address modes.

-- Document more of the hidden assumptions which insn selection relies
   on, particularly wrt addressing modes.
parent e0e07f52
......@@ -19,17 +19,20 @@ import PprMach
import AbsCStixGen ( genCodeAbstractC )
import AbsCSyn ( AbstractC, MagicId )
import AsmRegAlloc ( runRegAllocate )
import OrdList ( OrdList, flattenOrdList )
import PrimOp ( commutableOp, PrimOp(..) )
import RegAllocInfo ( mkMRegsState, MRegsState, findReservedRegs )
import Stix ( StixTree(..), StixReg(..),
pprStixTrees, CodeSegment(..) )
pprStixTrees, ppStixTree, CodeSegment(..),
stixCountTempUses, stixSubst,
NatM, initNat, mapNat,
NatM_State, mkNatM_State,
uniqOfNatM_State, deltaOfNatM_State )
import PrimRep ( isFloatingRep, PrimRep(..) )
import UniqSupply ( returnUs, thenUs, mapUs, initUs,
initUs_, UniqSM, UniqSupply )
import UniqFM ( UniqFM, emptyUFM, addToUFM, lookupUFM )
import MachMisc ( IF_ARCH_i386(i386_insert_ffrees,) )
import OrdList ( fromOL, concatOL )
import Outputable
\end{code}
......@@ -85,11 +88,11 @@ So, here we go:
nativeCodeGen :: AbstractC -> UniqSupply -> (SDoc, SDoc)
nativeCodeGen absC us
= let (stixRaw, us1) = initUs us (genCodeAbstractC absC)
stixOpt = map (map genericOpt) stixRaw
stixOpt = map genericOpt stixRaw
insns = initUs_ us1 (codeGen stixOpt)
debug_stix = vcat (map pprStixTrees stixOpt)
in
trace "--------- native code generator ---------"
trace "nativeGen: begin"
(debug_stix, insns)
\end{code}
......@@ -108,25 +111,49 @@ codeGen stixFinal
docs = map (vcat . map pprInstr) static_instrss
-- for debugging only
docs_prealloc = map (vcat . map pprInstr . flattenOrdList)
docs_prealloc = map (vcat . map pprInstr . fromOL)
dynamic_codes
text_prealloc = vcat (intersperse (char ' ' $$ char ' ') docs_prealloc)
in
-- trace (showSDoc text_prealloc) (
returnUs (vcat (intersperse (char ' '
$$ text "# ___stg_split_marker"
$$ ptext SLIT("# ___stg_split_marker")
$$ char ' ')
docs))
-- )
\end{code}
Top level code generator for a chunk of stix code:
\begin{code}
genMachCode :: [StixTree] -> UniqSM InstrList
Top level code generator for a chunk of stix code. For this part of
the computation, we switch from the UniqSM monad to the NatM monad.
The latter carries not only a Unique, but also an Int denoting the
current C stack pointer offset in the generated code; this is needed
for creating correct spill offsets on architectures which don't offer,
or for which it would be prohibitively expensive to employ, a frame
pointer register. Viz, x86.
The offset is measured in bytes, and indicates the difference between
the current (simulated) C stack-ptr and the value it was at the
beginning of the block. For stacks which grow down, this value should
be either zero or negative.
genMachCode stmts
= mapUs stmt2Instrs stmts `thenUs` \ blocks ->
returnUs (foldr (.) id blocks asmVoid)
Switching between the two monads whilst carrying along the same Unique
supply breaks abstraction. Is that bad?
\begin{code}
genMachCode :: [StixTree] -> UniqSM InstrBlock
genMachCode stmts initial_us
= let initial_st = mkNatM_State initial_us 0
(blocks, final_st) = initNat initial_st
(mapNat stmt2Instrs stmts)
instr_list = concatOL blocks
final_us = uniqOfNatM_State final_st
final_delta = deltaOfNatM_State final_st
in
if final_delta == 0
then (instr_list, final_us)
else pprPanic "genMachCode: nonzero final delta"
(int final_delta)
\end{code}
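
For reference, NatM can be thought of as a state monad over a pair of a
UniqSupply and the current stack delta.  The following is a sketch only
-- the real definitions (mkNatM_State, initNat, thenNat and so on) live
in Stix.lhs and need not have this exact representation:

\begin{code}
type NatM_State' = (UniqSupply, Int)     -- unique supply, C stack delta
newtype NatM' a  = NatM' (NatM_State' -> (a, NatM_State'))

initNat' :: NatM_State' -> NatM' a -> (a, NatM_State')
initNat' st (NatM' f) = f st

returnNat' :: a -> NatM' a
returnNat' a = NatM' (\st -> (a, st))

thenNat' :: NatM' a -> (a -> NatM' b) -> NatM' b
thenNat' (NatM' f) g
   = NatM' (\st -> let (a, st') = f st
                       NatM' h  = g a
                   in  h st')

getDeltaNat' :: NatM' Int
getDeltaNat' = NatM' (\st@(_, delta) -> (delta, st))

setDeltaNat' :: Int -> NatM' ()
setDeltaNat' delta = NatM' (\(us, _) -> ((), (us, delta)))
\end{code}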
The next bit does the code scheduling. The scheduler must also deal
......@@ -135,7 +162,7 @@ exposed via the OrdList, but more might occur, so further analysis
might be needed.
\begin{code}
scheduleMachCode :: [InstrList] -> [[Instr]]
scheduleMachCode :: [InstrBlock] -> [[Instr]]
scheduleMachCode
= map (runRegAllocate freeRegsState findReservedRegs)
......@@ -160,71 +187,95 @@ have introduced some new opportunities for constant-folding wrt
address manipulations.
\begin{code}
genericOpt :: StixTree -> StixTree
genericOpt :: [StixTree] -> [StixTree]
genericOpt = map stixConFold . stixPeep
stixPeep :: [StixTree] -> [StixTree]
-- This transformation assumes that the temp assigned to in t1
-- is not assigned to in t2; for otherwise the target of the
-- second assignment would be substituted for, giving nonsense
-- code. As far as I can see, StixTemps are only ever assigned
-- to once. It would be nice to be sure!
stixPeep ( t1@(StAssign pka (StReg (StixTemp u pk)) rhs)
: t2
: ts )
| stixCountTempUses u t2 == 1
&& sum (map (stixCountTempUses u) ts) == 0
= trace ("nativeGen: stixInline: " ++ showSDoc (ppStixTree rhs))
(stixPeep (stixSubst u rhs t2 : ts))
stixPeep (t1:t2:ts) = t1 : stixPeep (t2:ts)
stixPeep [t1] = [t1]
stixPeep [] = []
\end{code}
For most nodes, just optimize the children.
\begin{code}
genericOpt (StInd pk addr) = StInd pk (genericOpt addr)
stixConFold :: StixTree -> StixTree
genericOpt (StAssign pk dst src)
= StAssign pk (genericOpt dst) (genericOpt src)
stixConFold (StInd pk addr) = StInd pk (stixConFold addr)
genericOpt (StJump addr) = StJump (genericOpt addr)
stixConFold (StAssign pk dst src)
= StAssign pk (stixConFold dst) (stixConFold src)
genericOpt (StCondJump addr test)
= StCondJump addr (genericOpt test)
stixConFold (StJump addr) = StJump (stixConFold addr)
genericOpt (StCall fn cconv pk args)
= StCall fn cconv pk (map genericOpt args)
stixConFold (StCondJump addr test)
= StCondJump addr (stixConFold test)
stixConFold (StCall fn cconv pk args)
= StCall fn cconv pk (map stixConFold args)
\end{code}
Fold indices together when the types match:
\begin{code}
genericOpt (StIndex pk (StIndex pk' base off) off')
stixConFold (StIndex pk (StIndex pk' base off) off')
| pk == pk'
= StIndex pk (genericOpt base)
(genericOpt (StPrim IntAddOp [off, off']))
= StIndex pk (stixConFold base)
(stixConFold (StPrim IntAddOp [off, off']))
genericOpt (StIndex pk base off)
= StIndex pk (genericOpt base) (genericOpt off)
stixConFold (StIndex pk base off)
= StIndex pk (stixConFold base) (stixConFold off)
\end{code}
For PrimOps, we first optimize the children, and then we try our hand
at some constant-folding.
\begin{code}
genericOpt (StPrim op args) = primOpt op (map genericOpt args)
stixConFold (StPrim op args) = stixPrimFold op (map stixConFold args)
\end{code}
Replace register leaves with appropriate StixTrees for the given
target.
\begin{code}
genericOpt leaf@(StReg (StixMagicId id))
stixConFold leaf@(StReg (StixMagicId id))
= case (stgReg id) of
Always tree -> genericOpt tree
Always tree -> stixConFold tree
Save _ -> leaf
genericOpt other = other
stixConFold other = other
\end{code}
Now, try to constant-fold the PrimOps. The arguments have already
been optimized and folded.
\begin{code}
primOpt
stixPrimFold
:: PrimOp -- The operation from an StPrim
-> [StixTree] -- The optimized arguments
-> StixTree
primOpt op arg@[StInt x]
stixPrimFold op arg@[StInt x]
= case op of
IntNegOp -> StInt (-x)
_ -> StPrim op arg
primOpt op args@[StInt x, StInt y]
stixPrimFold op args@[StInt x, StInt y]
= case op of
CharGtOp -> StInt (if x > y then 1 else 0)
CharGeOp -> StInt (if x >= y then 1 else 0)
......@@ -253,13 +304,13 @@ also assume that constants have been shifted to the right when
possible.
\begin{code}
primOpt op [x@(StInt _), y] | commutableOp op = primOpt op [y, x]
stixPrimFold op [x@(StInt _), y] | commutableOp op = stixPrimFold op [y, x]
\end{code}
We can often do something with constants of 0 and 1 ...
\begin{code}
primOpt op args@[x, y@(StInt 0)]
stixPrimFold op args@[x, y@(StInt 0)]
= case op of
IntAddOp -> x
IntSubOp -> x
......@@ -272,9 +323,15 @@ primOpt op args@[x, y@(StInt 0)]
ISllOp -> x
ISraOp -> x
ISrlOp -> x
IntNeOp | is_comparison -> x
_ -> StPrim op args
where
is_comparison
= case x of
StPrim opp [_, _] -> opp `elem` comparison_ops
_ -> False
primOpt op args@[x, y@(StInt 1)]
stixPrimFold op args@[x, y@(StInt 1)]
= case op of
IntMulOp -> x
IntQuotOp -> x
......@@ -285,7 +342,7 @@ primOpt op args@[x, y@(StInt 1)]
Now look for multiplication/division by powers of 2 (integers).
\begin{code}
primOpt op args@[x, y@(StInt n)]
stixPrimFold op args@[x, y@(StInt n)]
= case op of
IntMulOp -> case exactLog2 n of
Nothing -> StPrim op args
......@@ -299,5 +356,16 @@ primOpt op args@[x, y@(StInt n)]
Anything else is just too hard.
\begin{code}
primOpt op args = StPrim op args
stixPrimFold op args = StPrim op args
\end{code}
\begin{code}
comparison_ops
= [ CharGtOp , CharGeOp , CharEqOp , CharNeOp , CharLtOp , CharLeOp,
IntGtOp , IntGeOp , IntEqOp , IntNeOp , IntLtOp , IntLeOp,
WordGtOp , WordGeOp , WordEqOp , WordNeOp , WordLtOp , WordLeOp,
AddrGtOp , AddrGeOp , AddrEqOp , AddrNeOp , AddrLtOp , AddrLeOp,
FloatGtOp , FloatGeOp , FloatEqOp , FloatNeOp , FloatLtOp , FloatLeOp,
DoubleGtOp, DoubleGeOp, DoubleEqOp, DoubleNeOp, DoubleLtOp, DoubleLeOp
]
\end{code}
\ No newline at end of file
......@@ -8,20 +8,20 @@ module AsmRegAlloc ( runRegAllocate, runHairyRegAllocate ) where
#include "HsVersions.h"
import MachCode ( InstrList )
import MachMisc ( Instr )
import MachCode ( InstrBlock )
import MachMisc ( Instr(..) )
import PprMach ( pprUserReg ) -- debugging
import MachRegs
import RegAllocInfo
import FiniteMap ( emptyFM, addListToFM, delListFromFM, lookupFM, keysFM )
import FiniteMap ( emptyFM, addListToFM, delListFromFM,
lookupFM, keysFM )
import Maybes ( maybeToBool )
import OrdList ( mkEmptyList, mkUnitList, mkSeqList, mkParList,
flattenOrdList, OrdList
)
import Unique ( mkBuiltinUnique )
import Util ( mapAccumB )
import OrdList ( unitOL, appOL, fromOL, concatOL )
import Outputable
import List ( mapAccumL )
\end{code}
This is the generic register allocator.
......@@ -33,7 +33,7 @@ things the hard way.
runRegAllocate
:: MRegsState
-> ([Instr] -> [[RegNo]])
-> InstrList
-> InstrBlock
-> [Instr]
runRegAllocate regs find_reserve_regs instrs
......@@ -49,21 +49,21 @@ runRegAllocate regs find_reserve_regs instrs
Nothing -> tryHairy resvs
reserves = find_reserve_regs flatInstrs
flatInstrs = flattenOrdList instrs
simpleAlloc = simpleRegAlloc regs [] emptyFM flatInstrs
flatInstrs = fromOL instrs
simpleAlloc = simpleRegAlloc regs [] emptyFM flatInstrs
hairyAlloc resvd = hairyRegAlloc regs resvd flatInstrs
runHairyRegAllocate
:: MRegsState
-> [RegNo]
-> InstrList
-> InstrBlock
-> Maybe [Instr]
runHairyRegAllocate regs reserve_regs instrs
= hairyRegAlloc regs reserve_regs flatInstrs
where
flatInstrs = flattenOrdList instrs
flatInstrs = fromOL instrs
\end{code}
Here is the simple register allocator. Just dole out registers until
......@@ -157,8 +157,7 @@ hairyRegAlloc regs reserve_regs instrs =
| null reserve_regs -> Nothing
-- failed, but we have reserves, so attempt to do spilling
| otherwise
-> let instrs_patched' = patchMem instrs'
instrs_patched = flattenOrdList instrs_patched'
-> let instrs_patched = patchMem instrs'
in
case mapAccumB (doRegAlloc []) (RH regs'' mloc1 emptyFM)
noFuture instrs_patched of
......@@ -185,30 +184,47 @@ hairyRegAlloc regs reserve_regs instrs =
toMappedReg (I# i) = MappedReg i
\end{code}
Here we patch instructions that reference ``registers'' which are really in
memory somewhere (the mapping is under the control of the machine-specific
code generator). We place the appropriate load sequences before any instructions
that use memory registers as sources, and we place the appropriate spill sequences
after any instructions that use memory registers as destinations. The offending
instructions are rewritten with new dynamic registers, so we have to run register
allocation again after all of this is said and done.
Here we patch instructions that reference ``registers'' which are
really in memory somewhere (the mapping is under the control of the
machine-specific code generator). We place the appropriate load
sequences before any instructions that use memory registers as
sources, and we place the appropriate spill sequences after any
instructions that use memory registers as destinations. The offending
instructions are rewritten with new dynamic registers, so we have to
run register allocation again after all of this is said and done.
On some architectures (x86, currently), we do without a frame-pointer,
and instead spill relative to the stack pointer (%esp on x86).
Because the stack pointer may move, the patcher needs to keep track of
the current stack pointer "delta". That's easy, because all it needs
to do is spot the DELTA bogus-insns which will have been inserted by
the relevant insn selector precisely so as to notify the spiller of
stack-pointer movement. The delta is passed to loadReg and spillReg,
since they generate the actual spill code. We expect the final delta
to be the same as the starting one (zero), reflecting the fact that
changes to the stack pointer should not extend beyond a basic block.
\begin{code}
patchMem :: [Instr] -> InstrList
patchMem :: [Instr] -> [Instr]
patchMem cs
= let (final_stack_delta, css) = mapAccumL patchMem' 0 cs
in
if final_stack_delta == 0
then concat css
else pprPanic "patchMem: non-zero final delta"
(int final_stack_delta)
patchMem cs = foldr (mkSeqList . patchMem') mkEmptyList cs
patchMem' :: Int -> Instr -> (Int, [Instr])
patchMem' delta instr
patchMem' :: Instr -> InstrList
| null memSrcs && null memDsts
= (delta', [instr])
patchMem' instr
| null memSrcs && null memDsts = mkUnitList instr
| otherwise =
mkSeqList
(foldr mkParList mkEmptyList loadSrcs)
(mkSeqList instr'
(foldr mkParList mkEmptyList spillDsts))
| otherwise
= (delta', loadSrcs ++ [instr'] ++ spillDsts)
where
delta' = case instr of DELTA d -> d ; _ -> delta
where
(RU srcs dsts) = regUsage instr
memToDyn (MemoryReg i pk) = UnmappedReg (mkBuiltinUnique i) pk
......@@ -217,13 +233,13 @@ patchMem' instr
memSrcs = [ r | r@(MemoryReg _ _) <- regSetToList srcs]
memDsts = [ r | r@(MemoryReg _ _) <- regSetToList dsts]
loadSrcs = map load memSrcs
loadSrcs = map load memSrcs
spillDsts = map spill memDsts
load mem = loadReg mem (memToDyn mem)
spill mem = spillReg (memToDyn mem) mem
load mem = loadReg delta mem (memToDyn mem)
spill mem = spillReg delta' (memToDyn mem) mem
instr' = mkUnitList (patchRegs instr memToDyn)
instr' = patchRegs instr memToDyn
\end{code}
\begin{code}
......
......@@ -301,6 +301,7 @@ data Size
| L
| F -- IEEE single-precision floating pt
| DF -- IEEE double-precision floating pt
| F80 -- Intel 80-bit internal FP format; only used for spilling
#endif
#if sparc_TARGET_ARCH
= B -- byte (signed)
......@@ -351,6 +352,8 @@ data Instr
String -- the literal string
| DATA Size
[Imm]
| DELTA Int -- specify current stack offset for
-- benefit of subsequent passes
\end{code}
\begin{code}
......@@ -470,6 +473,10 @@ contents, would not impose a fixed mapping from %fake to %st regs, and
hopefully could avoid most of the redundant reg-reg moves of the
current translation.
We might as well make use of whatever unique FP facilities Intel have
chosen to bless us with (let's not be churlish, after all).
Hence GLDZ and GLD1. Bwahahahahahahaha!
\begin{code}
#if i386_TARGET_ARCH
......@@ -509,10 +516,10 @@ current translation.
| BT Size Imm Operand
| NOP
-- Float Arithmetic. -- ToDo for 386
-- Float Arithmetic.
-- Note that we cheat by treating G{ABS,MOV,NEG} of doubles as single instructions
-- right up until we spit them out.
-- Note that we cheat by treating G{ABS,MOV,NEG} of doubles
-- as single instructions right up until we spit them out.
-- all the 3-operand fake fp insns are src1 src2 dst
-- and furthermore are constrained to be fp regs only.
......@@ -521,6 +528,9 @@ current translation.
| GLD Size MachRegsAddr Reg -- src, dst(fpreg)
| GST Size Reg MachRegsAddr -- src(fpreg), dst
| GLDZ Reg -- dst(fpreg)
| GLD1 Reg -- dst(fpreg)
| GFTOD Reg Reg -- src(fpreg), dst(fpreg)
| GFTOI Reg Reg -- src(fpreg), dst(intreg)
......@@ -595,6 +605,7 @@ is_G_instr :: Instr -> Bool
is_G_instr instr
= case instr of
GMOV _ _ -> True; GLD _ _ _ -> True; GST _ _ _ -> True;
GLDZ _ -> True; GLD1 _ -> True;
GFTOD _ _ -> True; GFTOI _ _ -> True;
GDTOF _ _ -> True; GDTOI _ _ -> True;
GITOF _ _ -> True; GITOD _ _ -> True;
......
......@@ -64,11 +64,12 @@ import AbsCUtils ( magicIdPrimRep )
import CLabel ( CLabel )
import PrimOp ( PrimOp(..) )
import PrimRep ( PrimRep(..) )
import Stix ( sStLitLbl, StixTree(..), StixReg(..) )
import Stix ( sStLitLbl, StixTree(..), StixReg(..),
getUniqueNat, returnNat, thenNat, NatM )
import Unique ( mkPseudoUnique1, mkPseudoUnique2, mkPseudoUnique3,
Uniquable(..), Unique
)
import UniqSupply ( getUniqueUs, returnUs, thenUs, UniqSM )
--import UniqSupply ( getUniqueUs, returnUs, thenUs, UniqSM )
import Outputable
\end{code}
......@@ -270,10 +271,10 @@ data Reg
mkReg :: Unique -> PrimRep -> Reg
mkReg = UnmappedReg
getNewRegNCG :: PrimRep -> UniqSM Reg
getNewRegNCG :: PrimRep -> NatM Reg
getNewRegNCG pk
= getUniqueUs `thenUs` \ u ->
returnUs (UnmappedReg u pk)
= getUniqueNat `thenNat` \ u ->
returnNat (UnmappedReg u pk)
instance Text Reg where
showsPrec _ (FixedReg i) = showString "%" . shows IBOX(i)
......
Known bugs/issues in nativeGen, 000202 (JRS)
Known bugs/issues in nativeGen, 000228 (JRS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All these bugs are for x86; I don't know about sparc/alpha.
-- absC -> stix translation for GET_TAG and in fact anything
to do with the packed-halfword layout info itbl field is
pretty dubious. I think I have it fixed for big and little
endian 32-bit, but it won't work at all on a 64 bit platform.
-- Most of the x86 insn selector code in MachCode.lhs needs to
be checked against the Rules of the Game recorded in that file.
I think there are a lot of subtle violations.
-- When selecting spill regs, don't use %eax if there is a CALL insn
(perhaps excluding calls to newCAF, since it doesn't return a
result).
-- Keep track of the stack offset so that correct spill code can
be generated even if %esp moves. At the moment %esp doesn't
move, so the problem doesn't exist, but there is a different
problem: ccalls put args in memory below %esp and only move
%esp immediately prior to the call. This is dangerous because
(1) writing below %esp can cause a segmentation fault (as deemed
by the OS), and (2) if a signal should be handled on that stack
during argument construction, the args will get silently trashed.
Currently, the implementations of GITOF et al use the stack, so they
are incompatible with the current ccall implementation.  When the
latter is fixed, GITOF et al should present no problem.  The same
issue applies to GCOS, GSIN, GTAN, GSQRT if they have to truncate
their result to 32-bit float.
-- nofib/real/hidden gets slightly different FP answers from the
via-C route; possibly due to exp/log not being done in-line.
-- There may or may not be bugs in some of the x86 insn selector
code in MachCode.lhs. I have checked all of it against the
Rules of the Game (+ Rules of the game for Amodes) recorded in
that file, but am not 100% convinced that it is all correct.
I think most of it is, tho.
-- Possibly implement GLDZ and GLD1 as analogues of FLDZ and FLD1
(x86), to reduce number of constants emitted in f-p code.
-- It won't compile on Solaris or Alphas because the insn selectors
are not up-to-date.
-- NCG introduces a massive space leak; I think it generates all the
assembly code before printing any of it out (a depressingly
familiar story ...). Fixing this will await a working heap profiler.
......@@ -175,12 +175,13 @@ pprSize x = ptext (case x of
TF -> SLIT("t")
#endif
#if i386_TARGET_ARCH
B -> SLIT("b")
-- HB -> SLIT("b") UNUSED
-- S -> SLIT("w") UNUSED
L -> SLIT("l")
F -> SLIT("s")
DF -> SLIT("l")
B -> SLIT("b")
-- HB -> SLIT("b") UNUSED
-- S -> SLIT("w") UNUSED
L -> SLIT("l")
F -> SLIT("s")
DF -> SLIT("l")
F80 -> SLIT("t")
#endif
#if sparc_TARGET_ARCH
B -> SLIT("sb")
......@@ -299,27 +300,27 @@ pprAddr (AddrRegImm r1 i)
#if i386_TARGET_ARCH
pprAddr (ImmAddr imm off)
= let
pp_imm = pprImm imm
= let pp_imm = pprImm imm
in
if (off == 0) then
pp_imm
else if (off < 0) then
(<>) pp_imm (int off)
pp_imm <> int off
else
hcat [pp_imm, char '+', int off]
pp_imm <> char '+' <> int off
pprAddr (AddrBaseIndex base index displacement)
= let
pp_disp = ppr_disp displacement
pp_off p = (<>) pp_disp (parens p)
pp_off p = pp_disp <> char '(' <> p <> char ')'
pp_reg r = pprReg L r
in
case (base,index) of
(Nothing, Nothing) -> pp_disp
(Just b, Nothing) -> pp_off (pp_reg b)
(Nothing, Just (r,i)) -> pp_off (hcat [pp_reg r, comma, int i])
(Just b, Just (r,i)) -> pp_off (hcat [pp_reg b, comma, pp_reg r, comma, int i])
(Nothing, Just (r,i)) -> pp_off (pp_reg r <> comma <> int i)
(Just b, Just (r,i)) -> pp_off (pp_reg b <> comma <> pp_reg r
<> comma <> int i)
where
ppr_disp (ImmInt 0) = empty
ppr_disp imm = pprImm imm
......@@ -368,6 +369,9 @@ pprInstr (COMMENT s)
,IF_ARCH_i386( ((<>) (ptext SLIT("# ")) (ptext s))
,)))
pprInstr (DELTA d)
= pprInstr (COMMENT (_PK_ ("\tdelta = " ++ show d)))
pprInstr (SEGMENT TextSegment)
= IF_ARCH_alpha(ptext SLIT("\t.text\n\t.align 3") {-word boundary-}
,IF_ARCH_sparc(ptext SLIT("\t.text\n\t.align 4") {-word boundary-}
......@@ -992,6 +996,11 @@ pprInstr g@(GST sz src addr)
= pprG g (hcat [gtab, gpush src 0, gsemi,
text "fstp", pprSize sz, gsp, pprAddr addr])
pprInstr g@(GLDZ dst)
= pprG g (hcat [gtab, text "ffree %st(7) ; fldz ; ", gpop dst 1])
pprInstr g@(GLD1 dst)
= pprG g (hcat [gtab, text "ffree %st(7) ; fld1 ; ", gpop dst 1])
pprInstr g@(GFTOD src dst)
= pprG g bogus
pprInstr g@(GFTOI src dst)
......@@ -1085,6 +1094,9 @@ pprGInstr (GMOV src dst) = pprSizeRegReg SLIT("gmov") DF src dst
pprGInstr (GLD sz src dst) = pprSizeAddrReg SLIT("gld") sz src dst
pprGInstr (GST sz src dst) = pprSizeRegAddr SLIT("gst") sz src dst
pprGInstr (GLDZ dst) = pprSizeReg SLIT("gldz") DF dst
pprGInstr (GLD1 dst) = pprSizeReg SLIT("gld1") DF dst
pprGInstr (GFTOD src dst) = pprSizeSizeRegReg SLIT("gftod") F DF src dst
pprGInstr (GFTOI src dst) = pprSizeSizeRegReg SLIT("gftoi") F L src dst
......@@ -1112,11 +1124,11 @@ Continue with I386-only printing bits and bobs:
\begin{code}
pprDollImm :: Imm -> SDoc
pprDollImm i = hcat [ ptext SLIT("$"), pprImm i]
pprDollImm i = ptext SLIT("$") <> pprImm i
pprOperand :: Size -> Operand -> SDoc
pprOperand s (OpReg r) = pprReg s r
pprOperand s (OpImm i) = pprDollImm i
pprOperand s (OpReg r) = pprReg s r