nofib/real/gg is miscompiled at -O1
I've noticed that nofib/real/gg fails with an output mismatch. After reducing the
problem I got the following small example (the ```trace``` expressions are not
needed to reproduce it and can also be removed):
{{{
module Main where

import Debug.Trace

main = do
  putStrLn $ printFloat 100

printFloat :: Float -> String
printFloat x = f (show (round (x*10)))
  where
    f "0" = trace "case 0" "0"
    f _   = trace "case _" $ show (round x)
}}}
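Since the ```trace``` calls are not needed, a trace-free variant of the reduced program can double as a self-checking regression test. This is only a sketch: the "OK"/"MISMATCH" messages and the use of a non-zero exit code to signal failure are assumptions for illustration, not conventions taken from nofib or the GHC testsuite.

```haskell
module Main where

import System.Exit (exitFailure)

-- Same logic as the reduced test case, with the trace calls removed.
printFloat :: Float -> String
printFloat x = f (show (round (x * 10)))
  where
    f "0" = "0"
    f _   = show (round x)

main :: IO ()
main = do
  let actual = printFloat 100
  putStrLn actual
  -- A correct build must print 100; the miscompiled -O1 build prints 1000.
  if actual == "100"
    then putStrLn "OK"
    else do
      putStrLn ("MISMATCH: expected 100, got " ++ actual)
      exitFailure
```

Compiling this at each of ```-O0```, ```-O1``` and ```-O2``` and checking the exit status avoids eyeballing the output by hand.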
Compiling it with the current HEAD gives:
{{{
> ~/dev/ghc-clean/inplace/bin/ghc-stage2 -fforce-recomp -O0 Test.hs
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking Test ...
> ./Test
case _
100
> ~/dev/ghc-clean/inplace/bin/ghc-stage2 -fforce-recomp -O1 Test.hs
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking Test ...
> ./Test
case _
1000
> ~/dev/ghc-clean/inplace/bin/ghc-stage2 -fforce-recomp -O2 Test.hs
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking Test ...
> ./Test
case _
100
}}}
Note that with ```-O1``` the output is 1000 instead of 100! The program behaves
correctly with ```-fnew-codegen```, so either the bug is in the old code
generator or the new one simply does not trigger it:
{{{
> ~/dev/ghc-clean/inplace/bin/ghc-stage2 -fforce-recomp -O1 -fnew-codegen Test.hs
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking Test ...
> ./Test
case _
100
}}}
I've also looked at the assembly, and the only difference I've noticed is that
two instructions appear in a different order when compiling with ```-O1```:
{{{
Main.$wprintFloat_info:
_c28u:
leaq -40(%rbp),%rax
cmpq %r15,%rax
jb _c2aV
mulss _n2aX(%rip),%xmm1 <--- !
movss %xmm1,%xmm0 <--- !
subq $8,%rsp
movl $1,%eax
call rintFloat
addq $8,%rsp
movss %xmm1,-8(%rbp)
movss %xmm0,%xmm1
movq $s23Y_info,-16(%rbp)
addq $-16,%rbp
jmp stg_decodeFloat_Int#
_c2aV:
movl $Main.$wprintFloat_closure,%ebx
jmp *-8(%r13)
.size Main.$wprintFloat_info, .-Main.$wprintFloat_info
.section .rodata
.align 8
.align 4
}}}
and with ```-O2```:
{{{
Main.$wprintFloat_info:
_c28R:
leaq -40(%rbp),%rax
cmpq %r15,%rax
jb _c2aX
movss %xmm1,%xmm0 <--- !
mulss _n2aZ(%rip),%xmm0 <--- !
subq $8,%rsp
movl $1,%eax
call rintFloat
addq $8,%rsp
movss %xmm1,-8(%rbp)
movss %xmm0,%xmm1
movq $s24l_info,-16(%rbp)
addq $-16,%rbp
jmp stg_decodeFloat_Int#
_c2aX:
movl $Main.$wprintFloat_closure,%ebx
jmp *-8(%r13)
.size Main.$wprintFloat_info, .-Main.$wprintFloat_info
.section .rodata
.align 8
.align 4
}}}
If I read this right, in the ```-O1``` case ```mulss``` multiplies ```xmm1```
in place, so ```xmm1``` will contain 1000 (100 * 10), and this value is then
stored on the stack by ```movss %xmm1,-8(%rbp)```. The ```-O2``` version instead
first copies the value from ```xmm1``` to ```xmm0``` and only then multiplies
```xmm0```; it also stores ```xmm1``` on the stack, but this time ```xmm1```
should still be equal to 100. So if the value stored on the stack is
subsequently used (presumably for ```show (round x)```), that would explain the
difference between the two programs.
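To make this reasoning concrete, here is a toy Haskell model of just the two marked instructions. It is a sketch under stated assumptions: the constant loaded from ```_n2aX```/```_n2aZ``` is taken to be 10.0 (from ```x*10``` in the source), ```x``` is assumed to arrive in ```xmm1```, and the ```rintFloat``` call is ignored entirely. The names ```Regs```, ```Instr```, ```step```, ```o1``` and ```o2``` are made up for illustration and do not correspond to anything in GHC's native code generator.

```haskell
module Main where

-- Toy model of the two XMM registers involved (illustration only).
data Regs = Regs { xmm0 :: Float, xmm1 :: Float } deriving Show

-- Two-operand AT&T semantics: "movss src,dst" copies src into dst,
-- "mulss src,dst" computes dst := dst * src, overwriting dst.
data Instr
  = MovXmm1ToXmm0   -- movss %xmm1,%xmm0
  | MulXmm1 Float   -- mulss <const>,%xmm1
  | MulXmm0 Float   -- mulss <const>,%xmm0

step :: Regs -> Instr -> Regs
step r MovXmm1ToXmm0 = r { xmm0 = xmm1 r }
step r (MulXmm1 k)   = r { xmm1 = xmm1 r * k }
step r (MulXmm0 k)   = r { xmm0 = xmm0 r * k }

run :: [Instr] -> Regs -> Regs
run is r = foldl step r is

-- -O1 ordering: multiply xmm1 in place, then copy it to xmm0.
o1 :: [Instr]
o1 = [MulXmm1 10, MovXmm1ToXmm0]

-- -O2 ordering: copy xmm1 to xmm0, then multiply the copy.
o2 :: [Instr]
o2 = [MovXmm1ToXmm0, MulXmm0 10]

-- In both listings xmm0 becomes the argument to rintFloat, and xmm1 is
-- later spilled by "movss %xmm1,-8(%rbp)".
main :: IO ()
main = do
  let start = Regs { xmm0 = 0, xmm1 = 100 }
  putStrLn ("-O1: " ++ show (run o1 start))
  putStrLn ("-O2: " ++ show (run o2 start))
```

Both orderings pass 1000.0 to ```rintFloat``` in ```xmm0```, but only the ```-O1``` ordering leaves 1000.0 in ```xmm1```, which is the value spilled to ```-8(%rbp)```.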
For the record, this is on x86_64 Linux, and the GHC version used is:
{{{
> ~/dev/ghc-clean/inplace/bin/ghc-stage2 --version
The Glorious Glasgow Haskell Compilation System, version 7.7.20120816
}}}
Everything works as expected with GHC 7.4.2.
Trac metadata
| Trac field | Value |
| ---------- | ----- |
| Version | 7.7 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |