foreign functions called from optimized Haskell code receive non-empty fpu stack
It has been observed that wrong NaN's are produced by foreign functions in complex and not easily reproducible situations:
I also had this problem in the hmatrix library. But now it can be easily reproduced and traced to a particular point in foreign C code, where we observe that the product of two innocent looking doubles evaluates to NaN. Using gdb we find that the fpu state received by the foreign function, which is normally like this:
(gdb) info float R7: Empty 0x3ffcaaf881cab3b3b800 R6: Empty 0x3fff8000000000000000 R5: Empty 0x00000000000000000000 R4: Empty 0x3fff8000000000000000 R3: Empty 0xbffb88bf61ae9d22828a R2: Empty 0x80000000000000000000 R1: Empty 0x3ffe86490c2ce3303000 =>R0: Empty 0xbffcfb4ceea7f26db95c Status Word: 0x4032 DE UE PE C3 TOP: 0 Control Word: 0x037f IM DM ZM OM UM PM PC: Extended Precision (64-bits) RC: Round to nearest Tag Word: 0xffff Instruction Pointer: 0x73:0xb7e284b0 Operand Pointer: 0x7b:0xb6e7e1f8 Opcode: 0xdd1a
has some "valid" data in the calls which produce the NaN:
(gdb) info float R7: Empty 0x3fcb9e8b62d59977a000 R6: Empty 0x3fcabc5cf9a9be1cd800 R5: Empty 0x401f8000000000000000 R4: Empty 0x401e9b97f4a800000000 R3: Empty 0xbffcaaf881cab3b3b800 R2: Empty 0x00000000000000000000 R1: Valid 0x3fdddbe6fecebdedd800 +1.000000000000000036e-10 =>R0: Valid 0x3fcb9e8b62d59977a000 +2.750308259657520213e-16 Status Word: 0x0132 DE UE PE C0 TOP: 0 Control Word: 0x037f IM DM ZM OM UM PM PC: Extended Precision (64-bits) RC: Round to nearest Tag Word: 0xfff0 Instruction Pointer: 0x73:0x08063cb3 Operand Pointer: 0x7b:0x00000000 Opcode: 0xd8da
The attached tarball showbug.tar.gz contains a modified version of the library which exposes the problem. Detailed information can be found in the enclosed file README_BUG.
A simple workaround is calling asm("finit") before any work is done in the foreign functions. However, there is strong evidence that the problem is caused in the Haskell side because asm("finit") does not avoid the problem if called after each foreign function. See README_BUG for details. In any case, I also attach a version (showbug2) without lapack (please try the other one first).
It only happens with optimized -fasm Haskell code, and with certain foreign functions. Interestingly, some implementations (e.g. Intel's MKL) don't produce any NaN, so at first it could be thought that there was some subtle bug in some versions of blas/lapack. But this new result about the fpu state suggests that this is probably not the case.
The wrong NaN's appear in all the 32bit machines that I have tested (older Pentium, newer Dual Core, and even on an ultraportable with Atom, and with different Ubuntu versions, Fedora, Arch, and Debian). Probably it also happens in Windows, but I have not tested it recently. I experienced the problem with ghc 6.8.x, 6.10-rc, (and probably 6.6). I cannot reproduce the problem on 64bit systems (but I have only tested one single 64bit machine)
Finally, I hope that this problem is not caused by some error in the hmatrix library...