the 'impossible' happened : expectJust block_order

Also here on x86_64 Mac OS X.

Trac metadata

Trac field	Value
Operating system	Linux → Unknown/Multiple
Architecture	powerpc64 → Unknown/Multiple

Hmm, compiles fine on x86_64 Linux and powerpc Linux.

This looks bad. How can we reproduce it? Is it architecture-specific? What is your build setup, erikd? Thanks

Simon

Austin suggests that this started around the time I pushed my loopification patch (this bug was reported on 30 Aug; my patch was pushed a day earlier). The panic happens in the splitAtProcPoints function, and I recall that my previous attempt at loopification as a Cmm pass didn't work with LLVM because it broke the invariant that a block may be reachable only from a single procpoint.

So, can anyone experiencing this problem try

git revert d61c3ac186c94021c851f7a2a6d20631e35fc1ba

and see if this solves the problem?
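
The invariant mentioned above (a block may be reachable only from a single procpoint) can be illustrated with a toy model. This is only a sketch in Python, not GHC code: the graph, the labels and the helper names (reaching_proc_points, violations) are all made up for illustration, and "reachable" is assumed to mean reachable without passing through another procpoint.

```python
from collections import deque


def reaching_proc_points(graph, proc_points):
    """Map each block to the proc-points that reach it without
    passing through another proc-point (toy model, not GHC's code)."""
    reached = {label: set() for label in graph}
    for pp in proc_points:
        queue = deque(graph[pp])
        seen = set()
        while queue:
            block = queue.popleft()
            # Stop at proc-points: they start a new region.
            if block in seen or block in proc_points:
                continue
            seen.add(block)
            reached[block].add(pp)
            queue.extend(graph[block])
    return reached


def violations(graph, proc_points):
    """Blocks reached by more than one proc-point break the invariant."""
    reached = reaching_proc_points(graph, proc_points)
    return {b: sorted(pps) for b, pps in reached.items() if len(pps) > 1}


# Hypothetical graph: "entry" and "ret" are proc-points, and a
# loopification-style jump makes "loop" reachable from both of them.
graph = {
    "entry": ["loop"],
    "loop": ["call"],
    "call": ["ret"],
    "ret": ["loop"],
}

print(violations(graph, {"entry", "ret"}))
# {'loop': ['entry', 'ret'], 'call': ['entry', 'ret']}

# Promoting the shared block to a proc-point restores the invariant.
print(violations(graph, {"entry", "ret", "loop"}))
# {}
```

In this model the shared loop header is the block that violates the invariant, and making it a proc-point of its own is one way to restore it.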

erikd, can you upload your build.mk file?

Attached file build.mk (download).

build.mk file from my powerpc64 build tree as requested by @jstolarek.

@jstolarek : On powerpc64-linux, if I revert commit d61c3ac1 the stage1 build completes and then fails during the stage2 build. This suggests that something in that commit is causing the expectJust failure.

Thanks for your build.mk. Unfortunately I can't reproduce this on my Linux machine - it seems that the problem only happens on Macs.

> if I revert commit d61c3ac1 the stage1 build completes and then fails during the stage2 build.

This is most strange. If stage2 fails after reverting that commit this would mean that you are experiencing some other bug. Did you clean the build tree after reverting the commit? Also, how does stage2 fail? What error do you get?

One thing you could do to help us debug this is to build HEAD and, when the build fails, re-run the last command with -ddump-cmm -dcmm-lint added to the command line. That would be something like this:

"inplace/bin/ghc-stage1" -hisuf dyn_hi -osuf  dyn_o -hcsuf dyn_hc -fPIC -dynamic 
  -H32m -O -Werror -Wall -H64m -O0    -hide-all-packages -i -iutils/hpc/. 
  -iutils/hpc/dist-install/build -iutils/hpc/dist-install/build/autogen
  -Iutils/hpc/dist-install/build -Iutils/hpc/dist-install/build/autogen
  -optP-include -optPutils/hpc/dist-install/build/autogen/cabal_macros.h
  -package array-0.4.0.2 -package base-4.7.0.0 -package containers-0.5.2.1
  -package directory-1.2.0.1 -package hpc-0.6.0.1 -XHaskell98 -XCPP
  -no-user-package-db -rtsopts -fwarn-tabs     -odir utils/hpc/dist-install/build
  -hidir utils/hpc/dist-install/build -stubdir utils/hpc/dist-install/build
  -c utils/hpc/dist-install/build/HpcParser.hs
  -o utils/hpc/dist-install/build/HpcParser.dyn_o -ddump-cmm -dcmm-lint

If you could upload the output of this command, it would tell us why this is happening.

Replying to jstolarek:

> > if I revert commit d61c3ac1 the stage1 build completes and then fails during the stage2 build.
>
> This is most strange. If stage2 fails after reverting that commit this would mean that you are experiencing some other bug.

Yes, this is another, probably unrelated, bug.

> Did you clean the build tree after reverting the commit?

Yes.

> Also, how does stage2 fail? What error do you get?

"inplace/bin/ghc-stage2" -optc-Werror -optc-Wall -optc-Ilibraries/old-time/include
  -optc-I'/home/erikd/PPC64/ghc-ppc64/libraries/base/include'
  -optc-I'/home/erikd/PPC64/ghc-ppc64/rts/dist/build'
  -optc-I'/home/erikd/PPC64/ghc-ppc64/includes'
  -optc-I'/home/erikd/PPC64/ghc-ppc64/includes/dist-derivedconstants/header'
  -optc-Werror=unused-but-set-variable -optc-Wno-error=inline -fPIC -dynamic  -H32m
  -O -Werror -Wall -H64m -O0    -package-name old-time-1.1.0.1 -hide-all-packages -i
  -ilibraries/old-time/. -ilibraries/old-time/dist-install/build
  -ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-time/dist-install/build
  -Ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-time/include   
  -optP-include -optPlibraries/old-time/dist-install/build/autogen/cabal_macros.h
  -package base-4.7.0.0 -package old-locale-1.0.0.5 -XHaskell98 -XCPP
  -XForeignFunctionInterface -O2 -O -dcore-lint -fno-warn-deprecated-flags 
  -no-user-package-db -rtsopts      -c libraries/old-time/cbits/timeUtils.c
  -o libraries/old-time/dist-install/build/cbits/timeUtils.dyn_o
Segmentation fault
make[1]: *** [libraries/old-time/dist-install/build/cbits/timeUtils.dyn_o] Error 139

This is actually the first command run using the second stage compiler. It builds the non-dynamic timeUtils.o object successfully, and when it builds timeUtils.dyn_o I get this segfault.

I tried dumping the Cmm and lint output, but it seems the compiler segfaulted before it got to that stage.

I'm going to try and build without dyn.

After disabling dyn it segfaults with:

"inplace/bin/ghc-stage2" -hisuf hi -osuf  o -hcsuf hc -static  -H32m -O -Werror -Wall -H64m -O0  
  -hide-all-packages -i -iutils/haddock/driver -iutils/haddock/src -iutils/haddock/dist/build 
  -iutils/haddock/dist/build/autogen -Iutils/haddock/dist/build -Iutils/haddock/dist/build/autogen    
  -optP-DIN_GHC_TREE -optP-include -optPutils/haddock/dist/build/autogen/cabal_macros.h
  -package Cabal-1.18.0 -package array-0.4.0.2 -package base-4.7.0.0 -package containers-0.5.3.1 
  -package deepseq-1.3.0.2 -package directory-1.2.0.1 -package filepath-1.3.0.2
  -package ghc-7.7.20130906 -package xhtml-3000.2.1 -funbox-strict-fields -Wall -fwarn-tabs
  -O2 -XHaskell2010  -no-user-package-db -rtsopts      -odir utils/haddock/dist/build
  -hidir utils/haddock/dist/build -stubdir utils/haddock/dist/build
  -c utils/haddock/src/Haddock/GhcUtils.hs -o utils/haddock/dist/build/Haddock/GhcUtils.o

which is the third object file to be built with the stage2 compiler.

OK, I'm really puzzled about this segfault in the stage2 compiler, but perhaps I can help with the panic in expectJust - we need to reproduce it. This means you need to revert the reverting commit :) or, in other words, build unmodified HEAD and let the stage1 build fail with the panic that you originally reported. After that happens, run the command that causes the panic with -ddump-cmm -dcmm-lint added.

Jan, have you managed to reproduce it on a Mac OS X machine?

Edward: No, unfortunately I don't have access to one. I asked Richard if he is able to reproduce the problem on his Mac but everything builds fine on his machine.

Attached file dump-out.txt.gz (download).

Gzipped output of failing compile command with -ddump-cmm -dcmm-lint

@jstolarek : I attached the -ddump-cmm -dcmm-lint output you asked for. Let me know if you need anything else.

assigned to @trac-jstolarek

erikd: Thanks. Are you sure this is the right dump? If the compiler panicked during compilation the dump should be incomplete, whereas yours looks complete.

But that's not that important - Kazu provided a dump which allows me to figure out what's going on. Below is an explanation of what is going on (no solution yet).

Here is how Cmm looks before stack layout (cFXJ and cFXS are important here):

==================== Post control-flow optimisations ====================
{offset
  cFXP:
      _sCxU::I32 = I32[(old + 12)];
      _sCxV::P32 = P32[(old + 8)];
      goto cFXI;
  cFXI:
      if (Sp - <highSp> < SpLim) goto cFXS; else goto cFXT;
  cFXT:
      _sCxW::I32 = _sCxU::I32;
      if (_sCxW::I32 != 0) goto cFXN; else goto cFXO;
  cFXN:
      I32[(young<cFXR> + 4)] = cFXR;
      R1 = _sCxV::P32;
      if (R1 & 3 != 0) goto cFXR; else goto cFXU;
  cFXU:
      call (I32[R1])(R1) returns to cFXR, args: 4, res: 4, upd: 4;
  cFXR:
      _sCxX::P32 = R1;
      _sCxY::P32 = P32[_sCxX::P32 + 3];
      _sCxZ::P32 = P32[_sCxX::P32 + 7];
      _cFXZ::I32 = _sCxW::I32 - 1;
      _sCy0::I32 = _cFXZ::I32;
      _sCxV::P32 = _sCxZ::P32;
      _sCxU::I32 = _sCy0::I32;
      goto cFXJ;
  cFXJ:
      if (Sp - <highSp> < SpLim) goto cFXS; else goto cFXT;
  cFXS:
      R1 = happyDropStk_rjgW_closure;
      I32[(old + 12)] = _sCxU::I32;
      P32[(old + 8)] = _sCxV::P32;
      call (stg_gc_fun)(R1) args: 12, res: 0, upd: 4;
  cFXO:
      R1 = _sCxV::P32 & (-4);
      call (I32[R1])(R1) args: 4, res: 0, upd: 4;
}

Stack layout transforms it to:

==================== Layout Stack ====================
{offset
  cFXP:
      _sCxU::I32 = I32[Sp];
      _sCxV::P32 = P32[Sp + 4];
      goto cFXI;
  cFXI:
      goto cFXT;
  cFXT:
      _sCxU::I32 = I32[Sp];
      _sCxV::P32 = P32[Sp + 4];
      _sCxW::I32 = _sCxU::I32;
      if (_sCxW::I32 != 0) goto cFXN; else goto cFXO;
  cFXN:
      I32[Sp] = cFXR;
      R1 = _sCxV::P32;
      I32[Sp + 4] = _sCxW::I32;
      if (R1 & 3 != 0) goto cFXR; else goto cFXU;
  cFXU:
      call (I32[R1])(R1) returns to cFXR, args: 4, res: 4, upd: 4;
  cFXR:
      _sCxW::I32 = I32[Sp + 4];
      _sCxX::P32 = R1;
      _sCxY::P32 = P32[_sCxX::P32 + 3];
      _sCxZ::P32 = P32[_sCxX::P32 + 7];
      _cFXZ::I32 = _sCxW::I32 - 1;
      _sCy0::I32 = _cFXZ::I32;
      _sCxV::P32 = _sCxZ::P32;
      _sCxU::I32 = _sCy0::I32;
      goto cFXJ;
  cFXJ:
      goto uFY0;
  uFY0:
      I32[Sp] = _sCxU::I32;
      P32[Sp + 4] = _sCxV::P32;
      goto cFXT;
  cFXO:
      R1 = _sCxV::P32 & (-4);
      Sp = Sp + 8;
      call (I32[R1])(R1) args: 4, res: 0, upd: 4;
}

Notice that the cFXS block was eliminated during stack layout and we got a new uFY0 block. Now comes the time for CAF analysis followed by proc-point analysis:

==================== CAFEnv ====================
[(cFXI, {}), (cFXJ, {}), (cFXN, {}), (cFXO, {}), (cFXP, {}),
 (cFXR, {}), (cFXT, {}), (cFXU, {}), (uFY0, {})]

==================== procpoint map ====================
[(cFXI, reached by cFXP), (cFXJ, reached by cFXR),
 (cFXN, reached by cFXT), (cFXO, reached by cFXT), (cFXP, <procpt>),
 (cFXR, <procpt>), (cFXS, <procpt>), (cFXT, <procpt>),
 (cFXU, reached by cFXT), (uFY0, reached by cFXR)]

Notice that the procpoint map refers to the deleted cFXS block. The problem is that we determine the set of proc-points before stack layout but run proc-point analysis after stack layout. Stack layout can remove some of the proc-points that we computed earlier and thus invalidate the analysis. I don't have a good idea for a solution yet. We can't compute proc-points after stack layout, because stack layout needs that information. One idea that comes to mind is to modify stack layout so that it returns a new, possibly modified, list of proc-points.
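
The mismatch can be checked mechanically on the dumps above. The following is a minimal sketch in Python, not GHC code: the successor lists are transcribed by hand from the "Layout Stack" dump, the proc-point set is the one marked <procpt> in the procpoint map, and the helper names (stale_proc_points, prune_proc_points) are hypothetical. The pruning step only illustrates the idea of handing back an updated proc-point list; it is an assumption about what such a fix could look like, not a description of an actual patch.

```python
# Successors of each block, transcribed from the "Layout Stack" dump.
graph_after_layout = {
    "cFXP": ["cFXI"],
    "cFXI": ["cFXT"],
    "cFXT": ["cFXN", "cFXO"],
    "cFXN": ["cFXR", "cFXU"],
    "cFXU": ["cFXR"],  # the call returns to cFXR
    "cFXR": ["cFXJ"],
    "cFXJ": ["uFY0"],
    "uFY0": ["cFXT"],
    "cFXO": [],        # tail call, no successor inside this procedure
}

# Proc-points marked <procpt> in the procpoint map; this set was
# determined before stack layout ran.
proc_points_before_layout = {"cFXP", "cFXR", "cFXS", "cFXT"}


def stale_proc_points(graph, proc_points):
    """Proc-points whose block no longer exists in the graph."""
    return sorted(pp for pp in proc_points if pp not in graph)


def prune_proc_points(graph, proc_points):
    """Hypothetical fix-up: keep only proc-points that survived layout."""
    return {pp for pp in proc_points if pp in graph}


print(stale_proc_points(graph_after_layout, proc_points_before_layout))
# ['cFXS']
print(sorted(prune_proc_points(graph_after_layout, proc_points_before_layout)))
# ['cFXP', 'cFXR', 'cFXT']
```

Any later pass that looks up cFXS in the post-layout graph finds nothing, which is consistent with a lookup that is expected to succeed failing with a panic.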

I wonder why this happens only on MacOS, and only on some machines. I think this should be deterministic and happen always, regardless of operating system.

Trac field	Value
Version	7.7
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system	Linux
Architecture	powerpc64
