Skip to content

<= operators get compiled worse than ==

We'd like to define things like 'take' like this:

take :: Int -> [a] -> [a]
take n _ | n <= 0 = []
take _ []         = []
take n (x:xs)     = x : take (n-1) xs

Which is just the natural implementation from the H98 spec. Unfortunately to make it run fast it has to be defined like so:

take :: Int -> [a] -> [a]
take n _ | n <= 0 = []
take n ls = take' n ls
  where
    take' :: Int -> [a] -> [a]
    take' 0 _      = []
    take' _ []     = []
    take' n (x:xs) = x : take' (n-1) xs

The crucial difference is factoring out the i <= 0 test so that it is done just once and then in the loop only matching against exactly 0.

Let's look at the STG:

First for the fast version:

==================== STG syntax: ====================
Take.$wtake' =
    \r [ww_snF w_snH]
	case ww_snF of ds_snM {
	  __DEFAULT ->
	      case w_snH of wild_so1 {
		[] -> [] [];
		: x_snL xs_snP ->
		    let {
		      sat_snR =
			  \u []
			      case -# [ds_snM 1] of sat_snO { __DEFAULT -> Take.$wtake' sat_snO xs_snP; };
		    } in  : [x_snL sat_snR];
	      };
	  0 -> [] [];
	};

Take.take =
    \r [n_snV ds_so0]
	case n_snV of wild_so2 {
	  GHC.Base.I# x_snY ->
	      case <=# [x_snY 0] of wild1_so3 {
		GHC.Base.False -> Take.$wtake' x_snY ds_so0;
		GHC.Base.True -> [] [];
	      };
	};

So we see this gives us case _ of _ { __DEFAULT -> ...; 0 -> ...}

where as the simple version gives us:

Take.$wtake =
    \r [ww_sn1 w_sn3]
	case <=# [ww_sn1 0] of wild_snl {
	  GHC.Base.False ->
	      case w_sn3 of wild1_snm {
		[] -> [] [];
		: x_sn7 xs_sna ->
		    let {
		      sat_snc =
			  \u []
			      case -# [ww_sn1 1] of sat_sn9 { __DEFAULT -> Take.$wtake sat_sn9 xs_sna; };
		    } in  : [x_sn7 sat_snc];
	      };
	  GHC.Base.True -> [] [];
	};

Take.take =
    \r [w_sng w1_snk]
	case w_sng of w2_snn { GHC.Base.I# ww_snj -> Take.$wtake ww_snj w1_snk; };

Now we get case n <=# 0# of { True -> ...; False -> ...}. We might hope that this gives us equivalent code in the backend but sadly it doesn't, the simple version is slower.

We should be able to do better since at the cpu level calculating condition codes for == and <= is the same cost, just different instructions.

Trac metadata
Trac field Value
Version 6.6
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system Unknown
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information