Optimize dataToTag# for small constructor families.
Currently we might have code such as
\x -> dataToTag# x :: T -> Int#.
This evaluates the argument, and then executes a primop to get the tag of the argument.
This works and produces the Cmm code:
    c1g4: // global
        call (I64[R1])(R1) returns to c1g3, args: 8, res: 8, upd: 8;
    c1g3: // global
        R1 = %MO_UU_Conv_W32_W64(I32[I64[R1 & (-8)] - 4]);
        Sp = Sp + 8;
        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
It follows the pointer to the closure, then follows the closure's pointer to the info table, and extracts the constructor tag from there.
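However, for types with few data cons we don't need to do this. After evaluating the value we are guaranteed that the pointer is tagged, and for a small constructor family (at most 7 constructors on 64-bit targets, where the low 3 bits of the pointer hold the tag) the pointer tag identifies the constructor: it is the constructor's zero-based tag plus one. This means we can construct the tag from the pointer alone.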
GHC should instead check whether the argument's type is one for which the tag can be reconstructed from the pointer, and do so where possible, perhaps via a rewrite rule to a dataToTagSmall# primop or similar. This would save two memory accesses per dataToTag# call.
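
To make the proposed fast path concrete, here is a small pure model of it in Haskell. It is only a sketch, not GHC code: it assumes a 64-bit target with 3 pointer-tag bits, it models closure pointers as plain Word64 addresses, and the names tagMask, isSmallFamily and tagFromPointer are made up for illustration.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Assumption: 64-bit target, so the low 3 bits of an evaluated
-- closure pointer carry the pointer tag.
tagMask :: Word64
tagMask = 7

-- Hypothetical check: a family is "small" when every constructor
-- gets its own pointer tag (tags 1..7 under the assumption above).
isSmallFamily :: Int -> Bool
isSmallFamily nCons = nCons <= fromIntegral tagMask

-- Model of the pointer-only path: mask out the tag bits and
-- subtract one, since pointer tags are one-based while the result
-- of dataToTag# is zero-based. Only meaningful for an evaluated
-- (and therefore tagged) pointer into a small family.
tagFromPointer :: Word64 -> Int
tagFromPointer ptr = fromIntegral (ptr .&. tagMask) - 1
```

In this model no memory is read at all, whereas the Cmm above loads the info pointer from the closure and then the tag from the info table; those are the two accesses the optimization would remove.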