Injective type families
This page summarizes the design behind injective type families (#6018). It is a work in progress. This page will evolve to reflect the progress made.
Person responsible for this page and the implementation: Jan Stolarek (just so you now who is meant by "I").
Forms of injectivity
The idea behind #6018 is to allow users to declare that a type family is injective. So far we have identified these forms of injectivity:
 Injectivity in all the arguments, where knowing the result (righthand side) of a type family determines all the arguments on the lefthand side. Examples:
typefamilyId a whereId a = a
typefamilyF a b c
typeinstanceFIntCharBool=BooltypeinstanceFCharBoolInt=InttypeinstanceFBoolIntChar=Char
 Injectivity in some of the arguments, where knowing the RHS of a type family determines only some of the arguments on the LHS. Example:
typefamilyG a b c
typeinstanceGIntCharBool=BooltypeinstanceGIntCharInt=BooltypeinstanceGBoolIntInt=Int
Here knowing the RHS allows us to determine first two arguments, but not the third one.
 Injectivity in some of the arguments, where knowing the RHS of a type family and some of the LHS arguments determines other (possibly not all) LHS arguments. Examples:
typefamilyPlus a b wherePlusZ n = n
Plus(S m ) n =S(Plus n m)
Here knowing the RHS and any single argument uniquely determines the remaining argument.
typefamilyH a b c
typeinstanceHIntCharDouble=InttypeinstanceHBoolDoubleDouble=Int
Knowing the RHS and either
a
orb
allows to uniquely determine the remaining two arguments, but knowing the RHS andc
gives us no information abouta
orb
.
In the following text I will refer to these three forms of injectivity as A, B and C.
Note that at the moment we only have practical use cases for injectivity of form A. Because of that I propose that we implement only this form of injectivity. We can implement B and C when somebody comes up with compelling use cases.
Proposed syntax
Two important things to consider when deciding on a concrete syntax are:

Backwards compatibility: at the moment no decision has been made on whether injectivity will become a part of existing
TypeFamilies
extension or if we create a newInjectiveTypeFamilies
extension that impliesTypeFamilies
. If we choose the former then we need to come up with syntax that is backwards compatible. (Perhaps this is the other way around: if we end up having backwards incompatible syntax then we must create a new language extension). 
Future extensibility: if we only implement A we still want to be able to add B and C in the future without breaking A.
We decided to use syntax similar to functional dependencies. The injectivity
declaration begins with 
following type family declaration head. 
is
followed by a list of commaseparated injectivity conditions. Each injectivity
condition has the form:
A>B
where A
and B
are nonempty lists of type variables declared in type family
head. Things on the left and right of >
are called LHS and RHS of an
injectivity condition, respectively. The user must be able to name the result of
a type family. To achieve that we extend syntax of type family head by allowing
to write = tyvar
or = (tyvar :: kind)
annotations in addition to already
allowed :: kind
annotation. In other words all these declaration are
wellformed:
typefamilyPlus(a ::Nat)(b ::Nat)where...typefamilyPlus(a ::Nat)(b ::Nat)::Natwhere...typefamilyPlus(a ::Nat)(b ::Nat)= c where...typefamilyPlus(a ::Nat)(b ::Nat)=(c ::Nat)where...
but the third or fourth form is required if the user wants to introduce injectivity declaration. Here are examples of injectivity declarations using proposed syntax:
typefamilyId a = result  result > a wheretypefamilyF a b c = d  d > a b c
typefamilyG a b c = foo  foo > a b wheretypefamilyPlus a b =(sum ::Nat) sum a > b, sum b > a wheretypefamilyH a b c = xyz  xyz a > b c, xyz b > a c
Note that above examples demonstrate all three types of injectivivity but for now we only plan to implement injectivity A.
Pros:

natural reading:
res a > b
reads as "knowing res and a determines b". 
extensible for the future

backwards compatible
Cons:
 since we only plan to implement injectivity of form A we require that there is only one injectivity condition, its LHS contains only a type variable naming the result and its RHS contains all type variables naming the arguments to a type family. This might be a bit annoying.
Reallife use cases
If you can supply more examples (type family declarations + their usage that currently doesn't work but should work with injectivity) please add them here.
Example 1
typefamilyF a = result  result > a whereFChar=BoolFBool=CharF a = a
idChar::(F a ~Bool)=> a >CharidChar a = a
idDouble::(F b ~Double)=> b >DoubleidDouble a = a
GHC should infer that a
is in fact Char
and b
is Double
. Right now this program is rejected.
Example 2 (taken from 6018#comment:5)
dataNat=ZeroSucNatdataCoNat=CoNatInfinitytypefamilySucc(t ::CoNat)::CoNattypeinstanceSucc(Co n)=Co(Suc n)typeinstanceSuccInfinity=Infinity
GHC can't infer Succ n ~ Succ m => n ~ m
because it can't see that Succ
is injective.
Example 3 (taken from 6018#comment:26)
classManifold' a wheretypeField a
typeBase a
typeTangent a
typeTangentBundle a
typeDimension a ::NattypeUsesMetric a ::Symbol
project :: a >Base a
unproject ::Base a > a
tangent :: a >TangentBundle a
cotangent :: a >(TangentBundle a >Field a) this worksid':: forall a.(Manifold' a )=>Base a >Base a
id' input = project out
where
out :: a
out = unproject input
 but this requires injective type familiesid':: forall a.(Manifold' a )=>Base a >Base a
id'= project . unproject
Implementation outline
Verifying correctness of user's injectivity declaration
Once the user declares type family to be injective GHC must verify that this declaration is correct, ie. type family really is injective. Below is an outline of the algorithm. It is then follow with a discussion and rationale. Note: I am not too familiar with GHC's type checking so please tell me if I'm making any wrong assumptions here. Also, I might be using incorrect terminology. If so please complain or fix.
Important: this algorithm is only for the injectivity of type A. RHS refers to the righthand side of the type family being checked for injectivity. LHS refers to the arguments of that type family.
Algorithm
 If a RHS contains a call to a type family we conclude that the type family is not injective. I am not certain if this extends to selfrecursion  see discussion below.
Open type families:
 When checking equation of an open type family we try to unify its RHS with RHSs of all equations we've seen so far. If we succeed then LHSs of unified equations must be identical under that substitution. If they are not identical then we declare that a type family is not injective.
RAE: I believe this would also have to check that all variables mentioned in the LHS are mentioned in the RHS. Or, following along with Simon's idea in comment:76:ticket:6018, those variables that appear in the injectivity annotation must be mentioned in the RHS. End RAE
Closed type families

If a type family has only one equation we declare it to be injective (unless case 1 above applies). RAE or there are variables missing from the RHS End RAE

If a type family has more equations then we check them one by one. Checking each equation consists of these steps:
 verify 1) for the current equation.
a) attempt to unify RHS with RHSs of all previous equations. If this does not succeed proceed to next equation. If this was the last equation then type family is injective.
b) if unification in a) succeeds and no substitutions are required then type family is not injective and we report an error to the user. By "no substitution" I mean situation when there are no type variables involved ie. both sides declare the same concrete type (eg. Int). RAE This seems to be a straightforward special case of (c). Omit? End RAE
c) if unification succeeds and there are type variables involved we substitute unified type variables on the LHS and check whether this LHS overlaps with any of the previous equations. If it does we proceed to the next equation (if this was the last equation then type family is injective). If it doesn't then type family is not injective and we report an error to the user. See examples and discussion below for more details.
RAE: We have to be careful with the word overlap here. I think we want "overlaps" to be "is subsumed by".
Even subtler, it's possible that certain values for type variables are excluded if the current LHS is reachable (for example, some variable a
couldn't be Int
, because if a
were Int
, then a previous equation would have triggered). Perhaps these exclusions can also be taken into account. Thankfully, forgetting about the exclusions is conservative  the exclusions can only make more families injective, not fewer. End RAE
RAE's examples
typefamilyE2(a ::Bool)= r  r > a whereE2False=TrueE2True=FalseE2 a =False
A naive injectivity check here would conclude that E2
is not injective, because the RHS of the last equation unifies with the RHS of a previous equation, and the LHS of the last equation is not subsumed by any previous equation. But, noting that a
(as used in the last equation) can neither be True
nor False
, `E2 is in fact injective. These are the "exclusions" I talk about above.
Discussion
Case 1 above is conservative and may incorrectly reject injectivity declaration
for some type families. For example type family I
:
typefamilyI a = r  r > a whereI a =Id a  Id defined earlier
is injective but restriction in case 1 will reject it. Why such a restriction?
Consider this example. Let us assume that we have type families X
and Y
that
we already know are injective, and type constructors Bar :: * > *
and
Baz :: * > * > *
. Now we declare:
typefamilyFoo a = r  r > a whereFoo(Bar a)=X a
Foo(Baz a b)=Y a b
Here we would need to check whether results of X a
and Y a b
can overlap and
I don't see a good way of doing this once the RHS has nested calls to type
families. Second of all if we see Foo a ~ Bool
during type checking GHC can't
solve that without trying each equation one by one to see which one returns
Bool
. This takes a lot of guessing and I believe GHC should refuse to do that.
See also #4259.
What about selfrecursion? Consider this type family:
 assumes promoted Maybe and NattypefamilyNatToMaybe a = r  r > a whereNatToMaybeZ=NothingNatToMaybe(S n)=Just(NatToMaybe n)
Using Case 4a we will infer correctly that this type family is injective. Now consider this:
typefamilyBan a = r  r > a whereBanZ=ZBan(S n)=Ban n
To verify injectivity in this case we need to ask whether Z ~ Ban n
(case 4a
above). We can only answer such question when a type family is injective. Here
we have not confirmed that Ban
is injective so we may rightly refuse to answer
Z ~ Ban n
and conclude (correctly) that this type family is not injective. So it
seems to me that we could allow selfrecursion  I have not yet identified any
corner cases that would prevent us from doing so.
RAE: But it seems that, under this treatment, any selfrecursion would automatically lead to a conclusion of "not injective", just like any other use of a type family. For example:
typefamilyIdNat n = r  r > a whereIdNatZ=ZIdNat(S n)=S(IdNat n)
IdNat
is injective. But, following the algorithm above, GHC would see the recursive use of IdNat
, not know whether IdNat
is injective, and then give up, concluding "not injective". Is there a case where the special treatment of selfrecursion leads to a conclusion of "injective"? End RAE
Here's an example of case 4c in action:
typefamilyBak a = r  r > a whereBakInt=CharBakChar=IntBak a = a
Equation 2 falls under case 4a. When we reach equation 3 we attempt to verify
it's RHS with each RHS of the previous equations. Attempt to unify a
with
Char
succeeds. Since unification was successful we substitute Char
on the LHS
of the 3rd equation and get Bak Char
. Now we must check if this equation
overlaps with any of the previous equations. It does overlap with the second
one, so everything is fine: we know that 3rd equation will never return a Char
(like 1st equation) because this case will be caught by the 2nd equation. We
know repeat the same steps for the second equations and conclude that 3rd
equation will never produce an Int
because this will be caught by the 1st
equation. We thus conclude that Bak
is injective.
Other implementation details
The implementation needs to check the correctness of injectivity conditions declarations. This includes checking that:

only inscope type variables are used. For example
type family F a  result > b
should result with "not in scope: b" error. 
there are no identical conditions (this wouldn't hurt, but the user deserves a warning about this)

type variables are not repeated on either LHS or RHS of the injectivity condition. For example
result a a > ...
or... > a b a
should generate a warning. Note that it probably is OK to have the same variable both on the LHS and RHS of an injectivity condition: in the above examples it is true thattype family G a b c  result c > a b c
. The question is whether this has any practical relevance. 
injectivity conditions don't overlap (eg.
result > a b
overlapsresult > a
). This probably deserves a warning.
I am not certain at the moment how to treat these injectivity conditions declarations:

result > a, result > b
is technically correct but we could just sayresult > a b
. Do the two separate declarations have the same power as the combined one?
Inferring injectivity
Here it was suggested by Simon that we could infer injectivity for closed type families. This is certainly possible for injectivity A, but not for B or C, where the number of possible injectivity conditions is exponential in the number of arguments. I personally am against inferring injectivity. The reasons are:

Before 7.10 comes out there will already be some code in the wild that uses closed type families introduced in GHC 7.8. None of these type families require injectivity to work because GHC 7.8 does not support injectivity. If we attempt to infer injectivity for all these already existing closed type families we will only increase compilation time of existing code with absolutely no gain in functionality of the code. There were some complaints about GHC's performance decreasing with each release and I don't want to add to that.

I believe that requiring explicit injectivity annotations is a valuable source code documentation for the programmer.

Annotations also make it explicit which code will compile with GHC 7.10 and which will not.

I don't like the idea of mismatch between open type families and closed type families, meaning that injectivity of open type families would be openly documented whereas for closed type families it would be hidden.
Other notes

This page does not mention anything about associated types. The intention is that injectivity will also work for those. I just haven't thought about the details yet.

Here Richard mentions headinjectivity used by Agda. We don't plan to follow that direction.

I'm not yet sure what is the exact scope of things we want to do with injectivity. Certainly if we have
F a ~ F b
andF
is injective then we can declarea ~ b
. (Note: Simon remarks that in such casea
must be a skolem constant andb
a unification variable. I don't know the difference between the two). Moreover if we haveF a ~ Int
then we can infera
. But what if we also have injective type familyG
and see a constraint likeF a ~ G b c
? For injective type families there exists at most one solution but I think that GHC should refuse to solve such riddles.