Commit ce136f8b authored by simonpj's avatar simonpj
Browse files

[project @ 2003-07-28 13:31:37 by simonpj]

Reorganise the type-system-extension part of GlaExts docs
parent ea067762
......@@ -774,7 +774,11 @@ This name is not supported by GHC.
<sect1 id="type-extensions">
<title>Type system extensions</title>
<sect2 id="nullary-types">
<sect2>
<title>Data types and type synonyms</title>
<sect3 id="nullary-types">
<title>Data types with no constructors</title>
<para>With the <option>-fglasgow-exts</option> flag, GHC lets you declare
......@@ -792,9 +796,9 @@ not <literal>*</literal> then an explicit kind annotation must be used
<para>Such data types have only one value, namely bottom.
Nevertheless, they can be useful when defining "phantom types".</para>
</sect2>
</sect3>
<sect2 id="infix-tycons">
<sect3 id="infix-tycons">
<title>Infix type constructors</title>
<para>
......@@ -845,1746 +849,1703 @@ like expressions. More specifically:
</itemizedlist>
</para>
</sect2>
</sect3>
<sect2 id="sec-kinding">
<title>Explicitly-kinded quantification</title>
<sect3 id="type-synonyms">
<title>Liberalised type synonyms</title>
<para>
Haskell infers the kind of each type variable. Sometimes it is nice to be able
to give the kind explicitly as (machine-checked) documentation,
just as it is nice to give a type signature for a function. On some occasions,
it is essential to do so. For example, in his paper "Restricted Data Types in Haskell" (Haskell Workshop 1999)
John Hughes had to define the data type:
<Screen>
data Set cxt a = Set [a]
| Unused (cxt a -> ())
</Screen>
The only use for the <literal>Unused</literal> constructor was to force the correct
kind for the type variable <literal>cxt</literal>.
</para>
<para>
GHC now instead allows you to specify the kind of a type variable directly, wherever
a type variable is explicitly bound. Namely:
Type synonmys are like macros at the type level, and
GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
That means that GHC can be very much more liberal about type synonyms than Haskell 98:
<itemizedlist>
<listitem><para><literal>data</literal> declarations:
<Screen>
data Set (cxt :: * -> *) a = Set [a]
</Screen></para></listitem>
<listitem><para><literal>type</literal> declarations:
<Screen>
type T (f :: * -> *) = f Int
</Screen></para></listitem>
<listitem><para><literal>class</literal> declarations:
<Screen>
class (Eq a) => C (f :: * -> *) a where ...
</Screen></para></listitem>
<listitem><para><literal>forall</literal>'s in type signatures:
<Screen>
f :: forall (cxt :: * -> *). Set cxt Int
</Screen></para></listitem>
</itemizedlist>
</para>
<listitem> <para>You can write a <literal>forall</literal> (including overloading)
in a type synonym, thus:
<programlisting>
type Discard a = forall b. Show b => a -> b -> (a, String)
<para>
The parentheses are required. Some of the spaces are required too, to
separate the lexemes. If you write <literal>(f::*->*)</literal> you
will get a parse error, because "<literal>::*->*</literal>" is a
single lexeme in Haskell.
</para>
f :: Discard a
f x y = (x, show y)
<para>
As part of the same extension, you can put kind annotations in types
as well. Thus:
<Screen>
f :: (Int :: *) -> Int
g :: forall a. a -> (a :: *)
</Screen>
The syntax is
<Screen>
atype ::= '(' ctype '::' kind ')
</Screen>
The parentheses are required.
g :: Discard Int -> (Int,Bool) -- A rank-2 type
g f = f Int True
</programlisting>
</para>
</sect2>
</listitem>
<listitem><para>
You can write an unboxed tuple in a type synonym:
<programlisting>
type Pr = (# Int, Int #)
<sect2 id="class-method-types">
<title>Class method types
</title>
<para>
Haskell 98 prohibits class method types to mention constraints on the
class type variable, thus:
h :: Int -> Pr
h x = (# x, x #)
</programlisting>
</para></listitem>
<listitem><para>
You can apply a type synonym to a forall type:
<programlisting>
class Seq s a where
fromList :: [a] -> s a
elem :: Eq a => a -> s a -> Bool
type Foo a = a -> a -> Bool
f :: Foo (forall b. b->b)
</programlisting>
The type of <literal>elem</literal> is illegal in Haskell 98, because it
contains the constraint <literal>Eq a</literal>, constrains only the
class type variable (in this case <literal>a</literal>).
</para>
<para>
With the <option>-fglasgow-exts</option> GHC lifts this restriction.
</para>
After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
<programlisting>
f :: (forall b. b->b) -> (forall b. b->b) -> Bool
</programlisting>
</para></listitem>
</sect2>
<listitem><para>
You can apply a type synonym to a partially applied type synonym:
<programlisting>
type Generic i o = forall x. i x -> o x
type Id x = x
foo :: Generic Id []
</programlisting>
After epxanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
<programlisting>
foo :: forall x. x -> [x]
</programlisting>
</para></listitem>
<sect2 id="multi-param-type-classes">
<title>Multi-parameter type classes
</title>
</itemizedlist>
</para>
<para>
This section documents GHC's implementation of multi-parameter type
classes. There's lots of background in the paper <ULink
URL="http://research.microsoft.com/~simonpj/multi.ps.gz" >Type
classes: exploring the design space</ULink > (Simon Peyton Jones, Mark
Jones, Erik Meijer).
GHC currently does kind checking before expanding synonyms (though even that
could be changed.)
</para>
<sect3 id="type-restrictions">
<title>Types</title>
<para>
GHC imposes the following restrictions on the form of a qualified
type, whether declared in a type signature
or inferred. Consider the type:
After expanding type synonyms, GHC does validity checking on types, looking for
the following mal-formedness which isn't detected simply by kind checking:
<itemizedlist>
<listitem><para>
Type constructor applied to a type involving for-alls.
</para></listitem>
<listitem><para>
Unboxed tuple on left of an arrow.
</para></listitem>
<listitem><para>
Partially-applied type synonym.
</para></listitem>
</itemizedlist>
So, for example,
this will be rejected:
<programlisting>
forall tv1..tvn (c1, ...,cn) => type
</programlisting>
type Pr = (# Int, Int #)
(Here, I write the "foralls" explicitly, although the Haskell source
language omits them; in Haskell 1.4, all the free type variables of an
explicit source-language type signature are universally quantified,
except for the class type variables in a class declaration. However,
in GHC, you can give the foralls if you want. See <xref LinkEnd="universal-quantification">).
h :: Pr -> Int
h x = ...
</programlisting>
because GHC does not allow unboxed tuples on the left of a function arrow.
</para>
</sect3>
<para>
<OrderedList>
<listitem>
<sect3 id="existential-quantification">
<title>Existentially quantified data constructors
</title>
<para>
<emphasis>Each universally quantified type variable
<literal>tvi</literal> must be reachable from <literal>type</literal></emphasis>.
A type variable is "reachable" if it it is functionally dependent
(see <xref linkend="functional-dependencies">)
on the type variables free in <literal>type</literal>.
The reason for this is that a value with a type that does not obey
this restriction could not be used without introducing
ambiguity.
Here, for example, is an illegal type:
The idea of using existential quantification in data type declarations
was suggested by Laufer (I believe, thought doubtless someone will
correct me), and implemented in Hope+. It's been in Lennart
Augustsson's <Command>hbc</Command> Haskell compiler for several years, and
proved very useful. Here's the idea. Consider the declaration:
</para>
<para>
<programlisting>
forall a. Eq a => Int
data Foo = forall a. MkFoo a (a -> Bool)
| Nil
</programlisting>
When a value with this type was used, the constraint <literal>Eq tv</literal>
would be introduced where <literal>tv</literal> is a fresh type variable, and
(in the dictionary-translation implementation) the value would be
applied to a dictionary for <literal>Eq tv</literal>. The difficulty is that we
can never know which instance of <literal>Eq</literal> to use because we never
get any more information about <literal>tv</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Every constraint <literal>ci</literal> must mention at least one of the
universally quantified type variables <literal>tvi</literal></emphasis>.
For example, this type is OK because <literal>C a b</literal> mentions the
universally quantified type variable <literal>b</literal>:
The data type <literal>Foo</literal> has two constructors with types:
</para>
<para>
<programlisting>
forall a. C a b => burble
MkFoo :: forall a. a -> (a -> Bool) -> Foo
Nil :: Foo
</programlisting>
</para>
The next type is illegal because the constraint <literal>Eq b</literal> does not
mention <literal>a</literal>:
<para>
Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
does not appear in the data type itself, which is plain <literal>Foo</literal>.
For example, the following expression is fine:
</para>
<para>
<programlisting>
forall a. Eq b => burble
[MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
</programlisting>
The reason for this restriction is milder than the other one. The
excluded types are never useful or necessary (because the offending
context doesn't need to be witnessed at this point; it can be floated
out). Furthermore, floating them out increases sharing. Lastly,
excluding them is a conservative choice; it leaves a patch of
territory free in case we need it later.
</para>
</listitem>
</OrderedList>
<para>
Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
<function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
isUpper</function> packages a character with a compatible function. These
two things are each of type <literal>Foo</literal> and can be put in a list.
</para>
<para>
Unlike Haskell 1.4, constraints in types do <emphasis>not</emphasis> have to be of
the form <emphasis>(class type-variables)</emphasis>. Thus, these type signatures
are perfectly OK
What can we do with a value of type <literal>Foo</literal>?. In particular,
what happens when we pattern-match on <function>MkFoo</function>?
</para>
<para>
<programlisting>
f :: Eq (m a) => [m a] -> [m a]
g :: Eq [a] => ...
f (MkFoo val fn) = ???
</programlisting>
</para>
<para>
This choice recovers principal types, a property that Haskell 1.4 does not have.
Since all we know about <literal>val</literal> and <function>fn</function> is that they
are compatible, the only (useful) thing we can do with them is to
apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
</para>
</sect3>
<sect3>
<title>Class declarations</title>
<para>
<OrderedList>
<listitem>
<para>
<emphasis>Multi-parameter type classes are permitted</emphasis>. For example:
<programlisting>
class Collection c a where
union :: c a -> c a -> c a
...etc.
f :: Foo -> Bool
f (MkFoo val fn) = fn val
</programlisting>
</para>
</listitem>
<listitem>
<para>
<emphasis>The class hierarchy must be acyclic</emphasis>. However, the definition
of "acyclic" involves only the superclass relationships. For example,
this is OK:
<programlisting>
class C a where {
op :: D b => a -> b -> b
}
class C a => D a where { ... }
</programlisting>
What this allows us to do is to package heterogenous values
together with a bunch of functions that manipulate them, and then treat
that collection of packages in a uniform manner. You can express
quite a bit of object-oriented-like programming this way.
</para>
Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
<sect4 id="existential">
<title>Why existential?
</title>
<para>
What has this to do with <emphasis>existential</emphasis> quantification?
Simply that <function>MkFoo</function> has the (nearly) isomorphic type
</para>
</listitem>
<listitem>
<para>
<emphasis>There are no restrictions on the context in a class declaration
(which introduces superclasses), except that the class hierarchy must
be acyclic</emphasis>. So these class declarations are OK:
<programlisting>
class Functor (m k) => FiniteMap m k where
...
class (Monad m, Monad (t m)) => Transform t m where
lift :: m a -> (t m) a
MkFoo :: (exists a . (a, a -> Bool)) -> Foo
</programlisting>
</para>
</listitem>
<listitem>
<para>
<emphasis>All of the class type variables must be reachable (in the sense
mentioned in <xref linkend="type-restrictions">)
from the free varibles of each method type
</emphasis>. For example:
<programlisting>
class Coll s a where
empty :: s
insert :: s -> a -> s
</programlisting>
But Haskell programmers can safely think of the ordinary
<emphasis>universally</emphasis> quantified type given above, thereby avoiding
adding a new existential quantification construct.
</para>
</sect4>
is not OK, because the type of <literal>empty</literal> doesn't mention
<literal>a</literal>. This rule is a consequence of Rule 1(a), above, for
types, and has the same motivation.
<sect4>
<title>Type classes</title>
Sometimes, offending class declarations exhibit misunderstandings. For
example, <literal>Coll</literal> might be rewritten
<para>
An easy extension (implemented in <Command>hbc</Command>) is to allow
arbitrary contexts before the constructor. For example:
</para>
<para>
<programlisting>
class Coll s a where
empty :: s a
insert :: s a -> a -> s a
data Baz = forall a. Eq a => Baz1 a a
| forall b. Show b => Baz2 b (b -> b)
</programlisting>
</para>
which makes the connection between the type of a collection of
<literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
Occasionally this really doesn't work, in which case you can split the
class like this:
<para>
The two constructors have the types you'd expect:
</para>
<para>
<programlisting>
class CollE s where
empty :: s
class CollE s => Coll s a where
insert :: s -> a -> s
Baz1 :: forall a. Eq a => a -> a -> Baz
Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
</programlisting>
</para>
</listitem>
</OrderedList>
</para>
</sect3>
<sect3 id="instance-decls">
<title>Instance declarations</title>
<para>
<OrderedList>
<listitem>
But when pattern matching on <function>Baz1</function> the matched values can be compared
for equality, and when pattern matching on <function>Baz2</function> the first matched
value can be converted to a string (as well as applying the function to it).
So this program is legal:
</para>
<para>
<emphasis>Instance declarations may not overlap</emphasis>. The two instance
declarations
<programlisting>
instance context1 => C type1 where ...
instance context2 => C type2 where ...
f :: Baz -> String
f (Baz1 p q) | p == q = "Yes"
| otherwise = "No"
f (Baz2 v fn) = show (fn v)
</programlisting>
"overlap" if <literal>type1</literal> and <literal>type2</literal> unify
However, if you give the command line option
<option>-fallow-overlapping-instances</option><indexterm><primary>-fallow-overlapping-instances
option</primary></indexterm> then overlapping instance declarations are permitted.
However, GHC arranges never to commit to using an instance declaration
if another instance declaration also applies, either now or later.
<itemizedlist>
<listitem>
</para>
<para>
EITHER <literal>type1</literal> and <literal>type2</literal> do not unify
Operationally, in a dictionary-passing implementation, the
constructors <function>Baz1</function> and <function>Baz2</function> must store the
dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
extract it on pattern matching.
</para>
</listitem>
<listitem>
<para>
OR <literal>type2</literal> is a substitution instance of <literal>type1</literal>
(but not identical to <literal>type1</literal>), or vice versa.
Notice the way that the syntax fits smoothly with that used for
universal quantification earlier.
</para>
</listitem>
</itemizedlist>
Notice that these rules
<itemizedlist>
<listitem>
<para>
make it clear which instance decl to use
(pick the most specific one that matches)
</sect4>
</para>
</listitem>
<listitem>
<sect4>
<title>Restrictions</title>
<para>
do not mention the contexts <literal>context1</literal>, <literal>context2</literal>
Reason: you can pick which instance decl
"matches" based on the type.
There are several restrictions on the ways in which existentially-quantified
constructors can be use.
</para>
</listitem>
</itemizedlist>
However the rules are over-conservative. Two instance declarations can overlap,
but it can still be clear in particular situations which to use. For example:
<programlisting>
instance C (Int,a) where ...
instance C (a,Bool) where ...
</programlisting>
These are rejected by GHC's rules, but it is clear what to do when trying
to solve the constraint <literal>C (Int,Int)</literal> because the second instance
cannot apply. Yell if this restriction bites you.
</para>
<para>
GHC is also conservative about committing to an overlapping instance. For example:
<programlisting>
class C a where { op :: a -> a }
instance C [Int] where ...
instance C a => C [a] where ...
f :: C b => [b] -> [b]
f x = op x
</programlisting>
From the RHS of f we get the constraint <literal>C [b]</literal>. But
GHC does not commit to the second instance declaration, because in a paricular
call of f, b might be instantiate to Int, so the first instance declaration
would be appropriate. So GHC rejects the program. If you add <option>-fallow-incoherent-instances</option>
GHC will instead silently pick the second instance, without complaining about
the problem of subsequent instantiations.
</para>
<para>
Regrettably, GHC doesn't guarantee to detect overlapping instance
declarations if they appear in different modules. GHC can "see" the
instance declarations in the transitive closure of all the modules
imported by the one being compiled, so it can "see" all instance decls
when it is compiling <literal>Main</literal>. However, it currently chooses not
to look at ones that can't possibly be of use in the module currently
being compiled, in the interests of efficiency. (Perhaps we should
change that decision, at least for <literal>Main</literal>.)
</para>
</listitem>
<itemizedlist>
<listitem>
<para>
<emphasis>There are no restrictions on the type in an instance
<emphasis>head</emphasis>, except that at least one must not be a type variable</emphasis>.
The instance "head" is the bit after the "=>" in an instance decl. For
example, these are OK:
When pattern matching, each pattern match introduces a new,
distinct, type for each existential type variable. These types cannot
be unified with any other type, nor can they escape from the scope of
the pattern match. For example, these fragments are incorrect:
<programlisting>
instance C Int a where ...