Commit e4d057ec authored by chak

[project @ 2005-05-04 15:19:37 by chak]

Updated and extended the section about the renamer.
parent ebcd8c64
......@@ -6,7 +6,7 @@
</head>
<body BGCOLOR="FFFFFF">
<h1>The Glasgow Haskell Compiler (GHC) Commentary [v0.15]</h1>
<h1>The Glasgow Haskell Compiler (GHC) Commentary [v0.16]</h1>
<p>
<!-- Contributors: Whoever makes substantial additions or changes to the
document, please add your name and keep the order alphabetic. Moreover,
......@@ -113,7 +113,7 @@
<p><small>
<!-- hhmts start -->
Last modified: Sat Mar 5 19:52:33 EST 2005
Last modified: Wed May 4 11:48:54 EST 2005
<!-- hhmts end -->
</small>
</body>
......
......@@ -8,133 +8,160 @@
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - The truth about names: OccNames, and Names</h1>
<p>
Every entity (type constructor, class, identifier, type variable) has
a <code>Name</code>. The <code>Name</code> type is pervasive in GHC,
and is defined in <code>basicTypes/Name.lhs</code>. Here is what a Name looks like,
though it is private to the Name module.
<pre>
data Name = Name {
n_sort :: NameSort, -- What sort of name it is
n_occ :: !OccName, -- Its occurrence name
n_uniq :: Unique, -- Its identity
n_loc :: !SrcLoc -- Definition site
}
</pre>
<ul>
<li> The <code>n_sort</code> field says what sort of name this is: see
<a href="#sort">NameSort below</a>.
<li> The <code>n_occ</code> field gives the "occurrence name" of the Name; see
<a href="#occname">OccName below</a>.
<li> The <code>n_uniq</code> field allows fast tests for equality of Names.
<li> The <code>n_loc</code> field gives some indication of where the name was bound.
</ul>
<h2><a name="sort">The <code>NameSort</code> of a <code>Name</code></a></h2>
There are three flavours of <code>Name</code>:
<pre>
data NameSort
= External Module
| Internal
| System
</pre>
<ul>
<li> Here are the sorts of Name an entity can have:
<ul>
<li> Class, TyCon: External.
<li> Id: External, Internal, or System.
<li> TyVar: Internal, or System.
</ul>
<p><li> An <code>ExternalName</code> has a globally-unique
(module name,occurrence name) pair, namely the
<em>original name</em> of the entity,
describing where the thing was originally defined. So for example,
if we have
<pre>
module M where
f = e1
g = e2
module A where
import qualified M as Q
import M
a = Q.f + g
</pre>
then the RdrNames for "a", "Q.f" and "g" get replaced (by the Renamer)
by the Names "A.a", "M.f", and "M.g" respectively.
<p><li> An <code>InternalName</code>
has only an occurrence name. Distinct InternalNames may have the same occurrence
name; use the Unique to distinguish them.
<p> <li> An <code>ExternalName</code> has a unique that never changes. It is never
cloned. This is important, because the simplifier invents new names pretty freely,
but we don't want to lose the connection with the type environment (constructed earlier).
An <code>InternalName</code> name can be cloned freely.
<p><li> <strong>Before CoreTidy</strong>: the Ids that were defined at top level
in the original source program get <code>ExternalNames</code>, whereas extra
top-level bindings generated (say) by the type checker get <code>InternalNames</code>.
This distinction is occasionally useful for filtering diagnostic output; e.g.
for -ddump-types.
<p><li> <strong>After CoreTidy</strong>: An Id with an <code>ExternalName</code> will generate symbols that
appear as external symbols in the object file. An Id with an <code>InternalName</code>
cannot be referenced from outside the module, and so generates a local symbol in
the object file. The CoreTidy pass makes the decision about which names should
be External and which Internal.
<p><li> A <code>System</code> name is for the most part the same as an
<code>Internal</code>. Indeed, the differences are purely cosmetic:
<ul>
<li>Internal names usually come from some name the
user wrote, whereas a System name has an OccName like "a", or "t". Usually
there are masses of System names with the same OccName but different uniques,
whereas typically there are only a handful of distinct Internal names with the same
OccName.
<li>
Another difference is that when unifying the type checker tries to
unify away type variables with System names, leaving ones with Internal names
(to improve error messages).
</ul>
</ul>
<h2> <a name="occname">Occurrence names: <code>OccName</code></a> </h2>
An <code>OccName</code> is more-or-less just a string, like "foo" or "Tree",
giving the (unqualified) name of an entity.
Well, not quite just a string, because in Haskell a name like "C" could mean a type
constructor or data constructor, depending on context. So GHC defines a type
<tt>OccName</tt> (defined in <tt>basicTypes/OccName.lhs</tt>) that is a pair of
a <tt>FastString</tt> and a <tt>NameSpace</tt> indicating which name space the
name is drawn from:
<pre>
data OccName = OccName NameSpace EncodedFS
</pre>
The <tt>EncodedFS</tt> is a synonym for <tt>FastString</tt> indicating that the
string is Z-encoded. (Details in <tt>OccName.lhs</tt>.) Z-encoding encodes
funny characters like '+' and '$' into alphabetic characters, like "zp" and "zd",
so that they can be used in object-file symbol tables without confusing linkers
and suchlike.
<p>
The name spaces are:
<ul>
<li> <tt>VarName</tt>: ordinary variables
<li> <tt>TvName</tt>: type variables
<li> <tt>DataName</tt>: data constructors
<li> <tt>TcClsName</tt>: type constructors and classes (in Haskell they share a name space)
</ul>
Every entity (type constructor, class, identifier, type variable) has a
<code>Name</code>. The <code>Name</code> type is pervasive in GHC, and
is defined in <code>basicTypes/Name.lhs</code>. Here is what a Name
looks like, though it is private to the Name module.
</p>
<blockquote>
<pre>
data Name = Name {
n_sort :: NameSort, -- What sort of name it is
n_occ :: !OccName, -- Its occurrence name
n_uniq :: Unique, -- Its identity
n_loc :: !SrcLoc -- Definition site
}</pre>
</blockquote>
<ul>
<li> The <code>n_sort</code> field says what sort of name this is: see
<a href="#sort">NameSort below</a>.
<li> The <code>n_occ</code> field gives the "occurrence name" of the
Name; see
<a href="#occname">OccName below</a>.
<li> The <code>n_uniq</code> field allows fast tests for equality of
Names.
<li> The <code>n_loc</code> field gives some indication of where the
name was bound.
</ul>
<h2><a name="sort">The <code>NameSort</code> of a <code>Name</code></a></h2>
<p>
There are four flavours of <code>Name</code>:
</p>
<blockquote>
<pre>
data NameSort
= External Module (Maybe Name)
-- (Just parent) => this Name is a subordinate name of 'parent'
-- e.g. data constructor of a data type, method of a class
-- Nothing => not a subordinate
| WiredIn Module (Maybe Name) TyThing BuiltInSyntax
-- A variant of External, for wired-in things
| Internal -- A user-defined Id or TyVar
-- defined in the module being compiled
| System -- A system-defined Id or TyVar. Typically the
-- OccName is very uninformative (like 's')</pre>
</blockquote>
<ul>
<li>Here are the sorts of Name an entity can have:
<ul>
<li> Class, TyCon: External.
<li> Id: External, Internal, or System.
<li> TyVar: Internal, or System.
</ul>
</li>
<li>An <code>External</code> name has a globally-unique
(module name, occurrence name) pair, namely the
<em>original name</em> of the entity,
describing where the thing was originally defined. So for example,
if we have
<blockquote>
<pre>
module M where
f = e1
g = e2
module A where
import qualified M as Q
import M
a = Q.f + g</pre>
</blockquote>
<p>
then the RdrNames for "a", "Q.f" and "g" get replaced (by the
Renamer) by the Names "A.a", "M.f", and "M.g" respectively.
</p>
</li>
<li>An <code>InternalName</code>
has only an occurrence name. Distinct InternalNames may have the same
occurrence name; use the Unique to distinguish them.
</li>
<li>An <code>ExternalName</code> has a unique that never changes. It
is never cloned. This is important, because the simplifier invents
new names pretty freely, but we don't want to lose the connection
with the type environment (constructed earlier). An
<code>InternalName</code> name can be cloned freely.
</li>
<li><strong>Before CoreTidy</strong>: the Ids that were defined at top
level in the original source program get <code>ExternalNames</code>,
whereas extra top-level bindings generated (say) by the type checker
get <code>InternalNames</code>. This distinction is occasionally
useful for filtering diagnostic output; e.g. for -ddump-types.
</li>
<li><strong>After CoreTidy</strong>: An Id with an
<code>ExternalName</code> will generate symbols that
appear as external symbols in the object file. An Id with an
<code>InternalName</code> cannot be referenced from outside the
module, and so generates a local symbol in the object file. The
CoreTidy pass makes the decision about which names should be External
and which Internal.
</li>
<li>A <code>System</code> name is for the most part the same as an
<code>Internal</code>. Indeed, the differences are purely cosmetic:
<ul>
<li>Internal names usually come from some name the
user wrote, whereas a System name has an OccName like "a", or "t".
Usually there are masses of System names with the same OccName but
different uniques, whereas typically there are only a handful of
distinct Internal names with the same OccName.
</li>
<li>Another difference is that when unifying the type checker tries
to unify away type variables with System names, leaving ones with
Internal names (to improve error messages).
</li>
</ul>
</li>
</ul>
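<p>
The points above about uniques and cloning can be illustrated with a toy
model. This is only a sketch: the types below are simplified stand-ins
(a <code>String</code> for the module, an <code>Int</code> for the unique),
and <code>cloneName</code> is a hypothetical helper, not a function from
<code>Name.lhs</code>.
</p>

```haskell
-- Toy model of Name; every type here is a simplified stand-in, not GHC's.
type Unique  = Int
type OccName = String

data NameSort
  = External String   -- defining module (GHC uses Module and a parent)
  | Internal
  | System
  deriving Show

data Name = Name
  { nSort :: NameSort
  , nOcc  :: OccName
  , nUniq :: Unique
  } deriving Show

-- Equality looks only at the unique, which is what makes the test fast.
instance Eq Name where
  a == b = nUniq a == nUniq b

-- Hypothetical helper: cloning gives an Internal or System name a fresh
-- unique; an External name is returned unchanged, so it stays connected
-- to the type environment.
cloneName :: Unique -> Name -> Name
cloneName fresh n = case nSort n of
  External _ -> n
  _          -> n { nUniq = fresh }
```

<p>
In this model two Internal names with the occurrence name "x" but
different uniques compare as unequal, and cloning changes the unique while
keeping the occurrence name.
</p>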
<h2><a name="occname">Occurrence names: <code>OccName</code></a></h2>
<p>
An <code>OccName</code> is more-or-less just a string, like "foo" or
"Tree", giving the (unqualified) name of an entity.
</p>
<p>
Well, not quite just a string, because in Haskell a name like "C" could
mean a type constructor or data constructor, depending on context. So
GHC defines a type <tt>OccName</tt> (defined in
<tt>basicTypes/OccName.lhs</tt>) that is a pair of a <tt>FastString</tt>
and a <tt>NameSpace</tt> indicating which name space the name is drawn
from:
<blockquote>
<pre>
data OccName = OccName NameSpace EncodedFS</pre>
</blockquote>
<p>
The <tt>EncodedFS</tt> is a synonym for <tt>FastString</tt> indicating
that the string is Z-encoded. (Details in <tt>OccName.lhs</tt>.)
Z-encoding encodes funny characters like '+' and '$' into alphabetic
characters, like "zp" and "zd", so that they can be used in object-file
symbol tables without confusing linkers and suchlike.
</p>
<p>
The name spaces are:
</p>
<ul>
<li> <tt>VarName</tt>: ordinary variables</li>
<li> <tt>TvName</tt>: type variables</li>
<li> <tt>DataName</tt>: data constructors</li>
<li> <tt>TcClsName</tt>: type constructors and classes (in Haskell they
share a name space) </li>
</ul>
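<p>
A minimal sketch of why the name space is part of an <code>OccName</code>:
the same string in two name spaces yields two distinct occurrence names.
The model below uses a plain <code>String</code> where GHC stores an
<code>EncodedFS</code>, and the accessor names are assumptions chosen for
illustration.
</p>

```haskell
-- The four name spaces listed above.
data NameSpace = VarName | TvName | DataName | TcClsName
  deriving (Eq, Ord, Show)

-- Simplified: a plain String instead of a Z-encoded FastString.
data OccName = OccName NameSpace String
  deriving (Eq, Ord, Show)

-- Illustrative accessors (names assumed for this sketch).
occNameSpace :: OccName -> NameSpace
occNameSpace (OccName ns _) = ns

occNameString :: OccName -> String
occNameString (OccName _ s) = s

-- "C" as a data constructor and "C" as a type constructor.
dataC, tyC :: OccName
dataC = OccName DataName  "C"
tyC   = OccName TcClsName "C"
```

<p>
Here <code>dataC</code> and <code>tyC</code> share the string "C" but
compare as different, so an environment keyed on <code>OccName</code> can
hold both.
</p>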
<small>
<!-- hhmts start -->
Last modified: Tue Nov 13 14:11:35 EST 2001
Last modified: Wed May 4 14:57:55 EST 2005
<!-- hhmts end -->
</small>
</body>
......
......@@ -2,70 +2,248 @@
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</title>
<title>The GHC Commentary - The Glorious Renamer</title>
</head>
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - The Glorious Renamer</h1>
<p>
The <em>renamer</em> sits between the parser and the typechecker.
However, its operation is quite tightly interwoven with the
typechecker. This is partially due to support for Template Haskell,
where spliced code has to be renamed and type checked. In particular,
top-level splices lead to multiple rounds of renaming and type
checking.
</p>
<p>
The main externally used functions of the renamer are provided by the
module <code>rename/RnSource.lhs</code>. In particular, we have
</p>
<blockquote>
<pre>
rnSrcDecls :: HsGroup RdrName -> RnM (TcGblEnv, HsGroup Name)
rnTyClDecls :: [LTyClDecl RdrName] -> RnM [LTyClDecl Name]
rnSplice :: HsSplice RdrName -> RnM (HsSplice Name, FreeVars)</pre>
</blockquote>
<p>
All of these execute in the renamer monad <code>RnM</code>. The first
function, <code>rnSrcDecls</code>, renames a binding group; the second,
<code>rnTyClDecls</code>, renames a list of (toplevel) type and class
declarations; and the third, <code>rnSplice</code>, renames a Template
Haskell splice. As the types indicate, the main task of the renamer is
to convert all the <tt>RdrNames</tt> to <a
href="names.html"><tt>Names</tt></a>, which includes a number of
well-formedness checks (no duplicate declarations, all names are in
scope, and so on). In addition, the renamer performs other, not
strictly name-related, well-formedness checks, which includes checking
that the appropriate flags have been supplied whenever language
extensions are used in the source.
</p>
<h2>RdrNames</h2>
<p>
A <tt>RdrName.RdrName</tt> is pretty much just a string (for an
unqualified name like "<tt>f</tt>") or a pair of strings (for a
qualified name like "<tt>M.f</tt>"):
</p>
<blockquote>
<pre>
data RdrName
= Unqual OccName
-- Used for ordinary, unqualified occurrences
(This section is, like most of the Commentary, rather incomplete.)
<p>
The <em>renamer</em> sits between the parser and the typechecker.
Roughly speaking, it has the type:
<pre>
HsModule RdrName -> HsModule Name
</pre>
That is, it converts all the <tt>RdrNames</tt> to <a href="names.html"><tt>Names</tt></a>.
| Qual Module OccName
-- A qualified name written by the user in
-- *source* code. The module isn't necessarily
-- the module where the thing is defined;
-- just the one from which it is imported
<h2> RdrNames </h2>
| Orig Module OccName
-- An original name; the module is the *defining* module.
-- This is used when GHC generates code that will be fed
-- into the renamer (e.g. from deriving clauses), but where
-- we want to say "Use Prelude.map dammit".
| Exact Name
-- We know exactly the Name. This is used
-- (a) when the parser parses built-in syntax like "[]"
-- and "(,)", but wants a RdrName from it
-- (b) when converting names to the RdrNames in IfaceTypes
-- Here an Exact RdrName always contains an External Name
-- (Internal Names are converted to simple Unquals)
-- (c) by Template Haskell, when TH has generated a unique name</pre>
</blockquote>
<p>
The OccName type is described in <a href="names.html#occname">The
truth about names</a>.
</p>
A <tt>RdrName</tt> is pretty much just a string (for an unqualified name
like "<tt>f</tt>") or a pair of strings (for a qualified name like "<tt>M.f</tt>"):
<pre>
data RdrName = RdrName Qual OccName
data Qual = Unqual
| Qual ModuleName -- A qualified name written by the user in source code
-- The module isn't necessarily the module where
-- the thing is defined; just the one from which it
-- is imported
| Orig ModuleName -- This is an *original* name; the module is the place
-- where the thing was defined
</pre>
The OccName type is described in <a href="names.html#occname">"The truth about names"</a>.
<p>
The <tt>OrigName</tt> variant is used internally; it allows GHC to speak of <tt>RdrNames</tt>
that refer to the original name of the thing.
<h2>The Renamer Monad</h2>
<p>
Due to the tight integration of the renamer with the typechecker, both
use the same monad in recent versions of GHC. So, we have
</p>
<blockquote>
<pre>
type RnM a = TcRn a -- Historical
type TcM a = TcRn a -- Historical</pre>
</blockquote>
<p>
with the combined monad defined as
</p>
<blockquote>
<pre>
type TcRn a = TcRnIf TcGblEnv TcLclEnv a
type TcRnIf a b c = IOEnv (Env a b) c
data Env gbl lcl -- Changes as we move into an expression
= Env {
env_top :: HscEnv, -- Top-level stuff that never changes
-- Includes all info about imported things
<h2> Rebindable syntax </h2>
env_us :: TcRef UniqSupply, -- Unique supply for local variables
In Haskell when one writes "3" one gets "fromInteger 3", where
"fromInteger" comes from the Prelude (regardless of whether the
Prelude is in scope). If you want to completely redefine numbers,
that becomes inconvenient. So GHC lets you say
"-fno-implicit-prelude"; in that case, the "fromInteger" comes from
whatever is in scope. (This is documented in the User Guide.)
<p>
This feature is implemented as follows (I always forget).
<ul>
<li> Four HsSyn constructs (NegApp, NPlusKPat, HsIntegral, HsFractional)
contain a <tt>Name</tt> (i.e. it is not parameterised).
<li> When the parser builds these constructs, it puts in the built-in Prelude
Name (e.g. PrelNum.fromInteger).
<li> When the renamer encounters these constructs, it calls <tt>RnEnv.lookupSyntaxName</tt>.
This checks for <tt>-fno-implicit-prelude</tt>; if not, it just returns the same Name;
otherwise it takes the occurrence name of the Name, turns it into an unqualified RdrName, and looks
it up in the environment. The returned name is plugged back into the construct.
<li> The typechecker uses the Name to generate the appropriate typing constraints.
</ul>
env_gbl :: gbl, -- Info about things defined at the top level
-- of the module being compiled
env_lcl :: lcl -- Nested stuff; changes as we go into
-- an expression
}</pre>
</blockquote>
<p>
The details of the global environment type <code>TcGblEnv</code> and
local environment type <code>TcLclEnv</code> are also defined in the
module <code>typecheck/TcRnTypes.lhs</code>. The monad
<code>IOEnv</code> is defined in <code>utils/IOEnv.hs</code> and extends
the vanilla <code>IO</code> monad with an additional state parameter
<code>env</code> that is treated as in a reader monad. (Side effecting
operations, such as updating the unique supply, are done with
<code>TcRef</code>s, which are simply a synonym for <code>IORef</code>s.)
</p>
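<p>
The shape of such a monad can be sketched as a function from an
environment to an <code>IO</code> action. This is a toy model under stated
assumptions: the <code>Env</code> record below holds only a counter in an
<code>IORef</code> standing in for the unique supply, and
<code>runIOEnv</code>, <code>getEnv</code> and <code>newUnique</code> are
illustrative names rather than a transcription of
<code>utils/IOEnv.hs</code>.
</p>

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- IO extended with a read-only environment.
newtype IOEnv env a = IOEnv { unIOEnv :: env -> IO a }

instance Functor (IOEnv env) where
  fmap f (IOEnv m) = IOEnv (\env -> fmap f (m env))

instance Applicative (IOEnv env) where
  pure x = IOEnv (\_ -> pure x)
  IOEnv mf <*> IOEnv mx = IOEnv (\env -> mf env <*> mx env)

instance Monad (IOEnv env) where
  IOEnv m >>= k = IOEnv (\env -> m env >>= \a -> unIOEnv (k a) env)

-- Run a computation in a given environment.
runIOEnv :: env -> IOEnv env a -> IO a
runIOEnv env (IOEnv m) = m env

-- Read the environment, as in a reader monad.
getEnv :: IOEnv env env
getEnv = IOEnv pure

-- Toy environment: an Int counter stands in for the unique supply.
newtype Env = Env { envUs :: IORef Int }

-- The environment itself never changes; the side effect goes through
-- the mutable reference stored inside it.
newUnique :: IOEnv Env Int
newUnique = do
  env <- getEnv
  IOEnv (\_ -> atomicModifyIORef' (envUs env) (\u -> (u + 1, u)))
```

<p>
The design point this illustrates is that the environment is passed along
unchanged, while anything that must be updated lives behind a reference
held in the environment.
</p>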
<h2>Name Space Management</h2>
<p>
As anticipated by the variants <code>Orig</code> and <code>Exact</code>
of <code>RdrName</code> some names should not change during renaming,
whereas others need to be turned into unique names. In this context,
the two functions <code>RnEnv.newTopSrcBinder</code> and
<code>RnEnv.newLocalsRn</code> are important:
</p>
<blockquote>
<pre>
newTopSrcBinder :: Module -> Maybe Name -> Located RdrName -> RnM Name
newLocalsRn :: [Located RdrName] -> RnM [Name]</pre>
</blockquote>
<p>
The two functions introduce new toplevel and new local names,
respectively, where the first two arguments to
<code>newTopSrcBinder</code> determine the currently compiled module and
the parent construct of the newly defined name. Both functions create
new names only for <code>RdrName</code>s that are neither exact nor
original.
</p>
<h3>Introduction of Toplevel Names: Global RdrName Environment</h3>
<p>
A global <code>RdrName</code> environment
<code>RdrName.GlobalRdrEnv</code> is a map from <code>OccName</code>s to
lists of qualified names. More precisely, the latter are
<code>Name</code>s with an associated <code>Provenance</code>:
</p>
<blockquote>
<pre>
data Provenance
= LocalDef -- Defined locally
Module
| Imported -- Imported
[ImportSpec] -- INVARIANT: non-empty
Bool -- True iff the thing was named *explicitly*
-- in *any* of the import specs rather than being
-- imported as part of a group;
-- e.g.
-- import B
-- import C( T(..) )
-- Here, everything imported by B, and the constructors of T
-- are not named explicitly; only T is named explicitly.
-- This info is used when warning of unused names.</pre>
</blockquote>
<p>
The part of the global <code>RdrName</code> environment for a module
that contains the local definitions is created by the function
<code>RnNames.importsFromLocalDecls</code>, which also computes a data
structure recording all imported declarations in the form of a value of
type <code>TcRnTypes.ImportAvails</code>.
</p>
<p>
The function <code>importsFromLocalDecls</code>, in turn, makes use of
<code>RnNames.getLocalDeclBinders :: Module -> HsGroup RdrName -> RnM
[AvailInfo]</code> to extract all declared names from a binding group,
where <code>HscTypes.AvailInfo</code> is essentially a collection of
<code>Name</code>s; i.e., <code>getLocalDeclBinders</code>, on the fly,
generates <code>Name</code>s from the <code>RdrName</code>s of all
top-level binders of the module represented by the <code>HsGroup
RdrName</code> argument.
</p>
<p>
It is important to note that all this happens before the renamer
actually descends into the toplevel bindings of a module. In other
words, before <code>TcRnDriver.rnTopSrcDecls</code> performs the
renaming of a module by way of <code>RnSource.rnSrcDecls</code>, it uses
<code>importsFromLocalDecls</code> to set up the global
<code>RdrName</code> environment, which contains <code>Name</code>s for
all imported <em>and</em> all locally defined toplevel binders. Hence,
when the helpers of <code>rnSrcDecls</code> come across the
<em>defining</em> occurrences of a toplevel <code>RdrName</code>, they
don't rename it by generating a new name, but they simply look up its
name in the global <code>RdrName</code> environment.
</p>
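<p>
The two-step scheme described above (enter all toplevel binders first,
then look them up at their defining occurrences) can be sketched with a
toy environment. Everything here is an assumption made for illustration:
the environment is an association list rather than GHC's real map, modules
and occurrence names are strings, <code>Imported</code> carries a list of
module names in place of <code>[ImportSpec]</code>, and
<code>extendWithLocals</code> and <code>lookupTopBinder</code> are
hypothetical helpers.
</p>

```haskell
import Data.Maybe (fromMaybe)

type OccName = String
type Module  = String

data Name = Name { nameModule :: Module, nameOcc :: OccName }
  deriving (Eq, Show)

data Provenance
  = LocalDef Module
  | Imported [Module]   -- stand-in for GHC's [ImportSpec]
  deriving (Eq, Show)

-- Toy global environment: occurrence name to Names with provenance.
type GlobalRdrEnv = [(OccName, [(Name, Provenance)])]

-- Step 1: enter every toplevel binder of module m before any
-- declaration body is renamed.
extendWithLocals :: Module -> [OccName] -> GlobalRdrEnv -> GlobalRdrEnv
extendWithLocals m occs env = foldr add env occs
  where
    add occ e =
      (occ, (Name m occ, LocalDef m) : fromMaybe [] (lookup occ e))
        : filter ((/= occ) . fst) e

-- Step 2: at a defining occurrence, look the binder up instead of
-- generating a new Name.
lookupTopBinder :: Module -> OccName -> GlobalRdrEnv -> Maybe Name
lookupTopBinder m occ env =
  case [n | (n, LocalDef m') <- fromMaybe [] (lookup occ env), m' == m] of
    (n : _) -> Just n
    []      -> Nothing
```

<p>
With an imported <code>f</code> already in the environment, adding local
binders <code>a</code> and <code>f</code> for module <code>A</code> leaves
two entries under <code>f</code>, and the defining occurrence of
<code>a</code> resolves to the Name created in step 1.
</p>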
<h2>Rebindable syntax</h2>
<p>
In Haskell when one writes "3" one gets "fromInteger 3", where
"fromInteger" comes from the Prelude (regardless of whether the
Prelude is in scope). If you want to completely redefine numbers,
that becomes inconvenient. So GHC lets you say
"-fno-implicit-prelude"; in that case, the "fromInteger" comes from
whatever is in scope. (This is documented in the User Guide.)
</p>
<p>
This feature is implemented as follows (I always forget).
<ul>
<li>Names that are implicitly bound by the Prelude are marked by the
type <code>HsExpr.SyntaxExpr</code>. Moreover, the association list
<code>HsExpr.SyntaxTable</code> is set up by the renamer to map
rebindable names to the value they are bound to.
</li>
<li>Currently, five constructs related to numerals
(<code>HsExpr.NegApp</code>, <code>HsPat.NPat</code>,
<code>HsPat.NPlusKPat</code>, <code>HsLit.HsIntegral</code>, and
<code>HsLit.HsFractional</code>) and
two constructs related to <code>do</code> expressions
(<code>HsExpr.BindStmt</code> and
<code>HsExpr.ExprStmt</code>) have rebindable syntax.
</li>
<li> When the parser builds these constructs, it puts in the
built-in Prelude Name (e.g. PrelNum.fromInteger).
</li>
<li> When the renamer encounters these constructs, it calls
<tt>RnEnv.lookupSyntaxName</tt>.
This checks for <tt>-fno-implicit-prelude</tt>; if the flag is not set, it just
returns the same Name; otherwise it takes the occurrence name of the
Name, turns it into an unqualified RdrName, and looks it up in the
environment. The returned name is plugged back into the construct.
</li>
<li> The typechecker uses the Name to generate the appropriate typing
constraints.
</li>
</ul>
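<p>
The lookup step in the list above can be sketched as follows. This is a
simplified model, not the code of <code>RnEnv.lookupSyntaxName</code>: the
flag is passed as a <code>Bool</code>, the environment is an association
list of unqualified names in scope, and the <code>Either</code> result
type is an assumption made to keep the example self-contained.
</p>

```haskell
type OccName = String

data Name = Name { nameModule :: String, nameOcc :: OccName }
  deriving (Eq, Show)

-- Toy environment: unqualified names currently in scope.
type Env = [(OccName, Name)]

-- First argument: True when the implicit Prelude is in effect,
-- i.e. -fno-implicit-prelude was NOT given.
lookupSyntaxName :: Bool -> Env -> Name -> Either String Name
lookupSyntaxName implicitPrelude env builtin
  | implicitPrelude = Right builtin          -- keep the built-in Name
  | otherwise =
      -- Take the occurrence name, treat it as unqualified, look it up.
      case lookup (nameOcc builtin) env of
        Just n  -> Right n
        Nothing -> Left ("not in scope: " ++ nameOcc builtin)
```

<p>
For a built-in <code>PrelNum.fromInteger</code>, the function returns that
same Name when the implicit Prelude is in effect, and whatever
<code>fromInteger</code> is in scope otherwise.
</p>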
<p><small>
<!-- hhmts start -->
Last modified: Tue Nov 13 14:11:35 EST 2001
Last modified: Wed May 4 17:16:15 EST 2005
<!-- hhmts end -->
</small>
</body>
</html>