Commit e5063a04 authored by simonmar

[project @ 2002-07-31 14:24:18 by simonmar]

Revamp the testsuite framework.  The previous framework was an
experiment that got a little out of control - a whole new language
with an interpreter written in Haskell was rather heavyweight and left
us with a maintenance problem.

So the new test driver is written in Python.  The downside is that you
need Python to run the testsuite, but we don't think that's too big a
problem since it only affects developers and Python installs pretty
easily onto everything these days.

Highlights:

  - 790 lines of Python, vs. 5300 lines of Haskell + 720 lines of
    <strange made-up language>.

  - the framework supports running tests in various "ways", which should
    catch more bugs.  By default, each test is run in three ways:
    normal, -O, and -O -fasm.  Additionally, if profiling libraries
    have been built, another way (-O -prof -auto-all) is added.  I plan
    to also add a 'GHCi' way.

    Running tests multiple ways has already shown up some new bugs!

  - documentation is in the README file and is somewhat improved.

  - the framework is rather less GHC-specific, and could without much
    difficulty be coaxed into using other compilers.  Most of the
    GHC-specificness is in a separate configuration file (config/ghc).

Things may need a while to settle down.  Expect some unexpected
failures.
parent 0d4aee25
Running the test suite against a GHC build
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To use the test suite to test a GHC build, first say 'make' in
fptools/testsuite. The build system will build the test driver. To
run the tests, cd into tests/ghc-regress and say 'make' to run all the
tests against the GHC build in the same source tree. You'll get a
summary indicating how many tests succeeded and failed (including how
many of the failures were 'expected'), and a list of which tests
failed. To investigate the failures in more detail, see "Running
individual tests" below.
NOTE: you need Python (any version >= 1.5 will probably do) in order
to use the testsuite.
To run the test suite against a GHC build in the same source tree:

   cd fptools/testsuite/tests/ghc-regress
   make

To run the test suite against a different GHC, say ghc-5.04:

   cd fptools/testsuite/tests/ghc-regress
   make TEST_HC=ghc-5.04

To run an individual test or tests (eg. tc054):

   cd fptools/testsuite/tests/ghc-regress
   make TEST=tc054

(You can also go straight to the directory containing the test and say
'make TEST=tc054' from there, which will save some time.)
For more details, see below.
Running the testsuite with a compiler other than GHC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running individual tests or subdirectories of the testsuite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most of the subdirectories in the testsuite have a Makefile.  In these
subdirectories you can use 'make' to run the test driver in several
ways:

   make          -- run all the tests in the current directory
   make verbose  -- as 'make', but with more verbose output
   make accept   -- run the tests, accepting the current output

The following variables may be set on the make command line:
   TESTS              -- specific tests to run
   TEST_HC            -- compiler to use
   EXTRA_HC_OPTS      -- extra flags to send to the Haskell compiler
   EXTRA_RUNTEST_OPTS -- extra flags to give the test driver
   CONFIG             -- use a different configuration file
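
For example, a hypothetical invocation combining several of these
variables (the test names and extra flag are for illustration only):

   make TESTS='tc054 tc055' EXTRA_HC_OPTS=-fglasgow-exts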
If the output of a test changes but the new output is still correct,
you can accept the new output like so:

   make accept TESTS=<test-name>
where <test-name> is the name of the test. In a directory which
contains a single test, or if you want to update *all* the tests in
the current directory, just omit the 'TESTS=<test-name>' part.
Adding a new test
~~~~~~~~~~~~~~~~~
Almost all tests have a single source file and fall into one of the
categories should_compile, should_fail (the compilation should fail),
or should_run.
For a test which can be encapsulated in a single source file, follow
these steps:
1. Find the appropriate place for the test. The GHC regression suite
is generally organised in a "white-box" manner: a regression which
originally illustrated a bug in a particular part of the compiler
is placed in the directory for that part. For example, typechecker
regression tests go in the typecheck/ directory, parser tests
go in parser/, and so on.
It's not always possible to find a single best place for a test;
in those cases just pick one which seems reasonable.
Under each main directory there may be up to three subdirectories:

   - should_compile: tests which only need to compile

   - should_fail:    tests which should fail to compile and
                     generate a particular error message

   - should_run:     tests which should compile, run with some
                     specific input, and generate a particular
                     output.
We don't always divide the tests up like this, and it's not
essential to do so (the directory names have no meaning as
far as the test driver is concerned).
2. Having found a suitable place for the test, give the test a name.
Follow the convention for the directory in which you place the
test: for example, in typecheck/should_compile, tests are named
tc001, tc002, and so on. Suppose you name your test T, then
you'll have the following files:
      T.hs      The source file containing the test.

      T.stdin   (optional; for tests that run)
                A file to feed the test as standard input when it
                runs.

      T.stdout  (optional; for tests that run)
                This file is compared against the standard output
                generated by the program.  If T.stdout does not
                exist, then the program must not generate anything
                on stdout.

      T.stderr  (optional)
                For tests that run, this file is compared against
                the standard error generated by the program.  For
                tests that compile only, it is compared against the
                standard error output of the compiler, which is
                normalised to eliminate bogus differences (eg.
                absolute pathnames are removed, whitespace
                differences are ignored, etc.)
Place a should_xyz test in the should_xyz directory; putting it
elsewhere is possible, but confusing.
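
For example, a hypothetical should_run test named io001 would consist
of io001.hs, plus io001.stdin if the program reads standard input, and
io001.stdout if it is expected to produce output on stdout.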
3. Edit all.T in the relevant directory and add a line for the
   test.  The line is always of the form
test(<name>, <opt-fn>, <test-fn>, <args>)
where
<name> is the name of the test, in quotes (' or ").
<opt-fn> is a function (i.e. any callable object in Python)
which allows the options for this test to be changed.
There are several pre-defined functions which can be
used in this field:
      normal                don't change any options from the defaults
      expect_fail           this test is an expected failure
      skip                  skip this test
      omit_ways(ways)       skip this test for certain ways
      set_stdin(file)       use a different file for stdin
      exit_code(n)          expect an exit code of 'n' from the program
      extra_run_opts(opts)  pass extra options to the program
You can compose two of these functions together by
saying compose(f,g). For example, to expect an exit
code of 3 and omit way 'opt', we could use
compose(omit_ways(['opt']), exit_code(3))
as the <opt-fn> argument. Calls to compose() can of
course be nested.
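
Putting this together, a hypothetical all.T entry (the test name is
made up) might read:

   test('cg042', compose(omit_ways(['opt']), exit_code(3)),
        compile_and_run, [''])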
<test-fn> is a function which describes how the test should be
run, and determines the form of <args>. The possible
values are:
      compile           Just compile the program; the compilation
                        should succeed.

      compile_fail      Just compile the program; the compilation
                        should fail (error messages will be in
                        T.stderr).

      compile_and_run   Compile the program and run it, comparing
                        the output against the relevant files.

      multimod_compile  Compile a multi-module program (more about
                        multi-module programs below).

      multimod_compile_and_run
                        Compile and run a multi-module program.
<args> is a list of arguments to be passed to <test-fn>.
For compile, compile_fail and compile_and_run, <args>
is a list with a single string which contains extra
compiler options with which to run the test, eg.

   test('tc001', normal, compile, ['-fglasgow-exts'])

would pass the flag -fglasgow-exts to the compiler
when compiling tc001.
The multimod_ versions of compile and compile_and_run
expect an extra argument on the front of the list: the
name of the top module in the program to be compiled
(usually this will be 'Main').
A multi-module test is straightforward. It must go in a directory of
its own, and the source files can be named anything you like. The
test must have a name, in the same way as a single-module test; and
the stdin/stdout/stderr files follow the name of the test as before.
In the same directory, place a file 'test.T' containing a line like
   test('multimod001', normal, multimod_compile_and_run, \
        [ 'Main', '-fglasgow-exts', '', 0 ])
as described above.
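
Concretely, such a test directory might contain the following (the
module names are made up):

   multimod001/
      test.T               -- contains the test() line above
      Main.hs
      Helper.hs
      multimod001.stdout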
For some examples, take a look in tests/ghc-regress/programs.
The details
~~~~~~~~~~~
The test suite driver is just a set of Python scripts, as are all of
the .T files in the test suite. The driver (driver/runtests.py) first
searches for all the .T files it can find, and then proceeds to
execute each one, keeping track of the number of tests run, and
which ones succeeded and failed.
The script runtests.py takes several options:
--config <file>
<file> is just a file containing Python code which is
executed. The purpose of this option is so that a file
containing settings for the configuration options can
be specified on the command line. Multiple --config options
may be given.
--rootdir <dir>
<dir> is the directory below which to search for .T files
to run.
--output-summary <file>
In addition to dumping the test summary to stdout, also
put it in <file>. (stdout also gets a lot of other output
when running a series of tests, so redirecting it isn't
always the right thing).
--only <test>
Only run tests named <test> (multiple --only options can
be given). Useful for running a single test from a .T file
containing multiple tests.
-e <stmt>
executes the Python statement <stmt> before running any tests.
The main purpose of this option is to allow certain
configuration options to be tweaked from the command line; for
example, the build system adds '-e config.accept=1' to the
command line when 'make accept' is invoked.
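
For example, a hypothetical direct invocation of the driver, with
paths and test name for illustration only:

   python driver/runtests.py --config config/ghc \
       --rootdir tests/ghc-regress --output-summary summary.log \
       --only tc054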
Most of the code for running tests is located in driver/testlib.py.
Take a look.
There is a single Python class (TestConfig) containing the global
configuration for the test suite. It contains information such as the
kind of compiler being used, which flags to give it, which platform
we're running on, and so on. The idea is that each platform and
compiler would have its own file containing assignments for elements
of the configuration, which are sourced by passing the appropriate
--config options to the test driver. For example, the GHC
configuration is contained in the file config/ghc.
A .T file can obviously contain arbitrary Python code, but the general
idea is that it contains a sequence of calls to the function test(),
which resides in testlib.py. As described above, test() takes four
arguments:
test(<name>, <opt-fn>, <test-fn>, <args>)
The function <opt-fn> is allowed to be any Python callable object,
which takes a single argument of type TestOptions. TestOptions is a
class containing options which affect the way that the current test is
run: whether to skip it, whether to expect failure, extra options to
pass to the compiler, etc. (see testlib.py for the definition of the
TestOptions class). The idea is that the <opt-fn> function modifies
the TestOptions object that it is passed. For example, to expect
failure for a test, we might do this in the .T file:
   def fn(opts):
      opts.expect = 'fail'

   test('test001', fn, compile, [''])
so when fn is called, it sets the instance variable "expect" in the
instance of TestOptions passed as an argument, to the value 'fail'.
This indicates to the test driver that the current test is expected to
fail.
Some of these functions, such as the one above, are common, so rather
than forcing every .T file to redefine them, we provide canned
versions. For example, the provided function expect_fail does the
same as fn in the example above. See testlib.py for all the canned
functions we provide for <opt-fn>.
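
For instance, hypothetical all.T entries using canned functions
directly (the test names and stdin filename are made up):

   test('tc999', expect_fail, compile, [''])
   test('io013', set_stdin('io013-input.txt'), compile_and_run, [''])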
The argument <test-fn> is a function which performs the test. It
takes three or more arguments:
<test-fn>( <name>, <way>, ... )
where <name> is the name of the test, <way> is the way in which it is
to be run (eg. opt, optasm, prof, etc.), and the rest of the arguments
are constructed from the list <args> in the original call to test().
The following <test-fn>s are provided at the moment:
compile
compile_fail
compile_and_run
multimod_compile
multimod_compile_and_run
and obviously others can be defined. The function should return
either 'pass' or 'fail' indicating that the test passed or failed
respectively.
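
As a sketch, a trivial user-defined test function might look like the
following (the function name is made up; only the (name, way, ...)
part of the signature is prescribed above):

   def always_passes(name, way, extra_hc_opts):
       # A real test function would compile and/or run the test here,
       # consulting the global configuration and the test's options.
       return 'pass'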
# Testsuite configuration setup for GHC
#
# This file is Python source
#
config.compiler_type = 'ghc'
config.compiler = 'ghc'
config.compiler_always_flags = ['-no-recomp', '-dcore-lint']
config.compile_ways = ['normal', 'opt', 'optasm']
config.run_ways = ['normal', 'opt', 'optasm']
config.way_flags = { 'normal' : [],
'opt' : ['-O'],
'optasm' : ['-O -fasm'],
'prof' : ['-O -prof -auto-all'],
'unreg' : ['-unreg']
}
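
Since the GHC-specific settings are concentrated in this file, a
configuration for another compiler could follow the same pattern.  A
hypothetical sketch (all values made up):

   config.compiler_type = 'nhc'
   config.compiler = 'nhc98'
   config.compiler_always_flags = []
   config.compile_ways = ['normal']
   config.run_ways = ['normal']
   config.way_flags = { 'normal' : [] }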
-----------------------------------------------------------------------
--- Stuff to do with multiple-source-file tests. We assume ---
--- that the name of the test is to be used as the basename ---
--- for everything. ---
-----------------------------------------------------------------------
-- global variables:
$stdin = ""
$expect = "pass"
$normalise_errmsg = False
$normalise_output = False
---------------------------------------------------------------
--- UTILITY FNs ---
---------------------------------------------------------------
include ($confdir ++ "/" ++ $conffilename)
include ($confdir ++ "/../std-macros.T")
-- (eg) "fooble" --> "testdir/fooble"
def testdirify ( $basename )
{
return $testdir ++ "/" ++ $basename
}
---------------------------------------------------------------
--- COMPILATION ---
---------------------------------------------------------------
-- Clean up prior to the test, so that we can't spuriously conclude
-- that it passed on the basis of old run outputs.
def pretest_cleanup()
{
rm_nofail(qualify("comp.stderr"))
rm_nofail(qualify("run.stderr"))
rm_nofail(qualify("run.stdout"))
-- simple_build_Main zaps the following:
-- objects
-- executable
-- not interested in the return code
}
-- Guess flags suitable for the compiler.
def guess_compiler_flags()
{
if $tool contains "ghc"
then
return "-no-recomp --make -dcore-lint" ++
" -i" ++ $testdir
else
-- Problem here is that nhc and hbc don't understand --make,
-- and we rely on it.
-- if $tool contains "nhc"
-- then
-- return "-an-nhc-specific-flag"
-- else
-- if $tool contains "hbc"
-- then
-- return ""
-- else
framefail ("Can't guess what kind of Haskell compiler " ++
"you're testing: $tool = " ++ $tool)
-- fi
-- fi
fi
}
-- Build Main, and return the compiler result code. Compilation
-- output goes into testname.comp.stderr. Source is assumed to
-- be in Main.hs or Main.lhs, and modules reachable from it.
def simple_build_prog_WRK ( $_main, $_extra_args )
{
$flags = guess_compiler_flags()
$errname = qualify("comp.stderr")
$exename = qualify("") -- ie, the exe name == the test name
rm_or_fail($errname)
rm_or_fail($exename)
rm_nofail(testdirify("*.o"))
$cmd = "\"" ++ $tool ++ "\" " ++ $flags ++ " " ++ $_extra_args ++ " "
++ (if defined $extra_hc_flags
then $extra_hc_flags
else "")
++ " -o " ++ $exename ++ " "
++ $_main ++ " >" ++ $errname ++ " 2>&1"
$res = run $cmd
return $res
}
---------------------------------------------------------------
--- CONDUCTING A COMPLETE TEST ---
---------------------------------------------------------------
-- Compile and run (should_run) style test
def multimod-compile( $mod, $extra_compile_args )
{
pretest_cleanup()
$main = if $mod == "" then "Main" else $mod
$res = simple_build_prog_WRK( $main, $extra_compile_args )
if $res /= "0" then
say_fail_because_compiler_barfd ( $res )
return False
else
return True
fi
}
def multimod-run-test ( $extra_run_args,
$allowable_nonzero_exit_code )
{
$exit_code =
if $allowable_nonzero_exit_code /= "" then
$allowable_nonzero_exit_code
else "0"
return simple_run_pgm( $extra_run_args, $exit_code )
}
---------------------------------------------------------------
--- TOP-LEVEL FNS ---
---------------------------------------------------------------
--------------------------------------------------------------
-- top-level
-- Multi-module should_compile style test
def mtc ( $mod, $extra_compile_args )
{
$test_passed = multimod-compile( $mod, $extra_compile_args )
if ($expect == "pass") then
expect pass
else
expect fail
fi
pass when $test_passed
fail when otherwise
}
def mtr ( $extra_compile_args,
$extra_run_args,
$allowable_nonzero_exit_code )
{
$test_passed
= multimod-compile( "Main", $extra_compile_args )
&& multimod-run-test( $extra_run_args,
$allowable_nonzero_exit_code )
if ($expect == "pass") then
expect pass
else
expect fail
fi
pass when $test_passed
fail when otherwise
}
-----------------------------------------------------------------------
--- end multimod-test.T ---
-----------------------------------------------------------------------
include ($confdir ++ "/../singlefile-macros.T")
expect pass
pretest_cleanup()
$res = simple_compile_Main()
pass when contents("comp.stdout") == ""
fail when otherwise
include ($confdir ++ "/../singlefile-macros.T")
expect pass
pretest_cleanup()
$res = simple_compile_Main()
pass when
$tool contains "ghc"
&& contents("comp.stdout") contains "Could not deduce"
-- put a pass clause here for NHC
fail when otherwise
include ($confdir ++ "/../singlefile-macros.T")
expect pass
pretest_cleanup()
simple_build_Main()
$res = simple_run_main_no_stdin()
pass when contents("run.stdout") == "True\n"
fail when otherwise
$diff = "diff -C 2"
$rm = "rm -f"
$cp = "cp"
-- -----------------------------------------------------------------------------
-- generic useful stuff