Adding new test cases
For adding any test case, follow these guidelines and then refer to the more specific examples below for a single module test case and a multiple module test case. All test cases should reside under the testsuite/tests/ directory. From now on we assume that directory is our root.
- Find the appropriate place for the test case. The GHC regression suite is generally organised in a "white-box" manner: a regression which originally illustrated a bug in a particular part of the compiler is placed in the directory for that part. For example, typechecker regression test cases go in the typechecker/ directory, parser test cases go in parser/, and so on. It's not always possible to find a single best place for a test case; in those cases just pick one which seems reasonable.
  Under each main directory there are usually up to three subdirectories:
  - should_compile: test cases which need to compile only
  - should_fail: test cases which should fail to compile and generate a particular error message
  - should_run: test cases which should compile, run with some specific input, and generate a particular output.
  We don't always divide the test cases up like this, and it's not essential to do so. The directory names have no meaning as far as the test driver is concerned; they are simply a convention.
- Having found a suitable place for the test case, give the test case a name. For regression test cases, we often just name the test case after the bug number (e.g. T2047). Alternatively, follow the convention for the directory in which you place the test case: for example, in typecheck/should_compile, test cases are named tc001, tc002, and so on. Suppose you name your test case T; then you'll have the following files:
  - T.hs: the source file(s) containing the test case. Details on how to handle single vs multiple source test cases are explained below. If your test depends on source files that don't start with the name of the test, you have to specify them using the extra_files setup function (see below).
  - T.stdin (optional, for test cases that run): a file to feed the test case as standard input when it runs.
  - T.stdout (optional, for test cases that run): this file is compared against the standard output generated by the program. If T.stdout does not exist, then the program must not generate anything on stdout.
  - T.stderr (optional): for test cases that run, this file is compared against the standard error generated by the program. For test cases that compile only, this file is compared against the standard error output of the compiler, which is normalised to eliminate bogus differences (e.g. absolute pathnames are removed, whitespace differences are ignored, etc.).
  - T.ghc.stderr: the standard error output of the compiler, for tests using the warn_and_run test function.
- Edit all.T in the relevant directory and add a line for the test case. The line is always of the form
  test(<name>, <setup>, <test-fn>, <args...>)
  The format of this line is explained in more detail below, as it differs between test case types. It allows you to say whether the test case should fail to compile, run fine, run but terminate with a certain exit code, etc. The <args...> argument is a list, where the length and format of the list depend on the <test-fn> you use. The choice of <test-fn> largely depends on how complex it is to build your test case; the <test-fn> specifies a build method more than anything else.
  Note also that the all.T file is simply a Python source file that gets executed by the test framework. Hence any Python code in it is valid.
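Because all.T is ordinary Python, you can generate several similar entries programmatically. A minimal sketch, assuming hypothetical tests T100a, T100b and T100c that each have a matching single-module source file:
# each generated entry behaves exactly as if it had been written out by hand
for t in ['T100a', 'T100b', 'T100c']:
    test(t, normal, compile, [''])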
Below we will look at some of the more common test case setups.
1. A single module test case
A single module test case is very easy. Simply name the Haskell source file the same as your test name (so T.hs in our running example).
Then for a test case that should compile and run fine we would put this line in all.T:
test('cgrun001', normal, compile_and_run, [''])
For a test case that should compile but that you don't want to run, we would put this line in all.T:
test('cg002', normal, compile, [''])
For a test case that should fail during compilation we would put this line in all.T:
test('drvfail001', normal, compile_fail, [''])
For more detailed control of a test case, see the description of the all.T entry format below.
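The final element of the argument list, <extra_hc_opts>, is empty in the examples above but is where per-test GHC flags go. A sketch with a hypothetical test name:
test('T4321', normal, compile, ['-Wall -Werror'])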
2. A multiple module test case
A multiple module test case is slightly more complex than a single module one. First we are concerned with the simplest form of a multiple module test case, that is one where the whole test case can be built in one go using GHC's --make command. If you have more complex needs (like compiling source files that --make can't handle, and/or needing to compile different modules with different GHC arguments), then see below.
Then for a test case that should compile and run fine we would put this line in all.T:
test('multimod001', normal, multimod_compile_and_run, \
['Main', '-fglasgow-exts'])
This example would compile a multiple module test case where the top module is Main.hs and -fglasgow-exts is passed to GHC when compiling. It is important that the top module is Main (i.e. the module defining main), while the name of the file itself can be anything we want.
For a test case that should compile but that you don't want to run, we would put this line in all.T:
test('T3286',[], multimod_compile,['T3286','-v0'])
This example would compile a multiple module test case where the top module is T3286, passing -v0 to GHC to keep its output quiet.
For a test case that should fail during compilation we would put this line in all.T:
test('Over',
[],
multimod_compile_fail,
['OverD', '-no-hs-main -c -v0'])
3. Advanced multiple module test case
If you have a test case that can't be built with the two simpler methods described above, then you should try the method described below. This build method allows you to explicitly provide a list of (source file, GHC flags) tuples. GHC then builds them in the order you specify. This is useful for test cases that use, say, a .cmm or .c source file: files that GHC can build but that aren't picked up by --make.
Then for a test case that should compile and run fine we would put this line in all.T:
test('cgrun069', omit_ways(['ghci']), multi_compile_and_run,
['cgrun069', [('cgrun069_cmm.cmm', '')], ''])
This test case relies on a .cmm file, hence it can't use the simpler multimod_compile_and_run <test-fn>. We also see here how we can stop a test case from running in a certain way.
For a test case that should compile but that you don't want to run, we would put this line in all.T:
test('Check02', normal, multi_compile, ['Check02', [
('Check02_A.hs', ''),
('Check02_B.hs', '')
], '-trust base'])
For a test case that should fail during compilation we would put this line in all.T:
test('Check01', normal, multi_compile_fail, ['Check01', [
('Check01_A.hs', ''),
('Check01_B.hs', '-trust base')
], ''])
This test case must use the multi_compile_fail method as it relies on being able to compile the file Check01_B.hs with the argument '-trust base' while not compiling any of the other files with this flag.
4. Format of the test entries in all.T
Each test in an all.T file is specified by a line of the form
test(<name>, <setup>, <test-fn>, <args...>)
where <args...> is a list of arguments.
4.1 The <name> field
<name> is the name of the test, in quotes (' or ").
4.2 The <setup> field
<setup> is a function (i.e. any callable object in Python) which allows the options for this test to be changed. There are many pre-defined functions which can be used in this field:
- normal: don't change any options from the defaults
- skip: skip this test
- omit_ways(ways): skip this test for certain ways
- only_ways(ways): do this test certain ways only
- extra_ways(ways): add some ways which would normally be disabled
- expect_broken(bug): this test is expected not to work due to the indicated issue number
- expect_broken_for(bug, ways): as expect_broken, but only for the indicated ways
- set_stdin(file): use a different file for stdin
- no_stdin: use no stdin at all (otherwise /dev/null is used)
- copy_files: copy the test files to the temporary directory rather than using symlinks; this is useful when the test script modifies a source file
- exit_code(n): expect an exit code of n from the program
- extra_run_opts(opts): pass some extra options to the program
- extra_files(files): extra files that the test depends on. By default the testsuite driver assumes tests only depend on files whose names start with the name of the test (i.e. <testname>*). For the time being, extra_files can also be specified in the file testsuite/driver/extra_files.py. Example:
  test('prog013', extra_files(['Bad.hs', 'Good.hs']), ghci_script, ['prog013.script'])
- req_profiling: requires profiling
- req_interp: requires the interpreter (i.e. one of ghci, annotations, TH, etc.)
- req_hadrian(["target0", "target1"]): requires hadrian targets "target0" and "target1"
- ignore_stdout: don't try to compare stdout output
- ignore_stderr: don't try to compare stderr output
- normalise_errmsg_fun(f): pass the stderr through f before comparing
- grep_errmsg(needle): compare only stderr lines that contain needle
- check_errmsg(needle): compare stderr only on whether it contains needle or not
- compile_timeout_multiplier(n) and run_timeout_multiplier(n): modify the default timeout (usually 300s, displayed at the beginning of the testsuite) by a given factor for either the compile or the run part of your test. Note that the timeout program returns with exit code 99 when it kills your test, so if you want a timeout to mean success instead of failure, add exit_code(99) as a setup function.
- high_memory_usage: this test uses a lot of memory (allows the testsuite driver to be intelligent about what it runs in parallel)
- literate: look for a .lhs file instead of a .hs file
- c_src: look for a .c file
- objc_src: look for a .m file
- objcpp_src: look for a .mm file
- pre_cmd(string): run this command before running the test (this is preferred over the following three where it is possible to use it)
- cmd_prefix(string): prefix this string to the execution command when run
- cmd_wrapper(f): applies f to the execution command and runs the result instead
- normalise_slashes: convert backslashes to forward slashes before comparing the output
- when(predicate, f): do f, but only if predicate is True
- unless(predicate, f): do f, but only if predicate is False
There are a number of predicates which can be used:
- doing_ghci(): GHCi is available
- ghc_dynamic(): GHC is compiled with -dynamic (usually via DYNAMIC_GHC_PROGRAMS=YES)
- fast(): the testsuite is running in "fast" mode
- platform(plat): the testsuite is running on platform plat (which could be 'x86_64-unknown-mingw32', etc.)
- opsys(os): the testsuite is running on operating system os (which could be 'mingw32', 'darwin', 'linux', etc.)
- arch(a): the testsuite is running on architecture a (which could be 'x86_64', etc.)
- wordsize(w): the testsuite is running on a platform with word size w bits (which could be 32 or 64)
- msys(): the testsuite is running on msys
- cygwin(): the testsuite is running on cygwin
- have_vanilla(): the compiler has built vanilla libraries
- have_dynamic(): the compiler has built dynamic libraries
- have_profiling(): the compiler has built profiling libraries
- in_tree_compiler(): the compiler being tested is in a source tree, as opposed to installed
- compiler_type(ct): a compiler of type ct (which could be 'ghc', 'hugs', etc.) is being tested
- compiler_lt(ct, v): compiler type is ct, and the version is less than v
- compiler_le(ct, v): compiler type is ct, and the version is less than or equal to v
- compiler_gt(ct, v): compiler type is ct, and the version is greater than v
- compiler_ge(ct, v): compiler type is ct, and the version is greater than or equal to v
- unregisterised(): the compiler is unregisterised
- compiler_profiled(): the compiler is built with a profiling RTS
- compiler_debugged(): the compiler is built with -DDEBUG
- tag(t): the compiler has tag t
The following two setup functions should normally not be used; instead, use the expect_broken* functions above so that the problem or unfinished feature doesn't get forgotten about.
- expect_fail: this test is an expected failure, i.e. the compiler, test driver, OS or platform is missing a certain feature, and we don't plan to or can't fix it now or in the future. When used, it should usually be in combination with a specific OS or platform type (e.g. when(opsys('mingw32'), expect_fail) or when(platform('i386-unknown-mingw32'), expect_fail)). Otherwise, mark the test as expect_broken.
- expect_fail_for(ways): expect failure for certain ways
There are a number of predefined lists of the ways meeting various criteria:
- prof_ways ways in which the program is built with profiling enabled
- threaded_ways ways in which the program is linked with the threaded runtime (or run in ghci)
- opt_ways ways in which the program is built with optimization enabled
- llvm_ways ways in which the program is built with the LLVM backend
In some cases you may want to re-use the same stdout file for multiple tests. You can accomplish this using the use_specs function.
- use_specs: allows one to override files based on suffixes, e.g. 'stdout', 'stderr', 'asm', 'prof.sample', etc. For example, use_specs({'stdout' : 'prof002.stdout'}) makes the test re-use prof002.stdout.
To use more than one modifier on a test, just put them in a list. For example, to expect an exit code of 3 and omit way 'opt', we could use [omit_ways(['opt']), exit_code(3)] as the <setup> argument.
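As a complete all.T entry, that setup list might look like the following sketch (the test name T1234 is hypothetical):
test('T1234', [omit_ways(['opt']), exit_code(3)], compile_and_run, [''])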
4.3 Performance tests
Performance tests have recently been revamped significantly and are now much easier to use.
In order to designate a test as a performance test, it is sufficient to use the collect_stats() function.
The collect_compiler_stats() function and the collect_stats() function are exactly equivalent, with the exception that collect_compiler_stats() measures the performance of the compiler and collect_stats() measures the performance of the code generated by the compiler.
More documentation can be found in the driver/README.md file, or in the comments in driver/perf_notes.py.
Here's an example test:
test('perf001',
[ collect_compiler_stats('bytes allocated',10) ],
compile, [''])
test('ticketNumber',
[ collect_stats() ],
compile_and_run, [''])
The first test is testing the performance of GHC itself, requiring that the statistic 'bytes allocated' for the compiler when compiling the module perf001.hs is within +/- 10% of the value recorded in the previous commit. The second test is testing the performance of the program contained inside the ticketNumber test and requires that the metrics 'bytes allocated', 'peak_megabytes_allocated', and 'max_bytes_used' each vary no more than +/- 20% from the values recorded in the previous commit.
The collect_compiler_stats function takes two arguments:
collect_compiler_stats(metrics_to_measure, max_deviation_allowed)
where metrics_to_measure defaults to 'all' and max_deviation_allowed defaults to 20 (i.e. 20%).
The possible metrics that can be measured are 'bytes allocated', 'peak_megabytes_allocated', or 'max_bytes_used'.
For its first parameter, the collect_compiler_stats function will take either:
- one of those strings, e.g. collect_compiler_stats('bytes allocated')
- a list of those strings, e.g. collect_compiler_stats(['bytes allocated', 'max_bytes_used'])
- the string 'all', which is shorthand for the list containing all three possible measurements.
For its second parameter, it takes a non-negative integer giving the maximum deviation allowed as a percentage. A deviation of 5 means that the difference between the expected value and the actual value measured by the test driver can be no more than +/- 5%. This value defaults to 20%, as correctness is prioritized over speed and a test should not "fail" when it is "correct", even if a potential performance regression is introduced.
Since all of the parameters have defaults, the function can be called with either no arguments, just the metric to measure, or the metric to measure and the allowed deviation. (That is, if you need to specify an argument, all arguments to the left of that argument must be specified as well but all arguments to the right are optional)
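Putting the defaults together, the setup field of a performance test entry can use any of the following call shapes (the test name T9999 is hypothetical; only one such entry would exist for a given test):
test('T9999', [collect_compiler_stats()], compile, [''])                     # all three metrics, +/- 20%
test('T9999', [collect_compiler_stats('bytes allocated')], compile, [''])    # one metric, +/- 20%
test('T9999', [collect_compiler_stats('bytes allocated', 5)], compile, ['']) # one metric, +/- 5%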
In summary:
- Tests which measure the performance of the compiler should be used with compile or compile_fail tests. These tests will be skipped if -DDEBUG is on (i.e. compiler_debugged() is true), as the numbers are worthless then.
- Tests which measure the performance of the program, not the compiler, should be used in conjunction with a compile_and_run test.
See Running Performance Tests on how to run these tests.
4.4 The <test-fn> field
<test-fn> is a function which describes how the test should be built and maybe run. It also determines the number of arguments for <args...>. Each function comes in three forms:
- test-fn: compiles the program, expecting compilation to succeed.
- test-fn_fail: compiles the program, expecting compilation to fail.
- test-fn_and_run: compiles the program, expecting it to succeed, and then runs the program.
The test functions mostly differ in how they compile the test case. The simplest test functions can only compile single file test cases, while the most complex test function can compile a multi file test case with different flags for each file. The possible test functions are:
- compile, compile_fail, compile_and_run: the simplest test functions; they can only handle compiling a single module test case. The source file to compile must correspond to the <name> of the test.
  <args...> = [<extra_hc_opts>]
  where <extra_hc_opts> are arguments to pass to GHC when it compiles your test case.
- warn_and_run: a variant of the compile_and_run test function which checks both the compiler output (like compile) and the runtime output (like compile_and_run).
- multimod_compile, multimod_compile_fail, multimod_compile_and_run: compile a multi-module program using the GHC --make build system.
  <args...> = [<topmod>, <extra_hc_opts>]
  where <topmod> is the top level source file for your test case, and <extra_hc_opts> are arguments to pass to GHC when it compiles your test case.
- multi_compile, multi_compile_fail, multi_compile_and_run: compile a multi source test case. This is for cases where the GHC --make build system is not enough, such as when you first need to compile a .c or .cmm file before compiling the Haskell top level module.
  <args...> = [<topmod>, [(<extra_mod>, <hc_opts>)], <extra_hc_opts>]
  where <topmod> is the top level source file for your test case, [(<extra_mod>, <hc_opts>)] is a list of tuples whose first element is a source file for GHC to compile and whose second element is the arguments GHC should use to compile that particular source file, and <extra_hc_opts> are arguments to pass to GHC when it compiles your test case (applied to all source files).
- multiunit_compile, multiunit_compile_fail: compile a multi unit test case.
  <args...> = [[<unit>], <extra_hc_opts>]
  where <unit> is a path to a unit response file. For examples see the tests/driver/multipleHomeUnits directory.
- run_command: just run an arbitrary command. The output is checked against T.stdout and T.stderr (unless ignore_output is used). The expected exit code can be changed using exit_code(N). NB: run_command only works in the normal way, so don't use only_ways with it.
- ghci_script: runs the current compiler, passing --interactive and using the specified script as standard input.
- makefile_test: run make with the first argument as the target, or, if no argument is given, use the name of the test as the target. Works like run_command otherwise.
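For instance, the last two test functions are typically used like the following sketches (the test names are hypothetical, and the run_command example assumes the test's Makefile defines a matching target):
test('T1111', normal, makefile_test, [])
test('T2222', normal, run_command, ['$MAKE -s --no-print-directory T2222'])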
5. Adding tests with external dependencies
If your test has non-boot dependencies then it can't be added directly to the GHC tree. The alternative is to add the test to head.hackage; there the test can depend on any libraries you want, but failures won't stop merges, they will only be picked up later.
These tests are primarily suited to tests generated using QuickCheck or other random testing libraries.
6. Sample output files
Normally, the sample stdout and stderr for a test T go in the files T.stdout and T.stderr respectively. However, sometimes a test may generate different output depending on the platform or word size. For this reason the test driver looks for sample output files using this pattern:
T.stdout[-ws-<wordsize>][-<platform>]
Any combination of the optional extensions may be given, but they must be in the order specified. The most specific output file that matches the current configuration will be selected; for example, if the platform is i386-unknown-mingw32 then T.stderr-i386-unknown-mingw32 will be picked in preference to T.stderr.
7. Threaded Considerations
The testsuite has fairly good support for running tests in parallel using a thread pool whose size is specified by THREADS=<value>. This does mean you need to be careful when writing test cases to keep them independent of each other. You are usually not able to share files between test cases, as they can run in arbitrary order and will easily conflict with each other. If you must write test cases that are dependent on each other, be sure to use the high_memory_usage setup function, which ensures a test case runs by itself in the main testsuite thread; all of the dependent test cases should use this setup function. Try not to do this extensively though, as it means we can't easily speed up the testsuite by throwing cores at it.
For tests that are severely CPU bound, we should mark them as such using multi_cpu_race, as this ensures they are run in isolation. On a resource-constrained system this will give the test a fair chance of passing.
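As a sketch, marking a CPU-bound test this way would look like the following all.T entry (the test name T9876 is hypothetical):
test('T9876', [multi_cpu_race], compile_and_run, [''])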