... | ... | @@ -6,94 +6,68 @@ |
|
|
Nowadays it might be better to use the Performance Events infrastructure available in Linux 2.6.31 and later.
|
|
|
|
|
|
|
|
|
What follows are some notes I made when I last got PAPI working on Linux.
|
|
|
For some notes on installing PAPI on Linux, see [Debugging/LowLevelProfiling/PAPI/Installing](debugging/low-level-profiling/papi/installing).
|
|
|
|
|
|
## Installing PAPI
|
|
|
# Measuring program performance using CPU events
|
|
|
|
|
|
- Grab PAPI and the most recent perfctr 2.6.x
|
|
|
- Unpack both under `~/tmp`
|
|
|
|
|
|
### On Ubuntu
|
|
|
The GHC runtime has been extended to support the use of the [PAPI](http://icl.cs.utk.edu/papi/) library to count occurrences of CPU events such as cache misses and branch mispredictions. The PAPI extension separates the events occurring in the garbage collector and mutator code for more accurate pinpointing of performance problems.
|
|
|
|
|
|
|
|
|
Get the bits needed to build a kernel:
|
|
|
This page describes how to compile the RTS with PAPI enabled and explains the RTS options for CPU event selection. It also contains patches to collect CPU event information in nofib runs and to compare the results using nofib-analyse. This is especially useful for systematically measuring the effects of optimisations across a whole range of programs.
|
|
|
|
|
|
- `sudo apt-get install linux-kernel-devel fakeroot kernel-wedge kernel-package`
|
|
|
- `sudo apt-get build-dep linux-source`
|
|
|
- `sudo apt-get install linux-source`
|
|
|
- This seems to get a slightly out-of-date version, but maybe that's ok
|
|
|
# Status of the implementation
|
|
|
|
|
|
|
|
|
Unpack and patch the kernel:
|
|
|
GHC with PAPI support should compile on any platform where PAPI is installed. It should also be possible to monitor the cache miss events of a GHC-compiled program.
|
|
|
|
|
|
- `cd ~/tmp`
|
|
|
- `tar xvjf /usr/src/linux-source-*`
|
|
|
- `cd linux-source*`
|
|
|
- Read `$perfctr/INSTALL`
|
|
|
- `$perfctr/update-kernel` (might fail because it can't find the right patch, in which case:)
|
|
|
- `$perfctr/update-kernel --patch=2.6.22` (tell it which patch)
|
|
|
- `make menuconfig`
|
|
|
- Turn on `perfctr`-related stuff under "Processor type and features", "Performance-monitoring counters support".
|
|
|
You need both "virtual" and "global" support turned on.
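The kernel configuration can be checked before starting the long build. The following is a minimal sketch, assuming the perfctr patch exposes the options as `CONFIG_PERFCTR`, `CONFIG_PERFCTR_VIRTUAL` and `CONFIG_PERFCTR_GLOBAL`; these symbol names and the helper `check_perfctr_config` are assumptions, so compare them with what `make menuconfig` actually wrote to `.config`.

```shell
# check_perfctr_config [CONFIG_FILE]
# Hypothetical helper: reports whether the perfctr options are enabled
# in a kernel .config. The option names below are assumed, not taken
# from these notes.
check_perfctr_config() {
    config=${1:-.config}
    status=0
    for opt in CONFIG_PERFCTR CONFIG_PERFCTR_VIRTUAL CONFIG_PERFCTR_GLOBAL; do
        # Accept both built-in (=y) and module (=m) settings.
        if grep -Eq "^${opt}=(y|m)\$" "$config" 2>/dev/null; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            status=1
        fi
    done
    return $status
}
```

Running `check_perfctr_config` from the top of the kernel source tree checks the `.config` there; a non-zero exit status means at least one option is missing.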
|
|
|
|
|
|
At present, the monitoring of branch mispredictions and stalled cycles is specific to the AMD Opteron. For branch mispredictions, the portable PAPI API only monitors conditional jumps. We would like to monitor all jumps, especially indirect jumps, which is why we use a native AMD PAPI counter. As it happens, the PAPI conditional jump counter maps to the native counter we are using, but we cannot rely on this behaviour on other platforms, so we use the native counter anyway.
|
|
|
|
|
|
Build the kernel:
|
|
|
# Compiling and running programs with PAPI
|
|
|
|
|
|
- `make-kpkg --rootcmd fakeroot --initrd --append-to-version=-perfctr kernel-image kernel-headers`
|
|
|
- wait a while
|
|
|
|
|
|
First of all, make sure that you have installed the [PAPI library](http://icl.cs.utk.edu/papi/).
|
|
|
|
|
|
Install the kernel:
|
|
|
|
|
|
- `cd ..`
|
|
|
- `dpkg -i linux-image-*.deb`
|
|
|
- `dpkg -i linux-headers-*.deb`
|
|
|
- Copy contents of `/lib/firmware/$old_kernel` to `/lib/firmware/$new_kernel` (not sure if this is right, but so far the wireless adaptor still seems fine under the new kernel, so I guess it worked)
|
|
|
- `cd $perfctr`
|
|
|
- `cp etc/perfctr.rules /etc/udev/rules.d/99-perfctr.rules`
|
|
|
- Boot the new kernel
|
|
|
- I seemed to have `perfctr` built as a module, probably selected that in menuconfig by mistake, so anyway:
|
|
|
- `sudo modprobe perfctr`
|
|
|
- `cat /proc/misc | grep perfctr` should now show perfctr, and you should have `/dev/perfctr`.
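The last check in the list above can be wrapped in a small function so it is easy to repeat after each reboot. This is only a sketch: the name `check_perfctr` and its optional arguments are made up for illustration, and the hints it prints simply restate the steps above.

```shell
# check_perfctr [MISC_FILE] [DEVICE_NODE]
# Hypothetical helper: succeeds if MISC_FILE lists perfctr and
# DEVICE_NODE exists. Defaults are the real paths from the notes.
check_perfctr() {
    misc_file=${1:-/proc/misc}
    dev_node=${2:-/dev/perfctr}
    if ! grep -q perfctr "$misc_file" 2>/dev/null; then
        echo "perfctr not listed in $misc_file (try: sudo modprobe perfctr)"
        return 1
    fi
    if [ ! -e "$dev_node" ]; then
        echo "$dev_node is missing (check the udev rule in /etc/udev/rules.d)"
        return 1
    fi
    echo "perfctr looks available"
    return 0
}
```

Calling `check_perfctr` with no arguments inspects `/proc/misc` and `/dev/perfctr`; the arguments exist only so the function can be tried against other files.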
|
|
|
Follow the instructions in [Building/Hacking](building/hacking) and add the following line to `build.mk` before compiling the RTS:
|
|
|
|
|
|
```wiki
|
|
|
GhcRtsWithPapi = YES
|
|
|
```
|
|
|
|
|
|
|
|
|
Now, to monitor and report level 1 cache misses, invoke a program compiled by GHC as follows:
|
|
|
|
|
|
```wiki
|
|
|
./program +RTS -sstderr -a1 -RTS
|
|
|
```
|
|
|
|
|
|
|
|
|
Build the perfctr library:
|
|
|
The help screen provides options to monitor more events:
|
|
|
|
|
|
- `cd $perfctr`
|
|
|
- `make PREFIX=$HOME/local`
|
|
|
- `make PREFIX=$HOME/local install`
|
|
|
```wiki
|
|
|
./program +RTS -h -RTS
|
|
|
```
|
|
|
|
|
|
# Using PAPI with the nofib benchmarking suite
|
|
|
|
|
|
Now to build PAPI:
|
|
|
|
|
|
- `cd $papi`
|
|
|
- The configure script had some trouble detecting the C compiler for me, so I had to edit the configure script and re-run autoconf. Change
|
|
|
In order to use the nofib suite with PAPI, you have to apply the three patches at the bottom of this page.
|
|
|
|
|
|
```wiki
|
|
|
if test "$OS" != "linux"; then
|
|
|
...
|
|
|
else
|
|
|
AC_PROG_CC
|
|
|
AC_PROG_F77
|
|
|
fi
|
|
|
```
|
|
|
1. The first patch adds a PAPI flag to the perl testing script.
|
|
|
1. The second patch adds a make argument to the nofib suite to enable the collection of PAPI numbers.
|
|
|
1. The third patch makes nofib-analyse able to process the output produced in the second patch. The standard nofib-analyse won't cut it.
|
|
|
|
|
|
|
|
|
to
|
|
|
These patches have not been submitted to HEAD (yet?) because they are not mature, but they are useful. Probably the (only?) patch that needs more work is the third one.
|
|
|
|
|
|
```wiki
|
|
|
AC_PROG_CC
|
|
|
AC_PROG_F77
|
|
|
```
|
|
|
|
|
|
- `autoconf`
|
|
|
- We have a choice about whether to use the `libperfctr` in the PAPI distribution, or the one that comes with `perfctr`. The latter is probably more correct, but the former also worked for me.
|
|
|
- `./configure --with-perfctr-prefix=$HOME/local --prefix=$HOME/local`
|
|
|
- `make`
|
|
|
- `./run-tests.sh` (I got about 4 failures on Core 2)
|
|
|
- `make install`
|
|
|
To collect statistics, run make inside nofib as usual. For example, to collect statistics together with cache misses: `make papi=1`.
|
|
|
|
|
|
# Resources
|
|
|
|
|
|
Now it's all built, with header files in `$HOME/local/include`, libraries in `$HOME/local/lib`.
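A quick sanity check of the install prefix might look like the sketch below. It assumes the PAPI header is installed as `papi.h` and the library as `libpapi.*` (static or shared); the helper name `check_papi_install` is hypothetical.

```shell
# check_papi_install [PREFIX]
# Hypothetical helper: looks for the PAPI header and library under PREFIX
# (default: $HOME/local, the prefix used in these notes).
check_papi_install() {
    prefix=${1:-$HOME/local}
    status=0
    if [ ! -f "$prefix/include/papi.h" ]; then
        echo "missing $prefix/include/papi.h"
        status=1
    fi
    # Matches libpapi.a as well as libpapi.so and its versioned names.
    if ! ls "$prefix"/lib/libpapi.* >/dev/null 2>&1; then
        echo "missing $prefix/lib/libpapi.*"
        status=1
    fi
    return $status
}
```

If the check passes, the same prefix is what a later build needs on its include and library search paths.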
|
|
- [http://icl.cs.utk.edu/papi/](http://icl.cs.utk.edu/papi/) PAPI home page.
|
|
|
- [http://developer.amd.com/article_print.jsp?id=90](http://developer.amd.com/article_print.jsp?id=90) An article introducing the business of using CPU counters for performance measurement.
|
|
|
- [http://developer.amd.com/articles.jsp?id=2&num=1](http://developer.amd.com/articles.jsp?id=2&num=1) An article introducing AMD's CodeAnalyst. It even has pipeline simulation, though I haven't tried it out yet.
|
|
|
- [http://www.cs.mu.oz.au/~njn/pubs/cache-large-lazy2002.ps.gz](http://www.cs.mu.oz.au/~njn/pubs/cache-large-lazy2002.ps.gz) The Cache Behaviour of Large Lazy Functional Programs on Stock Hardware.