chak · 71141b69
--- a/status/may09.md
+++ b/status/may09.md
@@ -81,7 +81,13 @@ GHC 6.12 will feature parallel profiling in the form of [ ThreadScope](http://ra
 ### Data Parallel Haskell


-DPH remains under very active development. The [current state of play](data-parallel), including some benchmark figures, is on the wiki.  We also wrote a substantial paper [ Harnessing the multicores: nested data parallelism in Haskell](http://research.microsoft.com/~simonpj/papers/ndp) for FSTTCS; you may find this paper a useful tutorial on the whole idea of nested data parallelism.
+DPH remains under very active development by Manuel Chakravarty, Gabriele Keller, Roman Leshchinskiy, and Simon Peyton Jones. The [current state of play](data-parallel) is documented on the wiki.  We also wrote a substantial paper [ Harnessing the multicores: nested data parallelism in Haskell](http://research.microsoft.com/~simonpj/papers/ndp) for FSTTCS; you may find this paper a useful tutorial on the whole idea of nested data parallelism.
+
+
+The system currently works well for small programs, such as computing a dot product or the product of a sparse matrix with a dense vector.  For such applications, the generated code is as close to hand-written C code as GHC's current code generator allows (i.e., within a factor of 2 or 3).  We ran three small benchmarks on an 8-core x86 server and on an 8-core UltraSPARC T2 server, from which we derived two comparative figures: [a comparison between x86 and T2 on a memory-intensive benchmark (dot product)](http://justtesting.org/post/83014052/this-is-the-performance-of-a-dot-product-of-two) and [a summary of the speedup of three benchmarks on x86 and T2](http://justtesting.org/post/85103645/these-graphs-summarise-the-performance-of-data).  Overall, we achieved good absolute performance and good scalability on the hardware we tested.
+
+
+Our next step is to scale the implementation up to handle larger programs properly.  In particular, we are working on improving the interaction between vectorised code, the aggressively inlining array library, and GHC's standard optimisation phases, with the goal of reducing the excessively long compile times caused by a temporary code explosion during optimisation.  Moreover, Gabriele has started work on integrating specialised support for regular multi-dimensional arrays into the existing framework for nested data parallelism.

 ### Type system improvements