Skip to content

Data.List 'nub' function is O(n^2)

I recently discovered that some Haskell code was running much slower than I would have expected. I eventually traced the problem to the 'nub' function in the Data.List, which ghc implements as follows:

nub l                   = nub' l []
  where
    nub' [] _           = []
    nub' (x:xs) ls
        | x `elem` ls   = nub' xs ls
        | otherwise     = x : nub' xs (x:ls)

This would seem to be O(n**2), because it accumulates the values it sees in a list. If it used a different data structure like a Set, it could be made O(n log n).

I'm not sure whether this should be considered a bug or not. The list nub returns is correct. On the other hand, when I call a library function in any programming language, I would normally expect it to use an algorithm that provides the best achievable asymptotic performance.

If you decide that you don't consider this to be a bug, can I suggest adding a note to the documentation, so people are aware that nub should only be used with short lists?

Edited by Ian Lynagh -
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information