Skip to content

Make lines stricter to fix space leak

I propose changing the implementation of lines to be stricter.

The current implementation causes (in GHC) a space leak, so it is better to sacrifice a little laziness to fix that leak.

Period of discussion: Three weeks, until 15^th^ October (because of ICFP).

Current implementation:

lines                   :: String -> [String]
lines ""                =  []
lines s                 =  let (l, s') = break (== '\n') s
                           in  l : case s' of
                                        []      -> []
                                        (_:s'') -> lines s''

The relevant part of the core showing the leak (-O or -O2, similar for -O0 but without inlining break):

        let {
          ds1_sjM :: ([GHC.Types.Char], [GHC.Types.Char])
          LclId
          [Str: DmdType]
          ds1_sjM =
            case GHC.List.$wbreak @ GHC.Types.Char lvl1_rjW wild_B1
            of _ { (# ww1_aj3, ww2_aj4 #) ->
            (ww1_aj3, ww2_aj4)
            } } in
        GHC.Types.:
          @ GHC.Base.String
          (case ds1_sjM of _ { (l_aif, _) -> l_aif })
          (case ds1_sjM of _ { (_, s'_aih) ->
           case s'_aih of _ {
             [] -> GHC.Types.[] @ GHC.Base.String;
             : _ s''_adj -> Lines.lines s''_adj
           }
           })

ds1_sjM keeps a reference to the first line until the second is demanded, preventing it from being GCed.

Current strictness properties:

lines _|_       = _|_
lines (_|_:_|_) = _|_ : _|_

Proposed implementation:

lines                   :: String -> [String]
lines ""                =  []
lines s                 = case break (== '\n') s of
                            (l, s') -> l : case s' of
                                            []      -> []
                                            (_:s'') -> lines s''

Due to the pattern matching on break's result, the pair is never considered a single entity, allowing the first line to be GCed before the second is demanded.

Strictness properties of the proposed implementation:

lines _|_       = _|_
lines (_|_:_|_) = _|_

Generally, a _|_ as the first Char of a line would produce a _|_ instead of the _|_ : _|_ which the current implementation produces. _|_ in any other place would be treated identically.

Trac metadata
Trac field Value
Version 6.12.3
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component libraries/base
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information