Closed
Opened Mar 13, 2014 by tibbe@trac-tibbe

Add inline versions of clone array primops

I've changed the clone array primops (i.e. cloneArray#, cloneMutableArray#, freezeArray#, and thawArray#) to use the new inline allocation optimization for statically known array sizes. Furthermore, I've moved the implementation for the non-statically known case out-of-line, which should reduce code size.
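To illustrate the distinction between the two cases, here is a minimal sketch, not taken from the patch. The `Arr` wrapper and the helper names (`replicateArr`, `cloneFirst16`, `cloneFirstN`, `sizeArr`, `indexArr`) are hypothetical; only the primops come from `GHC.Exts`. In `cloneFirst16` the length is a literal, so it is known at compile time; in `cloneFirstN` it is only known at run time. Whether a given GHC actually takes the inline allocation path for the first call depends on the compiler version, the optimization flags, and the array size.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts (Array#, Int (I#), cloneArray#, indexArray#, newArray#,
                 sizeofArray#, unsafeFreezeArray#)
import GHC.ST (ST (ST), runST)

-- Hypothetical boxed wrapper so the unlifted Array# can be passed around.
data Arr a = Arr (Array# a)

-- Build an n-element array with every slot set to x.
replicateArr :: Int -> a -> Arr a
replicateArr (I# n#) x = runST (ST (\s0 ->
  case newArray# n# x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, Arr arr #)))

-- Offset and length are literals: the size is statically known.
-- The source array must have at least 16 elements.
cloneFirst16 :: Arr a -> Arr a
cloneFirst16 (Arr src) = Arr (cloneArray# src 0# 16#)

-- The length is a run-time value: the size is not statically known.
-- The caller must ensure n does not exceed the source length.
cloneFirstN :: Int -> Arr a -> Arr a
cloneFirstN (I# n#) (Arr src) = Arr (cloneArray# src 0# n#)

sizeArr :: Arr a -> Int
sizeArr (Arr a) = I# (sizeofArray# a)

indexArr :: Arr a -> Int -> a
indexArr (Arr a) (I# i#) = case indexArray# a i# of (# x #) -> x

main :: IO ()
main = do
  let src = replicateArr 32 'x'
  print (sizeArr (cloneFirst16 src))     -- 16
  print (sizeArr (cloneFirstN 8 src))    -- 8
  print (indexArr (cloneFirst16 src) 15) -- 'x'
```

Both functions return the same kind of result; the difference lies only in what the compiler can see about the size argument at the call site.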

The numbers are very encouraging: the new implementation cuts total run time by 74% (from 0.31s to 0.08s), which makes it almost 4x faster than the old one. I measured this by looking at the total time reported by +RTS -s for the attached InlineCloneArrayAlloc benchmark.

Here are the stats from the best out of three runs of the old implementation:

   1,600,041,120 bytes allocated in the heap
           6,504 bytes copied during GC
          35,992 bytes maximum residency (1 sample(s))
          21,352 bytes maximum slop
            1588 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.01s    0.01s     0.0082s    0.0082s
  Gen  1         1 colls,     0 par    0.00s    0.11s     0.1062s    0.1062s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.29s  (  0.57s elapsed)
  GC      time    0.01s  (  0.11s elapsed)
  EXIT    time    0.01s  (  0.11s elapsed)
  Total   time    0.31s  (  0.80s elapsed)

  %GC     time       2.7%  (14.2% elapsed)

  Alloc rate    5,497,251,856 bytes per MUT second

  Productivity  97.3% of total user, 37.4% of total elapsed

Here are the same stats for the new implementation:

   1,600,041,120 bytes allocated in the heap
          57,224 bytes copied during GC
          35,992 bytes maximum residency (1 sample(s))
          21,352 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3125 colls,     0 par    0.01s    0.01s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0003s    0.0003s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.08s  (  0.08s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.08s  (  0.09s elapsed)

  %GC     time       6.4%  (8.8% elapsed)

  Alloc rate    21,260,179,643 bytes per MUT second

  Productivity  93.5% of total user, 88.8% of total elapsed
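
The headline figures can be checked directly from the two "Total time" lines above. The following is a small sketch of that arithmetic; the function names `timeReduction` and `speedup` are made up for illustration, and the inputs are the rounded values printed by +RTS -s, so the results are approximate.

```haskell
module Main (main) where

import Text.Printf (printf)

-- Total (user) times in seconds, as reported by +RTS -s above.
oldTotal, newTotal :: Double
oldTotal = 0.31
newTotal = 0.08

-- Percentage of the old run time that the new implementation saves.
timeReduction :: Double -> Double -> Double
timeReduction old new = (old - new) / old * 100

-- How many times faster the new implementation is.
speedup :: Double -> Double -> Double
speedup old new = old / new

main :: IO ()
main = do
  printf "time reduction: %.1f%%\n" (timeReduction oldTotal newTotal) -- 74.2%
  printf "speedup: %.1fx\n" (speedup oldTotal newTotal)               -- 3.9x
```

Note that a 74% reduction in run time and a roughly 3.9x speedup are two ways of stating the same ratio.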

The gap between the two implementations widens in favor of the new one as the iteration count is increased.
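
For readers who want to try a measurement of this kind, here is a rough sketch of a clone loop with a configurable iteration count. It is not the attached InlineCloneArrayAlloc benchmark, whose source is not reproduced here; the `MArr` wrapper, the helper names, the 16-element size, and the default iteration count are all assumptions made for illustration.

```haskell
{-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts (Int (I#), MutableArray#, RealWorld, cloneMutableArray#,
                 newArray#, sizeofMutableArray#)
import GHC.IO (IO (IO))
import System.Environment (getArgs)

-- Hypothetical boxed wrapper around a mutable array living in IO.
data MArr a = MArr (MutableArray# RealWorld a)

-- Allocate an n-element mutable array with every slot set to x.
newMArr :: Int -> a -> IO (MArr a)
newMArr (I# n#) x = IO (\s0 ->
  case newArray# n# x s0 of
    (# s1, marr #) -> (# s1, MArr marr #))

-- Clone the first 16 elements; the length is a literal, so it is
-- statically known. The source must have at least 16 elements.
cloneFixed :: MArr a -> IO (MArr a)
cloneFixed (MArr src) = IO (\s0 ->
  case cloneMutableArray# src 0# 16# s0 of
    (# s1, dst #) -> (# s1, MArr dst #))

sizeMArr :: MArr a -> Int
sizeMArr (MArr a) = I# (sizeofMutableArray# a)

-- Clone the source array `iters` times, summing the sizes of the clones
-- so that each clone's result is used.
cloneLoop :: Int -> MArr a -> IO Int
cloneLoop iters src = go 0 0
  where
    go !i !acc
      | i >= iters = return acc
      | otherwise = do
          dst <- cloneFixed src
          go (i + 1) (acc + sizeMArr dst)

main :: IO ()
main = do
  args <- getArgs
  -- Iteration count from the first command-line argument (assumed default).
  let iters = case args of
        (n : _) -> read n
        []      -> 1000000
  src <- newMArr 16 ()
  total <- cloneLoop iters src
  print total
```

Compiled with -O2 -rtsopts, the program can be run with an iteration count followed by +RTS -s to obtain statistics in the same format as those shown above.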

There's also an interesting difference in the Gen 1 collection times between the two implementations: 0.11s elapsed for the old implementation versus 0.00s for the new one.

Trac metadata
Trac field              Value
Version                 7.9
Type                    FeatureRequest
TypeOfFailure           OtherFailure
Priority                normal
Resolution              Unresolved
Component               Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
Reference: ghc/ghc#8885