Skip to content

GitLab

  • Menu
Projects Groups Snippets
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,838
    • Issues 4,838
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 452
    • Merge requests 452
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Analytics
    • Analytics
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
    • Value stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #8647

Closed
Open
Created Jan 03, 2014 by Herbert Valerio Riedel@hvr🕺Maintainer

Reduce allocations in `integer-gmp`

I've added printf(3)s to integer-gmps GMP allocation/reallocation functions, and noticed there to a significant amount of allocations going on.

This is due to the fact that the CMM wrappers call gmpz_init() to initialize the result, and this already allocates a 1-limb StgArray. But the actual GMP operations perform a reallocate-call (which by contract also needs memcpy the untouched limb over to the newly allocated structure) based on the type of operation and the involved operands, causing the optimistically allocated 1-limb structure to become garbage right away w/o even being written to.

I've been able to get rid of most such superfluous allocations by replacing

ccall __gmpz_init(mp_result1 "ptr");

by

MP_INT__mp_alloc(mp_result1) = 0;
MP_INT__mp_size(mp_result1)  = 0;
MP_INT__mp_d(mp_result1)     = 0; // (1)

which is handled properly by all GMP operations we make use of currently. This elegantly lets the very first allocation be performed within the GMP operation (which has better knowledge with respect to how many limbs the result will require)

I've got working proof-of-concept code, which reduces the allocations the big-num heavy nofib programs (I've omitted the benchmarks which had a 0% allocation change):

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
...
     bernouilli          -0.1%     -5.2%      0.12      0.12     +0.0%
          kahan          -0.1%    -10.5%      0.17      0.17     +0.0%
      primetest          -0.1%    -12.8%      0.07      0.07     +0.0%
            rsa          -0.1%    -13.8%      0.02      0.02     +0.0%
         symalg          -0.1%     -2.3%      0.01      0.01     +0.0%
...
--------------------------------------------------------------------------------
            Min          -0.1%    -13.8%     -3.1%     -3.1%     -6.5%
            Max          -0.0%     +0.4%     +4.0%     +4.0%     +1.5%
 Geometric Mean          -0.1%     -0.5%     +0.5%     +0.4%     -0.1%

(1) This should actually point to a proper static closure I didn't need this for the proof-of-concept


Another place which helps avoiding temporarily allocating 1-limb StgArrays which live only for the duration of GMP FFI calls are those caused by the toBig-casts, which I'm also experimenting by making use of GMP's big-int/small-int add/sub/mul primitives (the deltas shown above are actually measured on top of this optimization), here's what to expect from such an optimization by itself (i.e. w/o the realloc-optimization from above):

     bernouilli          +0.1%     -4.2%      0.12      0.12     +0.0%
          kahan          +0.1%    -12.6%      0.17      0.17     +0.0%
          power          +0.1%     -2.7%     +2.2%     +2.2%     +0.0%
            rsa          +0.1%     -4.1%      0.02      0.02     +0.0%
            scs          +0.1%     -2.6%     -1.6%     -1.6%     +0.0%

Thus, the kahan benchmark benefits from this optimization by -12.6%, and on top of that another -10.5% by also avoiding the GMP-reallocations.

Trac metadata
Trac field Value
Version 7.6.3
Type Task
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component libraries (other)
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking