Glasgow Haskell Compiler / GHC / Issues / #19727

random benchmarks are up to 65x slower in GHC 9.0.1

Reproduction

git clone https://github.com/haskell/random
cd random
git checkout b55aaa4

cabal bench -w ghc-8.10.4 --ghc-options '-fproc-alignment=64' --benchmark-options '--csv 8.10.4.csv --hide-successes' random:bench
cabal bench -w ghc-9.0.1 --ghc-options '-fproc-alignment=64' --benchmark-options '--baseline 8.10.4.csv --csv 9.0.1.csv --hide-successes --fail-if-slower 50' random:bench

Expected results

No benchmarks become slower.

Actual results

All
  pure
    uniformR
      full
        Word8:                      FAIL (2.59s)
           10 ms ± 499 μs, 4576% slower than baseline
        Word16:                     FAIL (3.64s)
           14 ms ± 1.3 ms, 6536% slower than baseline
        Word32:                     FAIL (4.86s)
          9.5 ms ± 476 μs, 5057% slower than baseline
        Int8:                       FAIL (2.96s)
           12 ms ± 624 μs, 1756% slower than baseline
        Int16:                      FAIL (1.50s)
          358 μs ±  33 μs, 56% slower than baseline
        Char:                       FAIL (1.94s)
           15 ms ± 1.5 ms, 2690% slower than baseline
        CChar:                      FAIL (1.51s)
           12 ms ± 1.1 ms, 1688% slower than baseline
        CSChar:                     FAIL (3.10s)
           12 ms ± 430 μs, 1822% slower than baseline
        CUChar:                     FAIL (2.68s)
           11 ms ± 464 μs, 4793% slower than baseline
        CUShort:                    FAIL (1.81s)
           14 ms ± 858 μs, 6366% slower than baseline
        CUInt:                      FAIL (1.27s)
           10 ms ± 929 μs, 5263% slower than baseline
      excludeMax
        Word8:                      FAIL (2.61s)
           10 ms ± 903 μs, 4358% slower than baseline
        Word16:                     FAIL (1.81s)
           14 ms ± 1.2 ms, 5920% slower than baseline
        Word32:                     FAIL (7.41s)
           15 ms ± 233 μs, 6141% slower than baseline
        Word64:                     FAIL (1.30s)
          318 μs ±  29 μs, 58% slower than baseline
        Word:                       FAIL (2.60s)
          316 μs ±  29 μs, 63% slower than baseline
        Int8:                       FAIL (3.08s)
           12 ms ± 947 μs, 1760% slower than baseline
        Int16:                      FAIL (1.44s)
          352 μs ±  28 μs, 51% slower than baseline
        Int64:                      FAIL (2.71s)
          333 μs ±  16 μs, 52% slower than baseline
        Int:                        FAIL (1.34s)
          331 μs ±  32 μs, 51% slower than baseline
        Char:                       FAIL (1.86s)
           15 ms ± 1.1 ms, 2634% slower than baseline
        CChar:                      FAIL (3.00s)
           12 ms ± 568 μs, 1702% slower than baseline
        CSChar:                     FAIL (3.02s)
           12 ms ± 452 μs, 1677% slower than baseline
        CUChar:                     FAIL (1.31s)
           10 ms ± 875 μs, 4354% slower than baseline
        CShort:                     FAIL (2.87s)
          350 μs ±  24 μs, 51% slower than baseline
        CUShort:                    FAIL (1.81s)
           14 ms ± 1.3 ms, 2385% slower than baseline
        CUInt:                      FAIL (1.78s)
           14 ms ± 913 μs, 2441% slower than baseline
        CULong:                     FAIL (2.56s)
          315 μs ±  13 μs, 59% slower than baseline
        CSize:                      FAIL (2.57s)
          315 μs ±  22 μs, 60% slower than baseline
        CSigAtomic:                 FAIL (3.07s)
          377 μs ±  24 μs, 62% slower than baseline
        CULLong:                    FAIL (2.55s)
          313 μs ±  20 μs, 58% slower than baseline
        CUIntPtr:                   FAIL (2.55s)
          313 μs ±  22 μs, 57% slower than baseline
        CUIntMax:                   FAIL (1.26s)
          306 μs ±  27 μs, 53% slower than baseline
      includeHalf
        Word8:                      FAIL (2.63s)
           10 ms ± 448 μs, 4244% slower than baseline
        Word16:                     FAIL (1.78s)
           14 ms ± 1.2 ms, 5732% slower than baseline
        Word32:                     FAIL (3.57s)
           14 ms ± 893 μs, 5551% slower than baseline
        Word64:                     FAIL (2.69s)
          329 μs ±  21 μs, 54% slower than baseline
        Word:                       FAIL (1.34s)
          333 μs ±  32 μs, 56% slower than baseline
        Int8:                       FAIL (3.41s)
           13 ms ± 659 μs, 1063% slower than baseline
        Char:                       FAIL (7.60s)
           15 ms ± 288 μs, 2455% slower than baseline
        CChar:                      FAIL (1.73s)
           14 ms ± 915 μs, 1093% slower than baseline
        CSChar:                     FAIL (1.75s)
           14 ms ± 951 μs, 1066% slower than baseline
        CUChar:                     FAIL (1.26s)
          9.9 ms ± 859 μs, 3963% slower than baseline
        CUShort:                    FAIL (1.78s)
           14 ms ± 840 μs, 5768% slower than baseline
        CUInt:                      FAIL (1.81s)
           14 ms ± 1.3 ms, 5454% slower than baseline
        CULong:                     FAIL (2.71s)
          330 μs ±  20 μs, 57% slower than baseline
        CSize:                      FAIL (2.72s)
          336 μs ±  27 μs, 57% slower than baseline
        CULLong:                    FAIL (2.66s)
          327 μs ±  18 μs, 54% slower than baseline
        CUIntPtr:                   FAIL (2.72s)
          330 μs ±  17 μs, 56% slower than baseline
        CUIntMax:                   FAIL (2.84s)
          349 μs ±  25 μs, 65% slower than baseline
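
To relate the percentages above to the factor in the title, here is a small sketch. It assumes that "N% slower than baseline" means the new time equals the baseline time multiplied by (1 + N/100); `slowdownFactor` is a name made up for this illustration, not part of any benchmarking library. Under that assumption, the worst case above (Word16 in `full`, 6536% slower) is roughly 66 times the baseline time.

```haskell
-- Convert a "N% slower than baseline" figure into a multiplicative factor.
-- Assumption: the percentage is measured relative to the baseline time,
-- i.e. newTime = baselineTime * (1 + N / 100).
slowdownFactor :: Double -> Double
slowdownFactor percentSlower = 1 + percentSlower / 100

main :: IO ()
main = mapM_ report samples
  where
    -- A few figures copied from the uniformR/full results above.
    samples = [("Word16", 6536), ("Word8", 4576), ("Int16", 56)]
    report (name, pct) =
      putStrLn (name ++ ": " ++ show (slowdownFactor pct) ++ "x baseline time")
```

The results therefore fall into two groups: types that take a few dozen times longer than the baseline, and types that take about 1.5 to 1.6 times longer.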

It seems that inlining in GHC 9.0.1 works differently from how it did in GHC 8.10.4. I wonder if it's the same issue as #19557 (closed). Forcing more inlining with pragmas (c9471d4) fixes the most outrageous regressions, but still does not bring performance back to baseline levels:

All
  pure
    uniformR
      full
        Int8:                       FAIL (1.57s)
          380 μs ±  34 μs, 58% slower than baseline
        Int16:                      FAIL (1.57s)
          387 μs ±  31 μs, 62% slower than baseline
        CChar:                      FAIL (1.54s)
          376 μs ±  27 μs, 59% slower than baseline
        CSChar:                     FAIL (1.56s)
          384 μs ±  27 μs, 60% slower than baseline
        CWchar:                     FAIL (3.11s)
          378 μs ±  21 μs, 54% slower than baseline
      excludeMax
        Word64:                     FAIL (1.29s)
          315 μs ±  30 μs, 53% slower than baseline
        Word:                       FAIL (1.38s)
          340 μs ±  33 μs, 67% slower than baseline
        Int8:                       FAIL (1.61s)
          389 μs ±  28 μs, 57% slower than baseline
        Int64:                      FAIL (1.45s)
          353 μs ±  33 μs, 56% slower than baseline
        CULong:                     FAIL (1.27s)
          313 μs ±  26 μs, 54% slower than baseline
        CSize:                      FAIL (2.60s)
          317 μs ±  19 μs, 53% slower than baseline
        CULLong:                    FAIL (1.28s)
          314 μs ±  29 μs, 53% slower than baseline
        CUIntPtr:                   FAIL (2.60s)
          317 μs ±  28 μs, 54% slower than baseline
        CIntMax:                    FAIL (1.42s)
          350 μs ±  32 μs, 52% slower than baseline
        CUIntMax:                   FAIL (1.28s)
          316 μs ±  28 μs, 56% slower than baseline
      includeHalf
        Word64:                     FAIL (2.72s)
          331 μs ±  24 μs, 54% slower than baseline
        CULong:                     FAIL (2.66s)
          324 μs ±  19 μs, 56% slower than baseline
        CSize:                      FAIL (2.68s)
          326 μs ±  21 μs, 54% slower than baseline
        CULLong:                    FAIL (2.74s)
          335 μs ±  24 μs, 59% slower than baseline
        CUIntPtr:                   FAIL (1.36s)
          329 μs ±  28 μs, 56% slower than baseline
        CUIntMax:                   FAIL (2.88s)
          352 μs ±  14 μs, 73% slower than baseline
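
For readers unfamiliar with the workaround, here is a minimal sketch of forcing inlining with a pragma. It is not the change made in c9471d4: `step` and `drawBelow` are hypothetical names, the generator is a plain LCG rather than the one used by random, and the modulo reduction is biased. It only shows where an `INLINE` pragma goes on a class-polymorphic helper, so that call sites at concrete types such as `Word8` can be optimised without passing an `Integral` dictionary.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64, Word8)

-- Hypothetical stand-in for a generator step (a plain LCG, not the
-- generator used by random).
step :: Word64 -> Word64
step s = s * 6364136223846793005 + 1442695040888963407

-- Hypothetical class-polymorphic helper. If it is neither inlined nor
-- specialised, calls may go through an Integral dictionary at runtime.
-- The modulo reduction is biased and is for illustration only.
drawBelow :: Integral a => a -> Word64 -> a
drawBelow bound s = fromIntegral (step s `shiftR` 33) `mod` bound
{-# INLINE drawBelow #-}

main :: IO ()
main = print (length (filter (< 5) draws))
  where
    -- Use the helper at a concrete small type, as the benchmarks do.
    draws = [drawBelow 10 s | s <- [1 .. 1000]] :: [Word8]
```

Whether GHC inlines such a helper without the pragma depends on its unfolding heuristics, which is the kind of behaviour that may differ between compiler versions.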

CC @lehins
