  • #21274
Closed
Created Mar 21, 2022 by Mikolaj Konarski (@Mikolaj, Reporter)

40% to 100% slowdown from -threaded

Unfortunately, I can't reproduce with GHC HEAD ghc-9.3.20220316 and head.hackage, because I'm getting a segfault (after applying a PR to sdl2-ttf that makes it compile). So the tests use GHC 9.2.2. To reproduce:

git clone git@github.com:LambdaHack/LambdaHack.git
cd LambdaHack
git checkout v0.11.0.0
cabal build
make bench

Then add -threaded to the ghc-options in LambdaHack.cabal, as in

common exe-options
  ghc-options:        -rtsopts -threaded

and run the build and benchmark again:

cabal build
make bench

Depending on your version of the C libsdl2 libraries, this may or may not compile and/or run. If it doesn't, try the master branch instead of the v0.11.0.0 tag.

My results without -threaded:

~/r/LambdaHack$ make bench
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 3 --noAnim --maxFps 100000 --frontendNull --benchmark --benchMessages --stopAfterFrames 1500 --automateAll --keepAutomated --gameMode battle --setDungeonRng "SMGen 127 123" --setMainRng "SMGen 127 125"
Session time: 0.927905003s; frames: 1500. Average clips per second: 6509.287028814522. Average FPS: 1616.5447919241362.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 3 --maxFps 100000 --frontendLazy --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode battle --setDungeonRng "SMGen 127 123" --setMainRng "SMGen 127 125"
Session time: 1.424040237s; frames: 7009. Average clips per second: 4766.719242638928. Average FPS: 4921.911486690667.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 3 --noAnim --maxFps 100000 --benchmark --benchMessages --stopAfterFrames 2000 --automateAll --keepAutomated --gameMode battle --setDungeonRng "SMGen 127 123" --setMainRng "SMGen 127 125"
Session time: 3.882431124s; frames: 2012. Average clips per second: 1706.147459784273. Average FPS: 518.2319880866481.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 1 --noAnim --maxFps 100000 --frontendNull --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode crawl --setDungeonRng "SMGen 123 123" --setMainRng "SMGen 123 125"
Session time: 3.159201467s; frames: 7010. Average clips per second: 7755.440815006434. Average FPS: 2218.915150940578.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 1 --noAnim --maxFps 100000 --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode crawl --setDungeonRng "SMGen 123 123" --setMainRng "SMGen 123 125"
Session time: 12.973855358s; frames: 7010. Average clips per second: 1888.4903002168958. Average FPS: 540.3174158001893.

and then with -threaded:

~/r/LambdaHack$ make bench
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 3 --noAnim --maxFps 100000 --frontendNull --benchmark --benchMessages --stopAfterFrames 1500 --automateAll --keepAutomated --gameMode battle --setDungeonRng "SMGen 127 123" --setMainRng "SMGen 127 125"
Session time: 1.338236622s; frames: 1500. Average clips per second: 4513.402114921348. Average FPS: 1120.87800867252.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 3 --maxFps 100000 --frontendLazy --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode battle --setDungeonRng "SMGen 127 123" --setMainRng "SMGen 127 125"
Session time: 3.543814058s; frames: 7009. Average clips per second: 1915.4503845020865. Average FPS: 1977.8125729191404.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 3 --noAnim --maxFps 100000 --benchmark --benchMessages --stopAfterFrames 2000 --automateAll --keepAutomated --gameMode battle --setDungeonRng "SMGen 127 123" --setMainRng "SMGen 127 125"
Session time: 4.990851022s; frames: 2012. Average clips per second: 1327.2285569737448. Average FPS: 403.1376595155759.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 1 --noAnim --maxFps 100000 --frontendNull --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode crawl --setDungeonRng "SMGen 123 123" --setMainRng "SMGen 123 125"
Session time: 4.56074518s; frames: 7010. Average clips per second: 5372.1484172022965. Average FPS: 1537.0295255127585.
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 1 --noAnim --maxFps 100000 --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode crawl --setDungeonRng "SMGen 123 123" --setMainRng "SMGen 123 125"
Session time: 17.029107743s; frames: 7010. Average clips per second: 1438.7718000123289. Average FPS: 411.6481089786713.
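To put a number on each pair of runs, the slowdown can be computed from the "Session time" values above. This is only a sketch: the `slowdown` helper is a hypothetical name introduced here, not part of LambdaHack or its Makefile, and it assumes a POSIX shell with awk available.

```shell
#!/bin/sh
# Hypothetical helper (not part of LambdaHack): prints how much longer
# the -threaded session took, as a percentage of the non-threaded time.
# Usage: slowdown NON_THREADED_SECONDS THREADED_SECONDS
slowdown() {
  awk -v base="$1" -v thr="$2" \
    'BEGIN { printf "%.1f%%\n", (thr / base - 1) * 100 }'
}

# Session times (seconds) copied from the two "make bench" runs above,
# in the order the benchmarks are listed.
slowdown 0.927905003 1.338236622     # battle, --frontendNull
slowdown 1.424040237 3.543814058     # battle, --frontendLazy
slowdown 3.882431124 4.990851022     # battle, default frontend
slowdown 3.159201467 4.56074518      # crawl, --frontendNull
slowdown 12.973855358 17.029107743   # crawl, default frontend
```

The first pair, for instance, comes out at about 44%. These are single runs, so the percentages carry whatever run-to-run noise the machine has.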

Here is some additional info from an investigation by @duog, done on a version of the codebase that is a month or two older:

  1. marking all imports in sdl2 and sdl2-ttf as unsafe does not improve the discrepancy

  2. results of perf stat on binaries WITH unsafe foreign calls in sdl2 and sdl2-ttf

with -threaded:

perf stat -dd make benchFrontendCrawl 
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 1 --noAnim --maxFps 100000 --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode crawl --frontendNull --setDungeonRng "SMGen 123 123" --setMainRng "SMGen 123 125"
Session time: 7.796642283s; frames: 7005. Average clips per second: 2809.28625489948. Average FPS: 898.463690616393.

 Performance counter stats for 'make benchFrontendCrawl':

          8,162.78 msec task-clock:u              #    0.907 CPUs utilized          
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
            53,284      page-faults:u             #    6.528 K/sec                  
    23,013,213,417      cycles:u                  #    2.819 GHz                      (42.90%)
       136,921,195      stalled-cycles-frontend:u #    0.59% frontend cycles idle     (43.29%)
       448,778,570      stalled-cycles-backend:u  #    1.95% backend cycles idle      (43.21%)
    18,944,979,353      instructions:u            #    0.82  insn per cycle         
                                                  #    0.02  stalled cycles per insn  (43.19%)
     3,802,875,276      branches:u                #  465.880 M/sec                    (43.12%)
       202,717,840      branch-misses:u           #    5.33% of all branches          (43.10%)
     7,928,213,661      L1-dcache-loads:u         #  971.264 M/sec                    (42.98%)
       285,102,363      L1-dcache-load-misses:u   #    3.60% of all L1-dcache accesses  (42.85%)
   <not supported>      LLC-loads:u                                                 
   <not supported>      LLC-load-misses:u                                           
     1,961,103,523      L1-icache-loads:u         #  240.249 M/sec                    (43.03%)
        19,840,051      L1-icache-load-misses:u   #    1.01% of all L1-icache accesses  (42.83%)
        55,739,515      dTLB-loads:u              #    6.828 M/sec                    (42.96%)
         5,303,284      dTLB-load-misses:u        #    9.51% of all dTLB cache accesses  (42.99%)
        46,210,060      iTLB-loads:u              #    5.661 M/sec                    (42.98%)
         4,717,836      iTLB-load-misses:u        #   10.21% of all iTLB cache accesses  (43.07%)

       9.001519486 seconds time elapsed

       7.657531000 seconds user
       0.748577000 seconds sys

without -threaded:

perf stat -dd make benchFrontendCrawl 
$(cabal list-bin exe:LambdaHack) --dbgMsgSer --logPriority 4 --newGame 1 --noAnim --maxFps 100000 --benchmark --benchMessages --stopAfterFrames 7000 --automateAll --keepAutomated --gameMode crawl --frontendNull --setDungeonRng "SMGen 123 123" --setMainRng "SMGen 123 125"
Session time: 5.799896649s; frames: 7005. Average clips per second: 3776.4466033677422. Average FPS: 1207.7801422906011.

 Performance counter stats for 'make benchFrontendCrawl':

          6,236.18 msec task-clock:u              #    0.896 CPUs utilized          
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
            53,332      page-faults:u             #    8.552 K/sec                  
    18,172,577,519      cycles:u                  #    2.914 GHz                      (42.88%)
       173,871,640      stalled-cycles-frontend:u #    0.96% frontend cycles idle     (42.89%)
       384,575,183      stalled-cycles-backend:u  #    2.12% backend cycles idle      (42.90%)
    18,797,086,192      instructions:u            #    1.03  insn per cycle         
                                                  #    0.02  stalled cycles per insn  (42.83%)
     3,777,918,464      branches:u                #  605.807 M/sec                    (42.79%)
       192,408,440      branch-misses:u           #    5.09% of all branches          (42.79%)
     7,620,726,761      L1-dcache-loads:u         #    1.222 G/sec                    (42.95%)
       273,000,424      L1-dcache-load-misses:u   #    3.58% of all L1-dcache accesses  (43.04%)
   <not supported>      LLC-loads:u                                                 
   <not supported>      LLC-load-misses:u                                           
     1,880,798,031      L1-icache-loads:u         #  301.595 M/sec                    (43.08%)
        17,176,000      L1-icache-load-misses:u   #    0.91% of all L1-icache accesses  (43.09%)
        54,063,881      dTLB-loads:u              #    8.669 M/sec                    (43.15%)
         5,052,959      dTLB-load-misses:u        #    9.35% of all dTLB cache accesses  (43.13%)
        32,697,768      iTLB-loads:u              #    5.243 M/sec                    (43.05%)
         2,887,900      iTLB-load-misses:u        #    8.83% of all iTLB cache accesses  (42.95%)

  3. going from vanilla to -threaded costs about 20% in instructions per cycle (1.03 down to 0.82 insn per cycle). I think that's quite bad. Unfortunately my AMD CPU doesn't support the LLC-loads (i.e. level 3 cache) counters.
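The instructions-per-cycle figures can be rechecked from the raw `instructions:u` and `cycles:u` counters in the two perf stat outputs above. A minimal sketch, assuming a POSIX shell with awk; `ipc` and `ipc_drop` are hypothetical helper names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of perf or LambdaHack.

# ipc INSTRUCTIONS CYCLES: instructions retired per cycle.
ipc() {
  awk -v i="$1" -v c="$2" 'BEGIN { printf "%.2f\n", i / c }'
}

# ipc_drop T_INSN T_CYCLES V_INSN V_CYCLES: relative loss in IPC of the
# -threaded run (T) compared with the vanilla run (V), in percent.
ipc_drop() {
  awk -v ti="$1" -v tc="$2" -v vi="$3" -v vc="$4" \
    'BEGIN { printf "%.1f%%\n", (1 - (ti / tc) / (vi / vc)) * 100 }'
}

# Counter values copied from the perf stat outputs above.
ipc 18944979353 23013213417      # with -threaded
ipc 18797086192 18172577519      # without -threaded
ipc_drop 18944979353 23013213417 18797086192 18172577519
```

Note that perf reports these counters as multiplexed samples (about 43% coverage each), so the ratios are estimates rather than exact counts.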