  • #17896

Closed
Open
Opened Mar 04, 2020 by Edgar Gomes (@lostbean)

Severe performance degradation on Linux

Summary

While trying to package and deploy computationally intensive code for serverless (AWS Lambda/GC Run), I stumbled on a severe performance degradation. The exact same code that ran in about 30 s (single-threaded) on my local machine (MacBook Pro) would time out at the serverless limit of 15 min.

I set up a Criterion benchmark and identified the function causing the performance degradation by comparing with the results from a Linux Docker container. Afterwards I also tested on an older MacBook with a native Linux installation and with different versions of GHC (8.6.5 and 8.8.3). All of the benchmarks under Linux showed the same heavy performance degradation in some specific parts of the code.

Deep Dive

Using GHC profiling and Criterion (also comparing native OSX against containerized Ubuntu), I managed to isolate the function that was suffering the performance degradation. The function, misoDoubleOR, went from 370 μs to 12.7 ms; in other words, the Linux executable was 34x slower. Since this function is part of the main loop, it was causing the overall performance issue.

I started to benchmark the other functions used by misoDoubleOR, but none of them suffered the same level of degradation when compiled on Linux. I was almost hopeless about finding the cause when I stumbled on issue #17881, and it came to mind that I was using eta reduction to memoize some repeated calculations.

After removing the eta-reduced function (getMisoAngleEta, below), the execution time on Linux dropped to a more acceptable level (739.8 μs, about 2x the 370.6 μs measured with eta on OSX).

getMisoAngleEta :: Symm -> Quaternion -> Quaternion -> Double
getMisoAngleEta symm = let
  foo = getAbsShortOmega . getInFZ (getSymmOps symm)
  -- avoiding eta expansion of q1 and q2 to memoize
  in \q1 q2 -> foo (q2 -#- q1)
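
For readers unfamiliar with the trick, here is a minimal, self-contained sketch of the two shapes being compared. Everything in it is a hypothetical stand-in, not code from arche: `expensiveSetup` plays the role of `getSymmOps symm`, and `sharedForm` / `expandedForm` correspond roughly to the with-eta and no-eta variants. Whether the sharing actually survives depends on what GHC's optimizer decides to do with the lambda, so treat this as an illustration of the intent only.

```haskell
-- Hypothetical stand-in for an expensive setup step that depends only on
-- the first argument (the role `getSymmOps symm` plays in the issue).
expensiveSetup :: Int -> Double
expensiveSetup n = sum [sqrt (fromIntegral k) | k <- [1 .. n]]

-- "With eta" shape: the let-binding sits outside the lambda, so a partial
-- application such as `sharedForm 1000000` can compute `setup` once and
-- reuse it on every later call of the resulting function.
sharedForm :: Int -> Double -> Double -> Double
sharedForm n =
  let setup = expensiveSetup n
  in \x y -> setup + (y - x)

-- "No eta" shape: all arguments are bound up front, so `setup` is
-- (semantically) recomputed on each call unless the optimizer floats it out.
expandedForm :: Int -> Double -> Double -> Double
expandedForm n x y =
  let setup = expensiveSetup n
  in setup + (y - x)
```

Both functions return the same values; they differ only in how often the setup work may be repeated. In GHCi, something like `let f = sharedForm 1000000 in map (f 0) [1, 2, 3]` exercises the shared form with a single partial application.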

>> Darwin Kernel Version 19.2.0: root:xnu-6153.61.1~20/RELEASE_X86_64 x86_64
>> GHC 8.6.5 (stack lts-14.27)

arche      > benchmarks
Running 1 benchmarks...
Benchmark arche-bench: RUNNING...
benchmarking reference/fib
time                 74.88 ms   (71.96 ms .. 76.36 ms)
                     0.998 R²   (0.992 R² .. 1.000 R²)
mean                 77.33 ms   (75.98 ms .. 78.96 ms)
std dev              2.527 ms   (1.844 ms .. 3.340 ms)

benchmarking reference/malloc
time                 7.942 ms   (7.229 ms .. 8.772 ms)
                     0.935 R²   (0.899 R² .. 0.971 R²)
mean                 6.907 ms   (6.640 ms .. 7.335 ms)
std dev              1.018 ms   (711.4 μs .. 1.415 ms)
variance introduced by outliers: 76% (severely inflated)

benchmarking misoDoubleOR/with-eta
time                 370.6 μs   (357.9 μs .. 385.5 μs)
                     0.984 R²   (0.971 R² .. 0.994 R²)
mean                 383.9 μs   (369.8 μs .. 402.7 μs)
std dev              53.30 μs   (40.16 μs .. 71.08 μs)
variance introduced by outliers: 87% (severely inflated)

benchmarking misoDoubleOR/no-eta
time                 239.2 μs   (230.5 μs .. 247.6 μs)
                     0.977 R²   (0.954 R² .. 0.989 R²)
mean                 261.7 μs   (249.9 μs .. 281.7 μs)
std dev              54.43 μs   (38.19 μs .. 79.35 μs)
variance introduced by outliers: 95% (severely inflated)

>> Ubuntu 18.04 (Docker)
>> GHC 8.6.5 (stack lts-14.27)

Running 1 benchmarks...
Benchmark arche-bench: RUNNING...
benchmarking reference/fib
time                 144.0 ms   (141.2 ms .. 146.9 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 148.9 ms   (146.0 ms .. 156.4 ms)
std dev              6.589 ms   (1.772 ms .. 9.655 ms)
variance introduced by outliers: 12% (moderately inflated)

benchmarking reference/malloc
time                 7.787 ms   (7.690 ms .. 7.964 ms)
                     0.998 R²   (0.995 R² .. 1.000 R²)
mean                 7.833 ms   (7.780 ms .. 7.919 ms)
std dev              183.9 μs   (120.9 μs .. 278.8 μs)

benchmarking misoDoubleOR/with-eta
time                 12.73 ms   (12.56 ms .. 12.98 ms)
                     0.998 R²   (0.995 R² .. 1.000 R²)
mean                 12.78 ms   (12.71 ms .. 12.92 ms)
std dev              272.3 μs   (154.4 μs .. 449.2 μs)

benchmarking misoDoubleOR/no-eta
time                 739.8 μs   (735.3 μs .. 746.2 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 744.4 μs   (739.7 μs .. 757.3 μs)
std dev              23.24 μs   (5.377 μs .. 48.14 μs)
variance introduced by outliers: 21% (moderately inflated)
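
To make the two runs easier to compare, the slowdown per benchmark can be computed from the Criterion "time" point estimates above. The sketch below just copies those numbers (converted to microseconds) into a small table; the names `timings`, `slowdown` and `slowdowns` are made up for this illustration.

```haskell
-- Point estimates from the Criterion "time" lines in this issue, in
-- microseconds: (benchmark name, OSX time, Ubuntu/Docker time).
timings :: [(String, Double, Double)]
timings =
  [ ("reference/fib",         74880,  144000)
  , ("reference/malloc",       7942,    7787)
  , ("misoDoubleOR/with-eta",  370.6,  12730)
  , ("misoDoubleOR/no-eta",    239.2,  739.8)
  ]

-- Ratio of the Linux time to the OSX time; values above 1 mean Linux is slower.
slowdown :: (String, Double, Double) -> (String, Double)
slowdown (name, osx, linux) = (name, linux / osx)

-- Slowdown factor for every benchmark in the table.
slowdowns :: [(String, Double)]
slowdowns = map slowdown timings
```

Evaluating `slowdowns` in GHCi gives roughly 1.9x for reference/fib, about 1.0x for reference/malloc, about 34x for misoDoubleOR/with-eta and about 3.1x for misoDoubleOR/no-eta. So the container is already around 2x slower on the pure fib reference, and the with-eta case stands far outside that baseline.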

Steps to reproduce

I couldn't isolate the behavior in a simplified example, but I have put these benchmarks on a branch.

git clone git@github.com:arche-tool/arche.git
cd arche
git submodule update --init
stack install

To run the benchmark inside a container:

make run-bench

Expected behavior

I would expect:

  1. Similar runtimes across different OSs
  2. The possibility of using eta reduction to memoize values without a performance penalty

Environment

  • GHC version used: 8.6.5 and 8.8.3
  • Operating System: OSX and Ubuntu
Edited Mar 06, 2020 by Edgar Gomes
Reference: ghc/ghc#17896