Closed
Open
Opened Jul 02, 2020 by harendra@harendra

Perf regression when TypeFamilies extension is enabled

Summary

Enabling the TypeFamilies extension, without actually using it, changes how GHC optimizes the core. In some cases this leads to severe performance degradation (see this issue for some numbers). Surprisingly, in other cases it leads to performance improvements. Ideally we would keep those improvements (even without TypeFamilies) and avoid the regressions. This issue describes a minimal example of the degradation case and the differences in the generated core.

Steps to reproduce

  1. git clone https://github.com/composewell/streamly.git
  2. git checkout type-families-perf-regression
  3. cabal build --write-ghc-environment-files=always
  4. ghc -O2 -fspec-constr-recursive=16 -fmax-worker-args=16 -ddump-simpl -ddump-to-file product.hs
  5. ./product

This will generate the core for the degraded case i.e. when the TypeFamilies extension is enabled.

For the baseline i.e. without the TypeFamilies extension:

  1. git checkout HEAD~2
  2. repeat steps 3 to 5 above

Core-2-Core pass outputs for good and bad cases

For convenient viewing I have committed the core files for the good case and the bad case in the following files:

  • good case core passes with -dsuppress-all
  • good case core passes full
  • bad case core passes with -dsuppress-all
  • bad case core passes full

Preliminary Observations

The function of interest in the core is $wgo. Running diff product.good/product.dump-simpl product.bad/product.dump-simpl shows the following difference (the first hunk is the good case and the second is the bad case):

<                 $s$wgo_sloM sc_sloL sc1_sloK
<                   = case ># sc1_sloK (+# ww2_sll9 100000#) of {
<                       __DEFAULT -> jump $s$wgo_sloM sc_sloL (+# sc1_sloK 1#);
<                       1# -> jump exit_XU sc_sloL
---
>                 $s$wgo_slqH sc_slqG sc1_slqF sc2_slqE
>                   = case ># sc1_slqF (+# ww2_slkM 100000#) of {
>                       __DEFAULT ->
>                         jump $s$wgo_slqH sc_slqG (+# sc1_slqF 1#) (*# sc2_slqE sc1_slqF);
>                       1# -> $wgo (-# ww_sll6 1#) sc_slqG

In the bad case we see a redundant (*# sc2_slqE sc1_slqF) computation being passed to $s$wgo_slqH. sc2_slqE is not being used anywhere except in this argument.
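To make the shape of the problem concrete, here is a small hand-written Haskell sketch. It is not taken from streamly or from the dumped core; the names loopGood and loopBad are hypothetical, and the bang patterns are only there to mimic the strict unboxed arguments seen in the core. The second loop threads an accumulator that is multiplied on every iteration but only ever flows back into the recursive call, much like sc2_slqE above.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical analogue of the good-case loop: only the counter is threaded.
loopGood :: Int -> Int -> Int
loopGood limit = go
  where
    go !i
      | i > limit = i
      | otherwise = go (i + 1)

-- Hypothetical analogue of the bad-case loop: 'acc' is multiplied on every
-- iteration, but its value is never used outside the recursive call.
loopBad :: Int -> Int -> Int
loopBad limit start = go start 1
  where
    go !i !acc
      | i > limit = i
      | otherwise = go (i + 1) (acc * i)

main :: IO ()
main = do
  print (loopGood 100000 1)
  print (loopBad 100000 1)
```

Both functions return the same result; the only difference is the extra multiplication per iteration in loopBad. Whether GHC removes that work in this small sketch is not something this issue has measured.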

What's causing this difference in the core? Let's look at core-2-core passes. The first difference comes in the second simplifier phase (use vimdiff product.good.full/product.05-* product.bad.full/product.05-* in the repo). We observe here that in the function go (called from main):

  • in the good case the type of step contains a forall p whereas in the bad case it is specialized to the actual type.
  • in the good case we see a joinrec inside let whereas in the bad case we see a letrec.
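One possible source of the forall p difference, offered here as an unverified assumption rather than a finding of this report: TypeFamilies implies MonoLocalBinds, which stops GHC from generalising local bindings that mention variables from an enclosing scope. The sketch below uses hypothetical names (pairWith, step) and is unrelated to the streamly code; it only illustrates how a local binding ends up with a specialized type instead of a forall type.

```haskell
{-# LANGUAGE MonoLocalBinds #-}

-- Hypothetical example; 'pairWith' and 'step' are illustrative names only.
pairWith :: Int -> [Int] -> [(Int, Int)]
pairWith x = map step
  where
    -- 'step' mentions the lambda-bound 'x', so it is not a closed binding.
    -- With MonoLocalBinds it is not generalised and gets the monomorphic
    -- type Int -> (Int, Int) from its use site.
    -- Without MonoLocalBinds GHC would infer forall p. p -> (p, Int).
    step y = (y, x)

main :: IO ()
main = print (pairWith 10 [1, 2, 3])
```

If this is indeed the mechanism, compiling the reproducer with TypeFamilies plus NoMonoLocalBinds would be one way to check it; that experiment has not been run here.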

This difference carries over into the subsequent passes. It can be seen by changing the core pass number and diffing the committed core output, e.g. vimdiff product.good/product.06-* product.bad/product.06-*

After pass 12 (vimdiff product.good/product.12-* product.bad/product.12-*), i.e. the simplifier pass after worker wrapper binds, the redundant argument performing the multiplication is gone in the good case (also, joins 2/2) but remains in the bad case (joins 0/3). That is what we see in the final core output as well.

Expected behavior

Performance is expected to remain the same when the TypeFamilies extension is enabled but not used.

Environment

  • GHC version used:
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.8.3

Optional:

  • Operating System: Mac OS X
  • System Architecture: x86_64
Edited Jul 02, 2020 by harendra
Reference: ghc/ghc#18414