Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
GHC
GHC
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,386
    • Issues 4,386
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
    • Iterations
  • Merge Requests 373
    • Merge Requests 373
  • Requirements
    • Requirements
    • List
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Operations
    • Operations
    • Incidents
    • Environments
  • Analytics
    • Analytics
    • CI / CD
    • Code Review
    • Insights
    • Issue
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #19135

Closed
Open
Opened Dec 30, 2020 by Philippe@flip101Reporter

add "true group by" function in Data.List

Motivation

It is about the groupBy function in Data.List https://hackage.haskell.org/package/base-4.14.1.0/docs/Data-List.html#v:groupBy

In my own words: this function puts things in groups as long as they are the same, and creates a new group when a difference occurs.

This is not the same as grouping all things together which are equal (as by predicate in first argument). This is described in the documentation of https://hackage.haskell.org/package/base-4.14.1.0/docs/Data-List.html#v:group

Where the example yields: ["M","i","ss","i","ss","i","pp","i"] instead of "true grouping" ["M","iiii","ssss","pp"].

Always when i used groupBy i had to sort first to solve this problem. But some data it's not easy to sort/hash but easy to compare in which case the groupBy function can not be used for "true grouping", thus there is no grouping function available in this use case.

I think the "true group by" is more frequently required than "groupBy" as it exists now. This is just from my experience.

For new users it's confusing when "groupBy does not group". Already had multiple bugs because of this.

Proposal

This feature request has two parts which can be accepted / rejected individually.

  1. Add documentation to groupBy function that more explicitly explains the behavior of "group until difference occurs" (probably need better wording). Also add big O notation for this function.
  2. Add function for "true group by" (needs better name)
trueGroupBy :: (a -> a -> Bool) -> [a] -> [[a]]
trueGroupBy f []     = []
trueGroupBy f (x:xs) = let (yes, no) = partition (f x) xs
                       in (x : yes) : trueGroupBy f no

This function is O(n^2) and should point in documentation to consider usage of groupBy instead since that one should be faster.

Assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
None
Due date
None
Reference: ghc/ghc#19135