GHC issue #9577

Closed
Created Sep 11, 2014 by xnyhps@trac-xnyhps

String literals are wasting space

For D199 I looked into how string literals are compiled down by GHC.

On 64-bit OS X, a simple string "AAA" turns into assembly:

.const
.align 3
.align 0
c38E_str:
	.byte	65
	.byte	65
	.byte	65
	.byte	0

(And also something that invokes unpackCString#, but that isn't relevant here.)

(MkCore.mkStringExprFS -> CmmUtils.mkByteStringCLit -> compiler/nativeGen/X86/Ppr.pprSectionHeader.)

Note how this:

  • Is 8-byte aligned (.align 3 means 2^3 = 8 bytes on OS X).
  • Is in a .const section.

I can't find any reason why string literals would need to be 8-byte aligned on OS X. Reading data that starts on an 8-byte boundary might be slightly faster, but I doubt that makes a meaningful difference for string literals. Neither clang nor gcc aligns string literals in the assembly it emits.
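To make the cost of the alignment concrete, here is a minimal sketch of how the padding adds up. It assumes one byte per character plus a trailing NUL, as in the "AAA" example, and that literals are laid out back to back with each one starting on an aligned boundary. The names literalSize, paddingFor and totalPadding are made up for this illustration and are not part of GHC.

```haskell
-- Bytes a literal occupies: its characters plus the trailing NUL.
-- Assumes one byte per character (ASCII), as in the "AAA" example.
literalSize :: String -> Int
literalSize s = length s + 1

-- Padding needed to round `size` up to the next multiple of `align`.
paddingFor :: Int -> Int -> Int
paddingFor align size = (align - size `mod` align) `mod` align

-- Total padding when every literal is followed by enough filler to put
-- the next one on an `align`-byte boundary. This is an estimate: the
-- padding after the last literal depends on what follows it.
totalPadding :: Int -> [String] -> Int
totalPadding align = sum . map (paddingFor align . literalSize)
```

In GHCi, totalPadding 8 ["AAA", "base"] evaluates to 7: "AAA" takes 4 bytes and needs 4 bytes of padding, "base" takes 5 bytes and needs 3. With an alignment of 1 the padding is always 0.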

The trivial program:

main :: IO ()
main = return ()

has almost 5kB of wasted padding between the strings that the Prelude brings in, when built with GHC HEAD.

The fact that it is a .const section, instead of .cstring (https://developer.apple.com/library/mac/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html#//apple_ref/doc/uid/TP30000823-TPXREF127), means duplicate strings aren't shared by the assembler. GHC floats string literals out to the top level and uses CSE to eliminate duplicates, but that only works within a single module. Strings shared between different modules end up duplicated in the executable.
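The cross-module duplication can be estimated with a small sketch. It models each module as the list of its distinct literals (CSE has already removed duplicates within a module) and counts the bytes taken by every copy beyond the first. The function name duplicateBytes and this list-of-lists model are assumptions made for illustration; this is not how GHC or the linker represents literals.

```haskell
import Data.List (group, sort)

-- Bytes spent on redundant copies of literals across modules.
-- Each inner list holds the literals of one module. A literal that
-- occurs n times in total wastes (n - 1) copies of its size, where
-- the size is its length plus the trailing NUL byte.
duplicateBytes :: [[String]] -> Int
duplicateBytes modules =
  sum [ (length copies - 1) * (length s + 1)
      | copies@(s : _) <- group (sort (concat modules)) ]
```

For example, duplicateBytes [["base", "Prelude"], ["base", "GHC.Base"], ["base"]] evaluates to 10: "base" appears three times, so two 5-byte copies are redundant.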

The same program as above also has ~4kB of wasted space due to duplicate Prelude strings ("base" occurs 16 times!). Compared to the total binary size (4MB after stripping), removing this redundant data wouldn't be a big improvement (0.2%), but I still think it can be a worthwhile optimization.

I think this can be solved quite easily by creating a new section header for literal strings, which is unaligned and of type .cstring.
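As a rough illustration of the proposed output, the sketch below renders a literal under a .cstring section header with no .align directive, using the same .byte layout as the "AAA" example above. renderCString is a hypothetical helper written for this issue, not GHC's pretty-printer, and it assumes ASCII input.

```haskell
import Data.Char (ord)

-- Hypothetical renderer, not GHC's actual code generator.
-- Emits a label and the literal's bytes inside a .cstring section,
-- without any alignment directive. Assumes ASCII characters.
renderCString :: String -> String -> String
renderCString label s = unlines $
  [ ".cstring"
  , label ++ ":"
  ]
  ++ [ "\t.byte\t" ++ show (ord c) | c <- s ]
  ++ [ "\t.byte\t0" ]  -- trailing NUL terminator
```

Running putStr (renderCString "c38E_str" "AAA") prints the same label and four .byte lines as the assembly shown earlier, with .cstring in place of the .const and .align lines.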

Trac metadata

Trac field              Value
Version                 7.8.2
Type                    Bug
TypeOfFailure           OtherFailure
Priority                low
Resolution              Unresolved
Component               Compiler (NCG)
Test case
Differential revisions
BlockedBy
Related
Blocking
CC                      simonmar
Operating system
Architecture