Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
GHC
GHC
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,332
    • Issues 4,332
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
    • Iterations
  • Merge Requests 363
    • Merge Requests 363
  • Requirements
    • Requirements
    • List
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
  • Security & Compliance
    • Security & Compliance
    • Dependency List
    • License Compliance
  • Operations
    • Operations
    • Incidents
    • Environments
  • Analytics
    • Analytics
    • CI / CD
    • Code Review
    • Insights
    • Issue
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #3571

Closed
Open
Opened Oct 09, 2009 by guest@trac-guest

Bizzarely bloated binaries

Compiling a trivial test program:

module Main where

main = print "Hello World"

Using GHC 6.10.4 produces a VERY suspicious PE file. (NB: this applies to DLL as well as EXE output).

The two problems that I have observed are:

  1. The PE always contains a .stab and .stabstr section totalling 0x2A00 of debug data. Looking at the contents of stabstr, this appears to originate from a libffi object file. Perhaps we could disable stabs when building libffi to remove this bloat from output binaries.

  2. The PE contains *A LOT* of trailing junk. My hello world program is 691K, and the PE contains 0x4FAFC = 318K of data which doesn't live in any section! Trimming this data appears to have no effect on the correctness of the program! The amount of junk grows proportionally to the amount of real code and data - I have observed e.g. 18Mb DLLs of which 9Mb are trailing junk.

To repeat: we could potentially *halve* GHC binary sizes by fixing this linker behaviour.

I'm not sure where exactly the fault lies - whether it is a GHC problem or some bug in Ld.

To test trimming your executables and DLLs, you can use this utility I whipped up. Usage is "trimpe <file1> ... <fileN>". It will trim useless data from the end of the files in place:

{-# LANGUAGE ScopedTypeVariables #-}
module Main (main) where

import Control.Monad

import Data.Binary
import Data.Binary.Get

import qualified Data.ByteString.Lazy as ByteString
import Data.Word

import System.Environment

import Debug.Trace


assertM :: Monad m => Bool -> m ()
assertM True  = return ()
assertM False = fail "assertM"

newtype PEImageLength = PEImageLength Word32

-- http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx
instance Binary PEImageLength where
    get = do
        -- Skip the MS DOS stub
        skip 0x3c
        pe_sig_offset <- getWord32le
        -- Skip to the PE signature
        skip (fromIntegral pe_sig_offset - (0x4 + 0x3c))
        -- Read the PE signature itself
        -- NB: this will always be the string "PE\0\0"
        _sig <- getWord32le
        assertM (_sig == 0x00004550)
        -- Read COFF file header
        _machine <- getWord16le
        _no_of_sects <- getWord16le
        _time_date_stamp <- getWord32le
        _ptr_to_sym_tab <- getWord32le
        _no_of_syms <- getWord32le
        _size_of_opt_header <- getWord16le
        assertM (_size_of_opt_header /= 0)
        _characteristics <- getWord16le
        -- Read the "optional" header
        magic <- getWord16le
        let pe32plus = magic == 0x20B
        _maj_link_ver :: Word8 <- get
        _min_link_ver :: Word8 <- get
        _size_of_code <- getWord32le
        _size_of_init_data <- getWord32le
        _size_of_uninit_data <- getWord32le
        _addr_of_entry_point <- getWord32le
        _base_of_code <- getWord32le
        when (not pe32plus) $ do _base_of_data <- getWord32le; return ()
        -- Read the optional header Windows fields
        if pe32plus
         then do _image_base <- getWord64le; return ()
         else do _image_base <- getWord32le; return ()
        _sect_alignment <- getWord32le
        _file_alignment <- getWord32le
        _maj_os_version <- getWord16le
        _min_os_version <- getWord16le
        _maj_image_version <- getWord16le
        _min_image_version <- getWord16le
        _maj_subsys_version <- getWord16le
        _min_subsys_version <- getWord16le
        _win32_version <- getWord32le
        size_of_image <- getWord32le
        -- There is more stuff later, but I simply don't care about it
        -- NB: we could trim a little more agressively if we interpreted
        -- the sections as well...
        return $ PEImageLength size_of_image
        
    put = error "Binary PEImageLength: put"


main :: IO ()
main = do
    files <- getArgs
    forM_ files trimPEToImageSize

trimPEToImageSize :: FilePath -> IO ()
trimPEToImageSize file = do
    putStrLn $ file
    pe_contents <- ByteString.readFile file
    let PEImageLength image_size = decode pe_contents
    
    -- Force the file to close so that the write may succeed
    (ByteString.last pe_contents) `seq` return ()
    
    when (ByteString.length pe_contents > fromIntegral image_size) $ do
        putStrLn $ "* Trimming to image size (" ++ show image_size ++ ")"
        let pe_contents' = ByteString.take (fromIntegral image_size) pe_contents
        ByteString.writeFile file pe_contents'
Trac metadata
Trac field Value
Version 6.10.4
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC batterseapower@hotmail.com
Operating system
Architecture
Assignee
Assign to
8.0.1
Milestone
8.0.1 (Past due)
Assign milestone
Time tracking
None
Due date
None
Reference: ghc/ghc#3571