
hSeek and hTell do unnecessary stat

In base, hTell and hSeek are unnecessarily inefficient.

On Linux, each should translate to just 1 lseek() syscall, but base inserts an additional newfstatat() syscall (a variant of stat()) in front of each one.

This can be observed with strace, e.g. attached to ghci.

Repro

import System.IO
writeFile "myfile" "" -- just to create the file

f <- openFile "myfile" ReadMode
hTell f

Example strace output (e.g. strace -fyp "$(pidof ghc)" -P "myfile") for the hTell call:

newfstatat(12</home/niklas/myfile>, "", {st_mode=S_IFREG|0644, st_size=919, ...}, AT_EMPTY_PATH) = 0
lseek(12</home/niklas/myfile>, 0, SEEK_CUR) = 0

(Note that in the above, lseek(..., SEEK_CUR) with offset 0 returns the current seek position instead of setting it, thus implementing hTell.)
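For tracing outside ghci, the same steps can be wrapped in a small compiled program. The following is only a sketch: the function name tellRepeatedly and the repetition count are made up for illustration and are not part of base. Each hTell call is expected to produce the stat-then-lseek pair shown above, though the exact stat-family syscall may differ by platform and libc.

```haskell
import Control.Monad (replicateM)
import System.IO (IOMode (ReadMode), hTell, withFile)

-- | Hypothetical helper: create an empty file, open it read-only,
-- and call hTell on it n times, returning the reported positions.
-- Run the resulting binary under strace to count syscalls per hTell.
tellRepeatedly :: FilePath -> Int -> IO [Integer]
tellRepeatedly path n = do
  writeFile path ""                 -- just to create the file
  withFile path ReadMode $ \h ->
    replicateM n (hTell h)          -- one hTell per iteration
```

Calling tellRepeatedly "myfile" 1000 from main and running the binary under strace -f -e trace=lseek,newfstatat,fstat makes the ratio of stat-type calls to lseek calls easy to count.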

Problems

These additional stat syscalls cause several problems:

  • Syscall spam: Costs context switches and makes debugging harder in strace.
  • Greatly increased latency on networked file systems.

Thus they make IO-based code much slower than in e.g. C or Python.

Cause

The reason for the additional stats seems to be the "Handle must be seekable" requirement.

Instead of determining once whether a Handle is seekable and storing the result, hSeek and hTell determine it again on every call, even though it cannot change in between.

Solutions

I believe that devType, and seekableness, cannot change during the lifetime of an open FD or Handle.

So the easiest solution would be to store an isSeekable :: Bool field in either data FD next to similar fields

data FD = FD {
  fdFD :: {-# UNPACK #-} !CInt,
#if defined(mingw32_HOST_OS)
  fdIsSocket_ :: {-# UNPACK #-} !Int
#else
  fdIsNonBlocking :: {-# UNPACK #-} !Int
#endif
 }

or in data Handle__.
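To make the caching idea concrete, here is a minimal, self-contained sketch. It does not touch base: CachedFD, mkCachedFD, tellCached and countingProbe are hypothetical names, the probe action stands in for the stat-based device-type check, and the lseek is passed in as an action so the example stays portable.

```haskell
import Data.IORef (IORef, modifyIORef')
import Foreign.C.Types (CInt)

-- | Hypothetical variant of FD carrying one extra cached field.
data CachedFD = CachedFD
  { cfdFD         :: {-# UNPACK #-} !CInt
  , cfdIsSeekable :: !Bool   -- assumption: computed once, at open time
  }

-- | Build a CachedFD, running the (expensive) seekability probe exactly once.
mkCachedFD :: CInt -> IO Bool -> IO CachedFD
mkCachedFD fd probe = do
  seekable <- probe
  pure (CachedFD fd seekable)

-- | Stand-in for hTell: consults only the cached flag, then runs the
-- supplied action (which would be the single lseek syscall).
tellCached :: CachedFD -> IO Integer -> IO Integer
tellCached fd doLseek
  | cfdIsSeekable fd = doLseek
  | otherwise        = ioError (userError "handle is not seekable")

-- | Demonstration probe: bumps a counter each time it runs, then
-- returns the given answer.
countingProbe :: IORef Int -> Bool -> IO Bool
countingProbe counter result = do
  modifyIORef' counter (+ 1)
  pure result
```

With a counter created via newIORef 0, building one CachedFD and calling tellCached on it any number of times leaves the counter at 1. That is the property the proposed isSeekable field would give hSeek and hTell.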

Edited by Niklas Hambüchen