hSeek and hTell do unnecessary stat syscalls
In base, hTell and hSeek are unnecessarily inefficient.
On Linux, each call should translate to just one lseek() syscall, but base inserts an additional newfstatat() syscall (a variant of stat()) in front of each one.
This can be observed with strace, e.g. when attached to ghci.
Repro
import System.IO
writeFile "myfile" "" -- just to create the file
f <- openFile "myfile" ReadMode
hTell f
Example strace output (e.g. strace -fyp "$(pidof ghc)" -P "myfile") for the hTell call:
newfstatat(12</home/niklas/myfile>, "", {st_mode=S_IFREG|0644, st_size=919, ...}, AT_EMPTY_PATH) = 0
lseek(12</home/niklas/myfile>, 0, SEEK_CUR) = 0
(Note that lseek(..., 0, SEEK_CUR) returns the current seek position without changing it, which is how hTell is implemented.)
Problems
These additional stat syscalls cause several problems:
- Syscall spam: they cost context switches and make debugging with strace harder.
- Greatly increased latency on networked file systems.
As a result, Handle-based IO code can be much slower than equivalent code in e.g. C or Python.
Cause
The reason for the additional stats seems to be the "Handle must be seekable" requirement.
Instead of determining once whether a Handle is seekable and storing the result, hSeek and hTell re-derive this information on every call, even though it cannot change between calls:
- hSeek/hTell call
- wantSeekableHandle, which on FileHandles calls
- checkSeekableHandle, which on non-closed files calls
- IODevice.isSeekable dev, which for the instance IODevice FD calls
- GHC.IO.FD.isSeekable, which calls …
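To make the per-call cost of this chain concrete, here is a toy Haskell model. It is only a sketch: all names (DevType, MockFD, mockFstat, mockHTell, traceHTells) are hypothetical, no real syscalls are made, and "syscalls" are just strings appended to an IORef log. As in the chain above, seekability is recomputed from a fresh (simulated) fstat() every time hTell runs.

```haskell
import Control.Monad (replicateM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Toy model only: a file descriptor whose "syscalls" are recorded in an
-- IORef log instead of being issued to the kernel.
data DevType = RegularFile | Stream deriving (Eq, Show)

data MockFD = MockFD
  { mockLog     :: IORef [String] -- simulated syscalls, most recent first
  , mockDevType :: DevType        -- what a simulated fstat() would report
  }

-- Simulated fstat(): logs the call and returns the device type.
mockFstat :: MockFD -> IO DevType
mockFstat fd = do
  modifyIORef' (mockLog fd) ("newfstatat" :)
  return (mockDevType fd)

-- Simulated lseek(fd, 0, SEEK_CUR): logs the call, returns a fixed offset.
mockLseekCur :: MockFD -> IO Integer
mockLseekCur fd = do
  modifyIORef' (mockLog fd) ("lseek" :)
  return 0

-- Stand-in for the isSeekable end of the chain: seekability is
-- re-derived from a fresh fstat() on every call.
mockIsSeekable :: MockFD -> IO Bool
mockIsSeekable fd = (== RegularFile) <$> mockFstat fd

-- hTell as currently structured: a seekability check, then the lseek.
mockHTell :: MockFD -> IO Integer
mockHTell fd = do
  ok <- mockIsSeekable fd
  if ok then mockLseekCur fd else ioError (userError "handle is not seekable")

-- Calls mockHTell n times on one FD and returns the syscall trace in order.
traceHTells :: Int -> IO [String]
traceHTells n = do
  logRef <- newIORef []
  let fd = MockFD logRef RegularFile
  replicateM_ n (mockHTell fd)
  reverse <$> readIORef logRef
```

Loading this into GHCi and evaluating `traceHTells 2` gives `["newfstatat","lseek","newfstatat","lseek"]`, i.e. two syscalls per hTell, matching the strace pattern shown earlier.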
Solutions
I believe that devType, and therefore seekability, cannot change during the lifetime of an open FD or Handle.
So the simplest solution would be to store an isSeekable :: Bool field either in data FD, next to similar fields:
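data FD = FD {
  fdFD :: {-# UNPACK #-} !CInt,
#if defined(mingw32_HOST_OS)
  -- Windows: sockets need different read/write functions (send/recv)
  fdIsSocket_ :: {-# UNPACK #-} !Int
#else
  -- Unix: whether this FD has O_NONBLOCK set
  fdIsNonBlocking :: {-# UNPACK #-} !Int
#endif
 }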
or in data Handle__.
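A minimal sketch of the caching idea, assuming (as argued above) that the device type cannot change while the FD is open. The record CachedFD, its cfdIsSeekable field, and the helpers openCachedFD, cachedHTell and traceCachedHTells are hypothetical stand-ins, not base's actual types; syscalls are again only simulated via an IORef log. Seekability is computed by a single fstat() when the descriptor is opened and afterwards is just a field read.

```haskell
import Control.Monad (replicateM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical sketch: seekability is computed once at "open" time and
-- stored in the record, analogous to a new isSeekable field in data FD
-- or data Handle__. All syscalls are simulated via an IORef log.
data DevType = RegularFile | Stream deriving (Eq, Show)

data CachedFD = CachedFD
  { cfdLog        :: IORef [String] -- simulated syscall log, most recent first
  , cfdIsSeekable :: !Bool          -- cached once at open time
  }

-- Simulated fstat(): logs the call and reports the given device type.
simFstat :: IORef [String] -> DevType -> IO DevType
simFstat logRef ty = modifyIORef' logRef ("newfstatat" :) >> return ty

-- Opening performs the only fstat(); its result is cached in the record.
openCachedFD :: DevType -> IO CachedFD
openCachedFD ty = do
  logRef <- newIORef []
  t <- simFstat logRef ty
  return (CachedFD logRef (t == RegularFile))

-- hTell with the cached flag: a pure field read, then a single lseek.
cachedHTell :: CachedFD -> IO Integer
cachedHTell fd
  | cfdIsSeekable fd = do
      modifyIORef' (cfdLog fd) ("lseek" :)
      return 0 -- simulated offset; a real FD would return lseek's result
  | otherwise = ioError (userError "handle is not seekable")

-- Opens one simulated FD, calls cachedHTell n times, returns the trace.
traceCachedHTells :: Int -> IO [String]
traceCachedHTells n = do
  fd <- openCachedFD RegularFile
  replicateM_ n (cachedHTell fd)
  reverse <$> readIORef (cfdLog fd)
```

In GHCi, `traceCachedHTells 2` gives `["newfstatat","lseek","lseek"]`: one stat at open, then exactly one lseek per hTell. The strict Bool field keeps the cached flag evaluated, so the check on each call costs no more than a pattern match.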