Commit 85a295d5 authored by Ben Gamari, committed by Ben Gamari

ghc-prim: Don't allocate a thunk for each unpacked UTF-8 character

While debugging #14005 I noticed that unpackCStringUtf8# was allocating
a thunk for each Unicode character that it unpacked. This seems hardly
worthwhile given that the thunk's closure will be at least three words,
whereas the Char itself will be only two and requires only a bit of bit
twiddling to construct.
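
The trade-off described above can be sketched with ordinary boxed values. This is only an illustration, not the ghc-prim code: `bump`, `lazyBump` and `strictBump` are hypothetical names, and whether GHC really allocates a thunk in the lazy variant depends on optimisation level and strictness analysis.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Char (chr, ord)

-- Hypothetical stand-in for the cheap per-character computation.
-- `chr` can raise an error, so GHC may not evaluate it speculatively.
bump :: Int -> Char -> Char
bump n c = chr (ord c + n)

-- Lazy variant: each element is stored as a suspended call to `bump`,
-- so a thunk may be allocated per element until it is demanded.
lazyBump :: Int -> String -> String
lazyBump _ []       = []
lazyBump n (x : xs) = bump n x : lazyBump n xs

-- Strict variant: the bang pattern evaluates the element before the
-- cons cell is returned, so the list holds an evaluated Char.
strictBump :: Int -> String -> String
strictBump _ []       = []
strictBump n (x : xs) =
  let !c = bump n x
  in c : strictBump n xs
```

Both functions return the same list whenever every element is defined. They differ only in when the element is computed, which is the change this commit makes to unpackCStringUtf8#.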

Test Plan: Validate

Reviewers: simonmar, austin

Subscribers: dfeuer, rwbarton, thomie

Differential Revision: https://phabricator.haskell.org/D3769
parent 897366a0
@@ -125,24 +125,28 @@ unpackCStringUtf8# :: Addr# -> [Char]
 unpackCStringUtf8# addr
   = unpack 0#
   where
+    -- We take care to strictly evaluate the character decoding as
+    -- indexCharOffAddr# is marked with the can_fail flag and
+    -- consequently GHC won't evaluate the expression unless it is absolutely
+    -- needed.
     unpack nh
       | isTrue# (ch `eqChar#` '\0'# ) = []
       | isTrue# (ch `leChar#` '\x7F'#) = C# ch : unpack (nh +# 1#)
       | isTrue# (ch `leChar#` '\xDF'#) =
-          C# (chr# (((ord# ch -# 0xC0#) `uncheckedIShiftL#` 6#) +#
-                    (ord# (indexCharOffAddr# addr (nh +# 1#)) -# 0x80#))) :
-          unpack (nh +# 2#)
+          let !c = C# (chr# (((ord# ch -# 0xC0#) `uncheckedIShiftL#` 6#) +#
+                             (ord# (indexCharOffAddr# addr (nh +# 1#)) -# 0x80#)))
+          in c : unpack (nh +# 2#)
       | isTrue# (ch `leChar#` '\xEF'#) =
-          C# (chr# (((ord# ch -# 0xE0#) `uncheckedIShiftL#` 12#) +#
-                    ((ord# (indexCharOffAddr# addr (nh +# 1#)) -# 0x80#) `uncheckedIShiftL#` 6#) +#
-                    (ord# (indexCharOffAddr# addr (nh +# 2#)) -# 0x80#))) :
-          unpack (nh +# 3#)
+          let !c = C# (chr# (((ord# ch -# 0xE0#) `uncheckedIShiftL#` 12#) +#
+                             ((ord# (indexCharOffAddr# addr (nh +# 1#)) -# 0x80#) `uncheckedIShiftL#` 6#) +#
+                             (ord# (indexCharOffAddr# addr (nh +# 2#)) -# 0x80#)))
+          in c : unpack (nh +# 3#)
       | True =
-          C# (chr# (((ord# ch -# 0xF0#) `uncheckedIShiftL#` 18#) +#
-                    ((ord# (indexCharOffAddr# addr (nh +# 1#)) -# 0x80#) `uncheckedIShiftL#` 12#) +#
-                    ((ord# (indexCharOffAddr# addr (nh +# 2#)) -# 0x80#) `uncheckedIShiftL#` 6#) +#
-                    (ord# (indexCharOffAddr# addr (nh +# 3#)) -# 0x80#))) :
-          unpack (nh +# 4#)
+          let !c = C# (chr# (((ord# ch -# 0xF0#) `uncheckedIShiftL#` 18#) +#
+                             ((ord# (indexCharOffAddr# addr (nh +# 1#)) -# 0x80#) `uncheckedIShiftL#` 12#) +#
+                             ((ord# (indexCharOffAddr# addr (nh +# 2#)) -# 0x80#) `uncheckedIShiftL#` 6#) +#
+                             (ord# (indexCharOffAddr# addr (nh +# 3#)) -# 0x80#)))
+          in c : unpack (nh +# 4#)
       where
         !ch = indexCharOffAddr# addr nh
......
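
For readers less familiar with the primops, the decoding arithmetic in the hunk above can be sketched with boxed Int values and a byte list instead of an Addr#. This is an illustration under assumptions, not the library code: `decodeUtf8Bytes` is a hypothetical name, the input is assumed to be well-formed and NUL-terminated, and `chr` bounds-checks its argument whereas `chr#` does not.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper mirroring the four guards of the diff.
-- Like the diff, it does not validate continuation bytes.
decodeUtf8Bytes :: [Word8] -> String
decodeUtf8Bytes = go . map fromIntegral
  where
    go :: [Int] -> String
    go []      = []
    go (0 : _) = []                      -- NUL terminator ends the string
    go (b0 : rest)
      | b0 <= 0x7F = chr b0 : go rest    -- one byte (ASCII)
      | b0 <= 0xDF, (b1 : rest') <- rest =
          -- two bytes: strip the 0xC0 and 0x80 markers, then combine
          let !c = chr (((b0 - 0xC0) `shiftL` 6) + (b1 - 0x80))
          in c : go rest'
      | b0 <= 0xEF, (b1 : b2 : rest') <- rest =
          -- three bytes
          let !c = chr (((b0 - 0xE0) `shiftL` 12)
                        + ((b1 - 0x80) `shiftL` 6)
                        + (b2 - 0x80))
          in c : go rest'
      | (b1 : b2 : b3 : rest') <- rest =
          -- four bytes
          let !c = chr (((b0 - 0xF0) `shiftL` 18)
                        + ((b1 - 0x80) `shiftL` 12)
                        + ((b2 - 0x80) `shiftL` 6)
                        + (b3 - 0x80))
          in c : go rest'
      | otherwise = []                   -- truncated input: stop decoding
```

As a worked case, the bytes 0xC3 0xA9 take the two-byte branch: (0xC3 - 0xC0) shifted left by 6 is 192, plus (0xA9 - 0x80) = 41, giving code point 233 (U+00E9).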