After some more thought: stringified JSON will often be about 2x shorter as UTF-8 than as UTF-16. Much of the content is ASCII, but there will often be a few non-ASCII characters, which would push a string from encoding 0 to encoding 2. I am thus leaning towards a MakeCode-like solution, possibly without skip lists yet.
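A rough way to check the size claim is to count bytes for both encodings on a small stringified object. This is only a sketch: the helper names `utf8ByteLength`, `utf16ByteLength` and `isAscii` and the sample object are made up for illustration and are not taken from MakeCode or any other implementation.

```typescript
// Byte length of a string stored as UTF-8, computed from its UTF-16 code units.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair: one code point above U+FFFF
        i++;
      } else {
        bytes += 3; // unpaired high surrogate, counted as a 3-byte sequence
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Byte length of a string stored as UTF-16: two bytes per code unit.
function utf16ByteLength(s: string): number {
  return s.length * 2;
}

// True when every code unit is in the ASCII range 0-127.
function isAscii(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0x7f) return false;
  }
  return true;
}

// Hypothetical sample: mostly ASCII with a single non-ASCII character.
const json = JSON.stringify({ name: "caf\u00e9", temp: 21.5 });
console.log(json.length, utf8ByteLength(json), utf16ByteLength(json), isAscii(json));
// prints: 27 28 54 false
```

One non-ASCII character is enough to make `isAscii` fail, yet the UTF-8 form is still close to half the size of the UTF-16 form.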
The ECMAScript spec requires `String.charCodeAt()` to return a 16-bit value. Unicode code points outside of the 16-bit range (mostly emoji, but also some historical alphabets and rare Chinese/Japanese/Korean ideograms) are represented as surrogate pairs of two 16-bit code units. `String.length` returns the number of UTF-16 code units in the string.

If ES were designed today, `charCodeAt()` would probably return values of up to 21 bits, or the language might use yet another abstraction, since even with full 21-bit Unicode several code points can still combine into a single glyph (the character displayed on the screen).

Here are some string representations:
MakeCode uses 0, 1 and 3 (the surrogate pairs are encoded in UTF-8, so it is ES-compatible). Encoding 0 is limited to ASCII (0-127), so all strings are valid UTF-8.
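The code-unit behaviour described in this post can be observed directly. The sketch below uses only `charCodeAt()` and `length`; the helper `toCodePoints` is a hypothetical name introduced here to show how a surrogate pair maps back to a single code point, and it is not part of any of the encodings discussed.

```typescript
// Decode a string's UTF-16 code units into Unicode code points,
// combining each valid surrogate pair into a single value.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        out.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    out.push(hi); // BMP code unit, or an unpaired surrogate passed through as-is
  }
  return out;
}

const emoji = "\uD83D\uDE00"; // U+1F600 written as its surrogate pair
console.log(emoji.length); // 2
console.log(emoji.charCodeAt(0).toString(16)); // d83d
console.log(emoji.charCodeAt(1).toString(16)); // de00
console.log(toCodePoints(emoji).map((c) => c.toString(16)).join(" ")); // 1f600

// Two code points that render as a single glyph: "e" plus a combining acute accent.
const accented = "e\u0301";
console.log(accented.length); // 2
console.log(toCodePoints(accented).length); // 2
```

The last two lines show why even a 21-bit `charCodeAt()` would not make `length` match the number of glyphs on screen.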