- Description: A small library for doing UTF-8-based input and output.
- Licence: ZLIB
- Author: Marijn Haverbeke marijnh@gmail.com
- Maintainer: Gábor Melis mega@retes.hu
- Homepage: https://common-lisp.net/project/trivial-utf-8/
- Bug tracker: https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues
- Source control: GIT
Trivial UTF-8 is a small library for doing UTF-8-based input and output on a Lisp implementation that already supports Unicode, meaning CHAR-CODE and CODE-CHAR deal with Unicode character codes.
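One way to check this precondition is sketched below. UNICODE-CAPABLE-P is a hypothetical helper, not part of Trivial UTF-8, and testing a single code point (U+20AC) is only a rough heuristic, not a guarantee of full Unicode support.

```lisp
;; Hypothetical helper (not part of Trivial UTF-8): a rough check that
;; CODE-CHAR and CHAR-CODE round-trip a code point outside Latin-1.
(defun unicode-capable-p ()
  "Return T if U+20AC survives a CODE-CHAR / CHAR-CODE round trip, else NIL."
  (let ((code #x20AC))                     ; U+20AC EURO SIGN
    (and (> char-code-limit code)          ; CODE-CHAR needs code < CHAR-CODE-LIMIT
         (let ((char (code-char code)))    ; may be NIL if no such character exists
           (and char (= (char-code char) code))))))
```

If (unicode-capable-p) returns NIL, the implementation's character codes are probably not Unicode code points, and the library's results are unlikely to be meaningful.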
The rationale for the existence of this library is that while Unicode-enabled implementations usually do provide some kind of interface for dealing with character encodings, these interfaces are typically not very flexible or uniform.
The Babel library solves a similar problem while understanding more encodings. Trivial UTF-8 was written before Babel existed, but for new projects you might be better off going with Babel. The one plus that Trivial UTF-8 has is that it doesn't depend on any other libraries.
Here are the official repository and the HTML documentation for the latest version.
- [function] UTF-8-BYTE-LENGTH STRING

    Calculate the number of bytes needed to encode STRING.
- [function] STRING-TO-UTF-8-BYTES STRING &KEY NULL-TERMINATE

    Convert STRING into an array of unsigned bytes containing its UTF-8 representation. If NULL-TERMINATE is true, append an extra 0 byte at the end.
- [function] UTF-8-GROUP-SIZE BYTE

    Determine the number of bytes that are part of the character whose encoding starts with BYTE. May signal UTF-8-DECODING-ERROR.
- [function] UTF-8-BYTES-TO-STRING BYTES &KEY (START 0) (END (LENGTH BYTES))

    Convert the subsequence of the array BYTES between START and END, which contains UTF-8 encoded characters, to a string. The element type of BYTES may be anything as long as it can be COERCEd into an (UNSIGNED-BYTE 8) array. May signal UTF-8-DECODING-ERROR.
- [function] READ-UTF-8-STRING INPUT &KEY NULL-TERMINATED STOP-AT-EOF (CHAR-LENGTH -1) (BYTE-LENGTH -1)

    Read UTF-8 encoded data from INPUT, a byte stream, and construct a string from the characters found. When NULL-TERMINATED is true, stop reading at a null character. If STOP-AT-EOF is true, stop at END-OF-FILE without signaling an error. The CHAR-LENGTH and BYTE-LENGTH parameters specify the maximum number of characters or bytes to read, where -1 means no limit. May signal UTF-8-DECODING-ERROR.
- [condition] UTF-8-DECODING-ERROR SIMPLE-ERROR
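
The functions above can be exercised together as in the following sketch. It rests on several assumptions not stated in this document: the system is loaded through Quicklisp, the exported symbols live in a package named TRIVIAL-UTF-8, and UIOP is available for locating a temporary directory. The names `*text*`, `*bytes*`, `demo-round-trip`, `demo-decoding-error` and `demo-read` are hypothetical and exist only for illustration.

```lisp
;; Setup assumption: Quicklisp is installed. Evaluate this at the REPL or
;; with LOAD, so the system is loaded before the later forms are read.
(ql:quickload "trivial-utf-8")

;; "naïve", built with CODE-CHAR so the example does not depend on the
;; encoding of this source file.
(defparameter *text*
  (coerce (list #\n #\a (code-char #xEF) #\v #\e) 'string))

;; U+00EF takes two bytes in UTF-8; each ASCII character takes one.
(defparameter *bytes* (trivial-utf-8:string-to-utf-8-bytes *text*))

(defun demo-round-trip ()
  "Return a list of results from the encoding and decoding functions."
  (list (trivial-utf-8:utf-8-byte-length *text*)
        (length (trivial-utf-8:string-to-utf-8-bytes *text* :null-terminate t))
        ;; Index 2 holds the first byte of the two-byte character.
        (trivial-utf-8:utf-8-group-size (aref *bytes* 2))
        (trivial-utf-8:utf-8-bytes-to-string *bytes* :start 2 :end 4)))

(defun demo-decoding-error ()
  "Handle the condition signaled for a byte that cannot start a character."
  (handler-case (trivial-utf-8:utf-8-group-size #x80) ; a continuation byte
    (trivial-utf-8:utf-8-decoding-error () :invalid)))

(defun demo-read (&rest keys)
  "Write *BYTES* to a scratch file, then read it back with READ-UTF-8-STRING.
KEYS are passed through as the keyword arguments of READ-UTF-8-STRING."
  (let ((path (merge-pathnames "trivial-utf-8-demo.bin"
                               (uiop:temporary-directory))))
    (with-open-file (out path :direction :output :if-exists :supersede
                              :element-type '(unsigned-byte 8))
      (write-sequence *bytes* out))
    (unwind-protect
         (with-open-file (in path :element-type '(unsigned-byte 8))
           (apply #'trivial-utf-8:read-utf-8-string in keys))
      (delete-file path))))
```

Under these assumptions, (demo-round-trip) should return 6, 7, 2 and the one-character string "ï", (demo-decoding-error) should return :INVALID, (demo-read :stop-at-eof t) should return the full string, and (demo-read :char-length 2) should return "na". Calling demo-read with no limit and without :stop-at-eof reads until the end of the file is hit, which signals an error.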