What text encoding does llama_token_to_piece() return? UTF-8? #3114

Answered by goerch
crasm asked this question in Q&A

llama.cpp doesn't implement any kind of Unicode normalization, so your output depends on the normalization of your input. And I would expect llama_token_to_piece to return UTF-8, yes.
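A minimal sketch (not part of the original answer) of what "treat the output as UTF-8" looks like in practice. It assumes the older four-argument form of llama_token_to_piece (model, token, buffer, length) and the common convention that a negative return value is the negated buffer size required; newer llama.cpp releases add extra parameters, so check the llama.h that ships with your build.

```cpp
// Sketch only: convert one token id to its piece and treat the bytes as UTF-8.
// Signature and return-value convention assumed from older llama.cpp releases;
// verify against your version of llama.h.
#include "llama.h"

#include <string>
#include <vector>

static std::string token_to_piece_utf8(const llama_model * model, llama_token token) {
    std::vector<char> buf(8, 0);
    int n = llama_token_to_piece(model, token, buf.data(), (int) buf.size());
    if (n < 0) {
        // Buffer was too small: the negative return is assumed to be the
        // negated required length, so resize and retry.
        buf.resize(-n);
        n = llama_token_to_piece(model, token, buf.data(), (int) buf.size());
    }
    if (n < 0) {
        return std::string(); // conversion failed
    }
    // The bytes are UTF-8, but with byte-fallback tokenizers a single token
    // may carry only part of a multi-byte character, so decode the stream
    // only after concatenating consecutive pieces.
    return std::string(buf.data(), n);
}
```

Note that, as the answer says, no Unicode normalization happens anywhere in this path: the pieces you get back reflect whatever normalization form your input text (and the model's vocabulary) used.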

Answer selected by crasm