Replies: 2 comments
I'll look into this, @H3L70N. Thanks for reporting this!
@H3L70N, just published v0.10.8 with the fix. Thanks for the insight.
Bug Report: UTF-8 Character Encoding Issue in Identity Tokenization
Summary
The identity tokenization feature in BLNK fails when processing names containing UTF-8 special characters, producing invalid byte sequences that cannot be stored in the PostgreSQL database.
Environment
server_encoding and client_encoding confirmed as UTF8
Bug Description
When attempting to tokenize an identity that contains UTF-8 special characters (accented characters, umlauts, etc.), the tokenization process corrupts the character encoding, resulting in invalid UTF-8 byte sequences that cause PostgreSQL insertion failures.
Steps to Reproduce
Create an identity with a name containing UTF-8 special characters:
Attempt to tokenize the created identity:
Observe the tokenization failure with UTF-8 encoding error
Expected Behavior
Actual Behavior
pq: invalid byte sequence for encoding "UTF8"
Test Cases and Results
Error Logs
Database State Verification
Database query showing successful storage of original UTF-8 data but missing tokenization:
Note: Entries 88 and 89 have properly tokenized last_name fields, while entries 90 and 91 show the raw "Taizé" value, indicating tokenization failure.
Root Cause Analysis
The issue appears to be in the tokenization algorithm itself, which:
The tokenization process seems to handle multi-byte UTF-8 characters one byte at a time, as if each byte were a separate character. That splits the encoding of characters such as "é" and leaves byte sequences that are no longer valid UTF-8.
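To make the suspected failure mode concrete, here is a minimal Go sketch (BLNK is written in Go). It is not BLNK's tokenization code: `reverseBytes` and `reverseRunes` are hypothetical helpers, and string reversal is only a toy stand-in for whatever per-character transform the tokenizer applies. The point is that the same transform yields invalid UTF-8 when applied per byte and valid UTF-8 when applied per rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseBytes is a hypothetical stand-in for a transform that works one
// byte at a time. It is NOT BLNK's tokenizer.
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// reverseRunes applies the same toy transform one code point at a time,
// so the bytes of a multi-byte character are never separated.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	// "Taizé": 5 characters but 6 bytes, because "é" (U+00E9) is 0xC3 0xA9.
	name := "Taiz\u00e9"

	fmt.Println(len(name), utf8.RuneCountInString(name)) // 6 5
	fmt.Println(utf8.ValidString(reverseBytes(name)))    // false
	fmt.Println(utf8.ValidString(reverseRunes(name)))    // true
}
```

A string like the output of `reverseBytes` is the kind of value PostgreSQL rejects with `invalid byte sequence for encoding "UTF8"` when the column encoding is UTF8.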
Impact
Suggested Fix
The tokenization implementation needs to:
Test Recommendations
Add test cases covering:
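As one possible shape for such tests, here is a hedged Go sketch of a table-style check that a tokenizer's output is still valid UTF-8. `invalidOutputs` and `dropLastByte` are hypothetical names introduced for illustration, the tokenizer is passed in as a plain function rather than taken from BLNK, and the sample names are illustrative inputs, not the report's own test list.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidOutputs runs tokenize over each name and returns the inputs whose
// output is not valid UTF-8. tokenize is whatever function is under test.
func invalidOutputs(tokenize func(string) string, names []string) []string {
	var bad []string
	for _, n := range names {
		if !utf8.ValidString(tokenize(n)) {
			bad = append(bad, n)
		}
	}
	return bad
}

func main() {
	// Illustrative inputs only: accented, umlaut, and plain ASCII names.
	names := []string{"Taiz\u00e9", "M\u00fcller", "Smith"}

	// Hypothetical broken tokenizer: drops the last byte of its input,
	// which cuts a trailing multi-byte character in half.
	dropLastByte := func(s string) string { return s[:len(s)-1] }

	fmt.Printf("%q\n", invalidOutputs(dropLastByte, names))
}
```

The same loop could be moved into a `testing` table test, with the real tokenization function substituted for `dropLastByte` and an assertion that the returned slice is empty.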
Additional Information
Reporter: Helton Guambe @pavulla-tech
Date: July 29, 2025
Priority: High