fix: Support reading from files that have a UTF-8 Byte Order Mark (#670)
Adds support for reading files that begin with a UTF-8 BOM. Such files
are commonly created by Windows text editors, and the BOM should be
skipped because serde deserialization will not handle those bytes.
We have encountered this issue with our Windows customers, who may create
UTF-8 BOM files without knowing it. Although we fixed it with a
custom FileSource implementation, it would be nice to have this
upstream to help others who may run into this issue.
This PR came from discussion in
#565
Unlike that PR, this one handles only UTF-8 BOMs, not other
encodings, and does not pull in any new dependencies.
- Adds a test with a UTF-8 BOM text file.
- Updates FileSourceFile to skip the 3 BOM bytes if they are detected.
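
For illustration, the BOM skip described above could look roughly like the following. This is a minimal sketch and not the code in this change: the helper name `strip_utf8_bom` and the constant `UTF8_BOM` are hypothetical, and it assumes the file contents are already available as a byte slice.

```rust
/// The UTF-8 encoding of U+FEFF (the byte order mark) is these three bytes.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of the library's API): returns the input
/// without a leading UTF-8 BOM, or the input unchanged if no BOM is present.
fn strip_utf8_bom(bytes: &[u8]) -> &[u8] {
    if bytes.starts_with(&UTF8_BOM) {
        // Skip exactly the 3 BOM bytes; the rest is ordinary UTF-8 text.
        &bytes[UTF8_BOM.len()..]
    } else {
        bytes
    }
}
```

Only an exact three-byte match at the start of the input is removed, so files without a BOM, and inputs shorter than three bytes, pass through unchanged.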