feat: mapping sql Char/Text/String default to Utf8View #16290
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,22 +61,12 @@ LOCATION '../core/tests/data/partitioned_table_arrow/' | |
PARTITIONED BY (part); | ||
|
||
# select wildcard | ||
query ITBI | ||
query error DataFusion error: Arrow error: External error: Arrow error: Invalid argument error: column types must match schema types, expected Utf8View but found Utf8 at column index 1 | ||
cc @xudong963 @alamb, this is the only remaining issue. Do you know why the Arrow format does not support Utf8 with Utf8View? Thanks!

This looks to me like there is a mismatch between what is declared in the plan and what the actual types are. Are you able to figure out which stack trace throws this error?

Thank you @alamb. After investigating, I believe I found the root cause: the sqllogictests read a fixed Arrow file, and its string field was written as arrow-ipc Utf8, so decoding the file also yields Utf8. This PR, however, maps SQL string types to Utf8View by default, so the type check that runs when the record batch is created fails.
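The root cause described above can be illustrated with a small, self-contained sketch. It is only a toy model: the `DataType` enum, `validate`, and `align_to_schema` are hypothetical stand-ins written for this illustration and are not the arrow-rs or DataFusion API. It assumes the declared schema says Utf8View while the decoded file column is Utf8, and shows that the check fails until the column is converted to the declared type.

```rust
// Toy stand-in for Arrow data types (hypothetical, NOT the arrow-rs enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Boolean,
    Utf8,
    Utf8View,
}

// Simplified model of the record-batch check: every column type must
// equal the type declared in the schema at the same index.
fn validate(schema: &[DataType], columns: &[DataType]) -> Result<(), String> {
    if schema.len() != columns.len() {
        return Err(format!(
            "expected {} columns but found {}",
            schema.len(),
            columns.len()
        ));
    }
    for (i, (expected, found)) in schema.iter().zip(columns).enumerate() {
        if expected != found {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, found, i
            ));
        }
    }
    Ok(())
}

// Hypothetical fix-up step: where the schema declares Utf8View and the
// decoded column is Utf8, model a cast of that column to Utf8View.
// All other columns are left unchanged.
fn align_to_schema(schema: &[DataType], columns: &[DataType]) -> Vec<DataType> {
    schema
        .iter()
        .zip(columns)
        .map(|(expected, found)| match (expected, found) {
            (DataType::Utf8View, DataType::Utf8) => DataType::Utf8View,
            _ => *found,
        })
        .collect()
}

fn main() {
    // Schema as declared by the SQL layer (strings default to Utf8View).
    let declared = [DataType::Int64, DataType::Utf8View, DataType::Boolean];
    // Types actually decoded from the fixed Arrow IPC test file.
    let decoded = [DataType::Int64, DataType::Utf8, DataType::Boolean];

    println!("{:?}", validate(&declared, &decoded));
    let aligned = align_to_schema(&declared, &decoded);
    println!("{:?}", validate(&declared, &aligned));
}
```

In the real code base the equivalent choices would presumably be either to make the declared schema follow the file's types or to cast the decoded column; which one the PR should take is not settled by this sketch.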
||
SELECT * FROM arrow_partitioned ORDER BY f0; | ||
---- | ||
1 foo true 123 | ||
2 bar false 123 | ||
3 baz true 456 | ||
4 NULL NULL 456 | ||
|
||
# select all fields | ||
query IITB | ||
query error DataFusion error: Arrow error: External error: Arrow error: Invalid argument error: column types must match schema types, expected Utf8View but found Utf8 at column index 1 | ||
SELECT part, f0, f1, f2 FROM arrow_partitioned ORDER BY f0; | ||
---- | ||
123 1 foo true | ||
123 2 bar false | ||
456 3 baz true | ||
456 4 NULL NULL | ||
|
||
# select without partition column | ||
query IB | ||
|
🤔 This looks like it may have added extra casts. I wonder if it is because md5 doesn't support StringView natively 🤔

Good catch, thank you @alamb. I support it now in the latest PR.