This repository was archived by the owner on Feb 18, 2024. It is now read-only.

Conversation

@jaychia
Contributor

@jaychia jaychia commented Aug 9, 2023

Arrow2 already has support for Parquet FixedLenByteArray -> Decimal conversion

This PR adds support for Parquet (variable-length) ByteArray -> Decimal conversion, re-using most of the logic from FixedLenByteArray conversion

@ritchie46
Collaborator

> This PR adds support for Parquet (variable-length) ByteArray

I don't understand. Why would decimal be encoded in variable length binary?

@jaychia jaychia force-pushed the jay/bytes-decimal-pr branch from 7c57f50 to 8e6d836 Compare September 5, 2023 20:54
@codecov

codecov bot commented Sep 5, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.05% ⚠️

Comparison is base (87ab844) 83.02% compared to head (ab04856) 82.98%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1534      +/-   ##
==========================================
- Coverage   83.02%   82.98%   -0.05%     
==========================================
  Files         391      391              
  Lines       42786    42814      +28     
==========================================
+ Hits        35523    35529       +6     
- Misses       7263     7285      +22     
| Files Changed | Coverage Δ |
|---|---|
| src/io/parquet/read/deserialize/simple.rs | 82.73% <0.00%> (-3.54%) ⬇️ |
| src/io/parquet/read/schema/convert.rs | 93.73% <0.00%> (-0.49%) ⬇️ |

... and 6 files with indirect coverage changes


@jaychia
Contributor Author

jaychia commented Sep 5, 2023

Hi @ritchie46, apologies for the late reply!

According to the Parquet spec, decimals can be encoded as int32, int64, fixed_len_byte_array, or binary.

See: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

> binary: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
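
To illustrate what decoding such a value involves, here is a minimal sketch, not the code from this PR. It assumes the unscaled value is stored as big-endian two's complement (as the linked spec describes) and that it fits in an `i128`. The helper name `decimal_bytes_to_i128` is hypothetical, and treating an empty slice as zero is an assumption made for this example.

```rust
/// Hypothetical helper (not part of arrow2): sign-extends a variable-length,
/// big-endian two's-complement byte slice into an i128 unscaled decimal value.
/// Returns None when the input is wider than 16 bytes and cannot fit.
fn decimal_bytes_to_i128(bytes: &[u8]) -> Option<i128> {
    if bytes.len() > 16 {
        return None;
    }
    // Pick the padding byte from the sign bit of the most significant byte.
    // Assumption for this sketch: an empty slice is treated as zero.
    let fill: u8 = match bytes.first() {
        Some(b) if (*b & 0x80) != 0 => 0xFF,
        _ => 0x00,
    };
    // Left-pad to 16 bytes so the value can be read as a big-endian i128.
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}
```

For example, the two bytes `0x30 0x39` decode to the unscaled value 12345, which with a scale of 2 represents 123.45. A fixed-length input can go through the same sign-extension step, which is why the two conversions can plausibly share most of their logic.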

@ariesdevil
Contributor

3 participants