Description
Apache Iceberg version
1.6.1
Query engine
Spark
Please describe the bug 🐞
Issue Summary
- Spark version 3.5.5
- iceberg-spark-runtime-3.5_2.12 version 1.6.1
I'm getting an error when reading an Iceberg table whose Parquet files contain timestamp fields stored as the Parquet TIMESTAMP_MILLIS type. The error is:
java.lang.ClassCastException: class org.apache.iceberg.shaded.org.apache.arrow.vector.TimeStampMicroTZVector cannot be cast to class org.apache.iceberg.shaded.org.apache.arrow.vector.BigIntVector (org.apache.iceberg.shaded.org.apache.arrow.vector.TimeStampMicroTZVector and org.apache.iceberg.shaded.org.apache.arrow.vector.BigIntVector are in unnamed module of loader 'app')
The root cause seems to be that Iceberg's vectorized Arrow reader expects a BigIntVector, but the column vector actually created is a TimeStampMicroTZVector.
The column vector is created via https://github.com/apache/arrow-java/blob/main/vector/src/main/java/org/apache/arrow/vector/types/pojo/FieldType.java#L107 and inherits its Arrow type from the field, which Iceberg itself defines through ArrowSchemaUtil with microsecond precision for every timestamp. Hence the column vector is always a TimeStampMicroTZVector, regardless of the precision stored in the Parquet file.
When the underlying Parquet file uses the TIMESTAMP_MICROS type, the reader takes a different code path, which correctly casts to TimeStampMicroTZVector (or its NTZ counterpart, depending on the Iceberg metadata).
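To make the mismatch concrete, here is a toy Python model of the type selection described above. It is not Iceberg or Arrow code, and every function name in it is hypothetical. It assumes, as stated in this report, that Iceberg always requests a MICROSECOND Arrow timestamp, that Arrow Java picks the vector class from that Arrow type alone, and that the vectorized reader then expects a BigIntVector for TIMESTAMP_MILLIS data. `millis_to_micros` shows the multiply-by-1000 scaling a MILLIS value would need to fit a micro-precision vector.

```python
# Toy model of the vector-type mismatch; NOT Iceberg or Arrow source code.

# Arrow Java vector classes per (Arrow unit, has time zone). The MILLISECOND
# rows would only be used if the Arrow unit followed the Parquet unit.
ARROW_TIMESTAMP_VECTORS = {
    ("MILLISECOND", True): "TimeStampMilliTZVector",
    ("MILLISECOND", False): "TimeStampMilliVector",
    ("MICROSECOND", True): "TimeStampMicroTZVector",
    ("MICROSECOND", False): "TimeStampMicroVector",
}

# Assumption from the report: for TIMESTAMP_MILLIS columns the vectorized
# reader casts the allocated vector to BigIntVector.
EXPECTED_FOR_MILLIS = "BigIntVector"


def iceberg_arrow_unit(parquet_unit: str) -> str:
    """Arrow unit requested by the schema mapping; the Parquet unit is ignored."""
    return "MICROSECOND"


def allocated_vector(parquet_unit: str, adjust_to_utc: bool) -> str:
    """Vector class allocated for a timestamp column under this model."""
    unit = iceberg_arrow_unit(parquet_unit)
    return ARROW_TIMESTAMP_VECTORS[(unit, adjust_to_utc)]


def millis_cast_would_fail(adjust_to_utc: bool) -> bool:
    """True when the allocated vector is not the BigIntVector the reader casts to."""
    return allocated_vector("MILLIS", adjust_to_utc) != EXPECTED_FOR_MILLIS


def millis_to_micros(values):
    """Scale epoch-millisecond values to epoch microseconds, keeping nulls."""
    return [None if v is None else v * 1000 for v in values]
```

For example, `allocated_vector("MILLIS", True)` yields `"TimeStampMicroTZVector"`, which is exactly the class named in the ClassCastException above.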
The same read succeeds without errors when vectorization is turned off with the following config:
spark.sql.iceberg.vectorization.enabled=false
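As a sketch of applying that workaround from PySpark, the helper below only builds SQL strings, so it runs without Spark. The session-level key is the one used in this report; the table-level property `read.parquet.vectorization.enabled` is an Iceberg table property offered here as an alternative, and the helper name and table name are hypothetical.

```python
def workaround_statements(table=None):
    """SQL statements that move Iceberg Parquet reads off the vectorized path."""
    # Session-wide switch, as used in the report.
    stmts = ["SET spark.sql.iceberg.vectorization.enabled=false"]
    if table is not None:
        # Table-level alternative: applies to every reader of this table.
        stmts.append(
            f"ALTER TABLE {table} SET TBLPROPERTIES "
            "('read.parquet.vectorization.enabled'='false')"
        )
    return stmts
```

With an existing session, `for stmt in workaround_statements("local.db.ts_millis"): spark.sql(stmt)` applies both settings. This avoids the error at the cost of the row-based reader's throughput, so it is a stopgap rather than a fix.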
I don't see this logic having changed in the latest version of Iceberg, so the issue may still exist there. Why does the vectorized reader expect a Long-typed vector (BigIntVector) here?
Repro
A similar issue, #14046, reproduced the same ClassCastException.
In general, if we use Spark to write a Parquet dataset with a millisecond-precision timestamp field, add those Parquet files to an Iceberg table, and then read the table through Spark, the error above surfaces.
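The repro steps above might be sketched as follows. This is a hypothetical script, assuming a PySpark session that already has the Iceberg runtime jar, the Iceberg SQL extensions (needed for `CALL`), and a catalog named `local`; the catalog, table, and path names are placeholders. `spark.sql.parquet.outputTimestampType=TIMESTAMP_MILLIS` makes Spark write Parquet TIMESTAMP_MILLIS, and the `add_files` procedure registers those files with the Iceberg table without rewriting them. The statement builder runs without Spark; `run_repro` expects a session you create yourself.

```python
def repro_statements(catalog, table, path):
    """Ordered SQL for the repro; `table` is 'db.name' inside `catalog`."""
    full_name = f"{catalog}.{table}"
    return [
        # Write timestamps as Parquet TIMESTAMP_MILLIS instead of Spark's default.
        "SET spark.sql.parquet.outputTimestampType=TIMESTAMP_MILLIS",
        f"INSERT OVERWRITE DIRECTORY '{path}' USING parquet "
        "SELECT 1 AS id, TIMESTAMP '2024-01-01 00:00:00.123' AS ts",
        f"CREATE TABLE IF NOT EXISTS {full_name} (id INT, ts TIMESTAMP) USING iceberg",
        # Register the existing Parquet files with the Iceberg table (no rewrite).
        f"CALL {catalog}.system.add_files(table => '{table}', "
        f"source_table => '`parquet`.`{path}`')",
        f"SELECT * FROM {full_name}",
    ]


def run_repro(spark, catalog="local", table="db.ts_millis", path="/tmp/ts_millis_parquet"):
    """Run the repro on an existing SparkSession (not created here)."""
    *setup, query = repro_statements(catalog, table, path)
    for stmt in setup:
        spark.sql(stmt)
    # Per the report, this read raises the ClassCastException while
    # vectorized reads are enabled (the default).
    spark.sql(query).show()
```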
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time