Reading Parquet files in Spark Iceberg with TIMESTAMP_MILLIS parquet type causes ClassCastException #14430

@ayingsf

Description

Apache Iceberg version

1.6.1

Query engine

Spark

Please describe the bug 🐞

Issue Summary

  • Spark version 3.5.5
  • Iceberg-spark-runtime-3.5_2.12 version 1.6.1

I'm getting an error when reading an Iceberg table whose parquet files contain timestamp fields backed by the Parquet TIMESTAMP_MILLIS type. The error is:

java.lang.ClassCastException: class org.apache.iceberg.shaded.org.apache.arrow.vector.TimeStampMicroTZVector cannot be cast to class org.apache.iceberg.shaded.org.apache.arrow.vector.BigIntVector (org.apache.iceberg.shaded.org.apache.arrow.vector.TimeStampMicroTZVector and org.apache.iceberg.shaded.org.apache.arrow.vector.BigIntVector are in unnamed module of loader 'app')

Root cause seems to be that Iceberg expects a BigIntVector in its vectorized Arrow reader, but the column vector actually created is a TimeStampMicroTZVector.

The column vector created (via https://github.com/apache/arrow-java/blob/main/vector/src/main/java/org/apache/arrow/vector/types/pojo/FieldType.java#L107) inherits the Arrow type, which Iceberg itself defines via ArrowSchemaUtil to always have microsecond precision. Hence the column vector will always be a TimeStampMicroTZVector.

When the underlying parquet file has the TIMESTAMP_MICROS data type, it takes this path instead, which properly casts to TimeStampMicroTZVector (or the NTZ variant, depending on Iceberg metadata).
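The failure mode can be modeled in plain Java without Arrow: a supertype reference holds one sibling subtype while the reader unconditionally casts to another. This is only an illustrative sketch; the inner classes below mimic the Arrow vector hierarchy and are not the real Arrow classes.

```java
public class CastDemo {
    // Hypothetical stand-ins for the Arrow vector hierarchy,
    // for illustration only.
    static class FieldVector {}
    static class BigIntVector extends FieldVector {}
    static class TimeStampMicroTZVector extends FieldVector {}

    static String attemptRead() {
        // Arrow allocates the vector from the Iceberg-derived schema,
        // which always carries microsecond timestamp precision:
        FieldVector vec = new TimeStampMicroTZVector();
        try {
            // The reader path for TIMESTAMP_MILLIS data expects a plain
            // long vector and casts unconditionally:
            BigIntVector longs = (BigIntVector) vec;
            return "ok: " + longs;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(attemptRead());
    }
}
```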

The same read path produces no errors when vectorization is turned off via the config below:

spark.sql.iceberg.vectorization.enabled=false

I don't see this logic changing in the latest version of Iceberg, so the issue likely still exists there. Why does the vectorized reader expect a Long-typed vector?
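For context on what a fix would have to handle: TIMESTAMP_MILLIS values only need a unit up-conversion to match Iceberg's microsecond timestamps. A hedged sketch of that conversion in plain Java (the method name is hypothetical, not part of any Iceberg API):

```java
public class MillisToMicros {
    // Up-convert an epoch-millis value (Parquet TIMESTAMP_MILLIS) to
    // epoch-micros (Iceberg's timestamp precision). multiplyExact
    // throws ArithmeticException rather than silently overflowing.
    static long toMicros(long epochMillis) {
        return Math.multiplyExact(epochMillis, 1_000L);
    }

    public static void main(String[] args) {
        System.out.println(toMicros(1_700_000_000_123L));
    }
}
```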

Repro

There was a similar issue that reproduced the above ClassCastException: #14046

In general, if we generate a Parquet dataset in Spark with a timestamp field of millisecond precision, add the parquet files to an Iceberg table, and then read the table via Spark, the above error surfaces.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata

Labels: bug (Something isn't working)