
GH-41476: [Python][C++] Impossible to specify is_adjusted_to_utc for Time type when writing to Parquet #47316

Open. Wants to merge 7 commits into base: main.

Conversation

sgilmore10
Member

@sgilmore10 sgilmore10 commented Aug 12, 2025

Rationale for this change

As of today, it's not possible to write Parquet TIME data whose isAdjustedToUTC parameter is false. Instead, isAdjustedToUTC is hard-coded to true here.

Unfortunately, some Parquet consumers only support TIME data if the isAdjustedToUTC parameter is false, meaning they cannot import Parquet TIME data generated by our Parquet Writer. For example, the apache/spark Parquet reader only supports Parquet TIME columns if isAdjustedToUTC=false and units=MICROSECONDS.

Adding support for writing TIME data with isAdjustedToUTC set to false would unblock users who need to write Spark-compatible Parquet data.

What changes are included in this PR?

  1. Added write_time_adjusted_to_utc as a property of parquet::ArrowWriterProperties. If true, all TIME columns are written with isAdjustedToUTC set to true. Otherwise, isAdjustedToUTC is set to false for all TIME columns. This property is true by default.
  2. Added enable_write_time_adjusted_to_utc() and disable_write_time_adjusted_to_utc() methods to parquet::ArrowWriterProperties::Builder.
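
To make the intended semantics of the two changes above concrete, here is a minimal, self-contained sketch. It does not use the real Parquet headers: `SketchArrowWriterProperties`, `SketchBuilder`, and `DescribeTimeColumn` are hypothetical stand-ins invented for illustration, and only the property and method names (`write_time_adjusted_to_utc`, `enable_write_time_adjusted_to_utc()`, `disable_write_time_adjusted_to_utc()`) come from this PR's description. The output string format is also an assumption.

```cpp
#include <string>

// Hypothetical stand-in for parquet::ArrowWriterProperties (NOT the real class).
struct SketchArrowWriterProperties {
  bool write_time_adjusted_to_utc;
};

// Hypothetical stand-in for parquet::ArrowWriterProperties::Builder.
// Method names follow the PR description; everything else is assumed.
class SketchBuilder {
 public:
  SketchBuilder* enable_write_time_adjusted_to_utc() {
    write_time_adjusted_to_utc_ = true;
    return this;
  }

  SketchBuilder* disable_write_time_adjusted_to_utc() {
    write_time_adjusted_to_utc_ = false;
    return this;
  }

  SketchArrowWriterProperties build() const {
    return SketchArrowWriterProperties{write_time_adjusted_to_utc_};
  }

 private:
  // Defaults to true, matching the default stated in the PR description.
  bool write_time_adjusted_to_utc_ = true;
};

// Illustrative helper: describes the TIME logical type annotation that a
// writer configured with `props` would attach to every TIME column.
std::string DescribeTimeColumn(const SketchArrowWriterProperties& props,
                               const std::string& unit) {
  return std::string("Time(isAdjustedToUTC=") +
         (props.write_time_adjusted_to_utc ? "true" : "false") +
         ", timeUnit=" + unit + ")";
}
```

With the default builder, `DescribeTimeColumn(SketchBuilder().build(), "microseconds")` reports `isAdjustedToUTC=true`. Calling `disable_write_time_adjusted_to_utc()` before `build()` flips it to `false` for all TIME columns, which is the combination (with microsecond units) that the description says the Spark reader accepts.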

Are these changes tested?

Yes. I added test case ParquetTimeAdjustedToUTC to test suite TestConvertArrowSchema.

Are there any user-facing changes?

Yes. Users can now write Parquet TIME columns whose isAdjustedToUTC parameter is false.

NOTE

  1. I did not update the PyArrow interface because I am not familiar with that code base. I was planning on creating a new GitHub issue to track that work separately.
  2. There already exists an open PR (GH-41476: [C++][Python][Parquet] Add time_is_adjusted_to_utc to parquet prope… #43268) for addressing this issue. However, that PR was last active over a year ago and seems stale.

@wgtmac
Member

wgtmac commented Aug 13, 2025

I think this is something that should be fixed on the Spark side per the discussion from the old PR that you've mentioned.

@sgilmore10
Member Author

Hi @wgtmac,

Thanks for sharing your thoughts on this.

I agree with you that the best case scenario would be for the Apache Spark community to extend the Spark Parquet reader to support the Time type with isAdjustedToUTC=true. However, I was wondering if you could elaborate a bit more on why the community doesn't feel that extending the Arrow Parquet writer to support writing Parquet Time data with isAdjustedToUTC set to false is a good idea.

The decision to default to isAdjustedToUTC=true makes sense in light of the Parquet spec's guidelines on compatibility with respect to the deprecation of TIME_MILLIS/TIME_MICROS. However, at the same time, my impression from reading the discussion on GH-41476 is that the Arrow community would have ideally chosen to map Arrow's Time types to isAdjustedToUTC=false if compatibility wasn't a concern (because Arrow's Time types are timezone-unaware).

Given that the Parquet specification allows for writing local Time data and that Arrow's time types are timezone-unaware, my personal opinion is that adding the ability to explicitly opt in to writing Time types with isAdjustedToUTC=false would unblock some important interoperability workflows (e.g. Spark <-> Arrow). To be very clear: what I am suggesting is NOT to change the current default behavior of Arrow's writer (i.e. we would continue writing Time(isAdjustedToUTC=true) by default), and, therefore, this proposed change would have no impact on backwards compatibility. This would be an explicit, opt-in feature.

Given the complexity of this issue, does anyone feel that it would be helpful to ask for clarification from the broader Parquet community about this? It appears others have been confused about the purpose of the isAdjustedToUTC parameter in the past.

I really appreciate hearing everyone's thoughts on this. This is definitely a nuanced issue, and I am comfortable with whatever direction the community collectively feels is most appropriate. However, in my personal opinion, this would be a worthwhile change.

Thanks!

Best,
Sarah

@wgtmac
Member

wgtmac commented Aug 15, 2025

The previous PR on this didn't make any progress because we found that this is unclear on the Parquet side: #43268 (comment). Perhaps the right direction is to remove is_adjusted_to_utc from the Parquet spec.
