[SPARK-52693][SQL] Support +/- ANSI day-time intervals to/from TIME #51383
Conversation
@srielau Could you take a look at the PR, please?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
 * Valid values: 0 (DAY), 1 (HOUR), 2 (MINUTE), 3 (SECOND).
 * @param targetPrecision The number of digits of the fraction part of the resulting time.
 * @return A time value in nanoseconds, or throws an arithmetic overflow error
 *   if the result is out of the valid time range [00:00, 24:00).
So if the day-time interval has the day field, it will always overflow? Shall we check the start field of the day-time interval at analysis time to make sure it's not DAY?
> so if the day-time interval has the day field, it will always overflow?

Not always. Values in the day field can be 0. In that case, it couldn't overflow.

> Shall we check the start field of the day-time interval at analysis time to make sure it's not DAY?

Maybe, do you mean the end field? And prohibit `DayTimeIntervalType(DAY, DAY)`? Even with that type, users might construct valid expressions like `TIME'12:30' + INTERVAL '0' DAY`. The SQL standard says nothing about that case (@srielau Am I right?). @dongjoon-hyun @yaooqinn @cloud-fan WDYT, should we disallow such day intervals?
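The runtime behavior discussed above can be illustrated with a minimal sketch (not Spark's actual implementation, and `TimeAddSketch` is a hypothetical name): model TIME as nanoseconds since midnight and the day-time interval as microseconds, and reject any result outside `[00:00, 24:00)`. A zero-day interval then passes the check, while a one-day interval overflows at runtime:

```scala
object TimeAddSketch {
  val NanosPerDay: Long = 24L * 60 * 60 * 1000 * 1000 * 1000

  // Add a day-time interval (microseconds) to a time (nanoseconds since
  // midnight); throw if the result leaves the valid range [00:00, 24:00).
  def timeAddInterval(timeNanos: Long, intervalMicros: Long): Long = {
    val result = timeNanos + Math.multiplyExact(intervalMicros, 1000L)
    if (result < 0 || result >= NanosPerDay) {
      throw new ArithmeticException("DATETIME_OVERFLOW")
    }
    result
  }
}
```

With this sketch, `TIME'12:30' + INTERVAL '0' DAY` maps to adding 0 microseconds and succeeds, whereas adding a full day trips the overflow check.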
As I see it, `DATETIME_OVERFLOW` is simple and sufficient.
OK, let's keep it as a runtime error then.
```diff
@@ -181,7 +181,7 @@ private[spark] object AnsiIntervalType extends AbstractDataType {
  */
 private[sql] abstract class AnyTimeType extends DatetimeType

-private[spark] object AnyTimeType extends AbstractDataType {
+private[spark] object AnyTimeType extends AbstractDataType with Serializable {
```
Just a question. Is this `Serializable` required for this PR?
Yes, otherwise `StaticInvoke` fails with:

```
java.io.NotSerializableException: org.apache.spark.sql.types.AnyTimeType$
Serialization stack:
	- object not serializable (class: org.apache.spark.sql.types.AnyTimeType$, value: org.apache.spark.sql.types.AnyTimeType$@1929425f)
	- writeObject data (class: scala.collection.generic.DefaultSerializationProxy)
	- object (class scala.collection.generic.DefaultSerializationProxy, scala.collection.generic.DefaultSerializationProxy@29d37757)
	- writeReplace data (class: scala.collection.generic.DefaultSerializationProxy)
	- object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.sql.types.AnyTimeType$@1929425f, IntegerType, DayTimeIntervalType, ByteType, IntegerType))
	- field (class: org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke, name: inputTypes, type: interface scala.collection.immutable.Seq)
	- object (class org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke, static_invoke(DateTimeUtils.timeAddInterval(null, 6, INTERVAL '0 01:00:00' DAY TO SECOND, 3, 6)))
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:43)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:50)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:122)
```
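The failure mode can be reproduced outside Spark with a tiny sketch (the object names below are hypothetical stand-ins, not Spark classes): a plain Scala `object` is not Java-serializable by default, so any serialized structure that holds a reference to it fails the way `StaticInvoke`'s `inputTypes` did, until the singleton mixes in `Serializable`:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for AnyTimeType before the fix: no Serializable mixin.
object PlainSingleton
// Stand-in for AnyTimeType after the fix.
object SerializableSingleton extends Serializable

// Returns true if Java serialization of the object succeeds.
def javaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }
```

Serializing `PlainSingleton` throws `NotSerializableException`, while `SerializableSingleton` round-trips fine, which is why the one-word `with Serializable` change is enough here.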
Merging to master. Thank you, @yaooqinn @cloud-fan @dongjoon-hyun for review.
### What changes were proposed in this pull request?
In the PR, I propose to support the `+` and `-` operators over TIME and DAY-TIME INTERVAL.

#### Syntax
```
exprA + exprB, exprB + exprA
exprA - exprB
```
where
- **exprA** - an expression of the TIME data type of any valid precision [0, 6].
- **exprB** - an expression of the DAY-TIME INTERVAL with any start and end fields `SECOND`, `MINUTE`, `HOUR`, `DAY`.

#### Returns
The result of the TIME(n) data type, or raises the error `DATETIME_OVERFLOW` if the result is out of the valid range `[00:00, 24:00)`. If the result is valid, its precision `n` is the maximum of the precision of the input time `m` and of the day-time interval `i`: `n = max(m, i)`, where `i` = 6 for the end interval field `SECOND` and `0` for the other fields `MINUTE`, `HOUR`, `DAY`.

### Why are the changes needed?
To conform to the ANSI SQL standard:

<img width="867" alt="Screenshot 2025-07-07 at 09 41 49" src="https://github.com/user-attachments/assets/808a3bad-70a6-4c28-b23d-83e8399bd0e9" />

### Does this PR introduce _any_ user-facing change?
No. The TIME data type hasn't been released yet.

### How was this patch tested?
By running new tests and affected test suites:
```
$ build/sbt "test:testOnly *DateTimeUtilsSuite"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z time.sql"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#51383 from MaxGekk/time-add-interval.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
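The result-precision rule `n = max(m, i)` from the Returns section can be sketched as a small function (`resultPrecision` is a hypothetical name for illustration, not a Spark API; the field encoding follows the Scaladoc quoted earlier in this PR, 0=DAY, 1=HOUR, 2=MINUTE, 3=SECOND):

```scala
// Compute the precision of TIME(m) +/- a day-time interval.
def resultPrecision(timePrecision: Int, intervalEndField: Int): Int = {
  require(0 <= timePrecision && timePrecision <= 6, "TIME precision must be in [0, 6]")
  // Only an interval ending in SECOND carries a fractional part, so it
  // contributes precision 6; MINUTE, HOUR, and DAY contribute 0.
  val intervalPrecision = if (intervalEndField == 3) 6 else 0
  math.max(timePrecision, intervalPrecision)
}
```

For example, `TIME(3) + INTERVAL ... TO SECOND` yields TIME(6), while `TIME(3) + INTERVAL ... HOUR` keeps TIME(3).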