[SPARK-52693][SQL] Support +/- ANSI day-time intervals to/from TIME #51383

Closed
MaxGekk wants to merge 8 commits into apache:master from MaxGekk:time-add-interval

Conversation

@MaxGekk (Member) commented Jul 6, 2025

What changes were proposed in this pull request?

In this PR, I propose to support the + and - operators over TIME and DAY-TIME INTERVAL.

Syntax

exprA + exprB, exprB + exprA
exprA - exprB

where

  • exprA - an expression of the TIME data type with any valid precision in [0, 6].
  • exprB - an expression of the DAY-TIME INTERVAL type with any start and end fields among SECOND, MINUTE, HOUR, DAY.

Returns

The result has the TIME(n) data type; the error DATETIME_OVERFLOW is raised if the result falls outside the valid range [00:00, 24:00). If the result is valid, its precision n is the maximum of the input time's precision m and the interval's implied precision i: n = max(m, i), where i = 6 when the interval's end field is SECOND and i = 0 for the other end fields (MINUTE, HOUR, DAY).
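
For illustration, a few sketched queries showing the proposed behavior; the literal precisions and result rendering in the comments are my assumptions, not output taken from this PR's tests:

```sql
-- Sketched examples of TIME +/- DAY-TIME INTERVAL (assumed output shown in comments).
SELECT TIME'12:30' + INTERVAL '90' MINUTE;    -- 14:00; the end field MINUTE implies i = 0
SELECT INTERVAL '0.5' SECOND + TIME'12:30';   -- 12:30:00.5; the end field SECOND implies i = 6, so the result is TIME(6)
SELECT TIME'12:30' - INTERVAL '1' HOUR;       -- 11:30
SELECT TIME'23:30' + INTERVAL '1' HOUR;       -- raises DATETIME_OVERFLOW: 24:30 falls outside [00:00, 24:00)
```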

Why are the changes needed?

To conform to the ANSI SQL standard:
[Screenshot 2025-07-07 at 09 41 49: excerpt of the relevant ANSI SQL standard rules]

Does this PR introduce any user-facing change?

No. The TIME data type hasn't been released yet.

How was this patch tested?

By running new tests and affected test suites:

$ build/sbt "test:testOnly *DateTimeUtilsSuite"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z time.sql"

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions bot added the SQL label Jul 6, 2025
@MaxGekk changed the title from [WIP][SPARK-52693][SQL] Support +/- ANSI day-time intervals to/from TIME to [SPARK-52693][SQL] Support +/- ANSI day-time intervals to/from TIME Jul 7, 2025
@MaxGekk marked this pull request as ready for review July 7, 2025 08:05

MaxGekk commented Jul 7, 2025

@srielau Could you take a look at the PR, please?

* Valid values: 0 (DAY), 1 (HOUR), 2 (MINUTE), 3 (SECOND).
* @param targetPrecision The number of digits of the fraction part of the resulting time.
* @return A time value in nanoseconds, or throws an arithmetic overflow error
*         if the result is out of the valid time range [00:00, 24:00).
Contributor

so if the day-time interval has the day field, it will always overflow? Shall we check the start field of the day-time interval at analysis time to make sure it's not DAY?

@MaxGekk (Member, Author) commented Jul 7, 2025

> so if the day-time interval has the day field, it will always overflow?

Not always. Values in the day field can be 0, and in that case it cannot overflow.

> Shall we check the start field of the day-time interval at analysis time to make sure it's not DAY?

Maybe, do you mean the end field? And prohibit DayTimeIntervalType(DAY, DAY)? Even with that type, users might construct valid expressions like TIME'12:30' + INTERVAL '0' DAY. The SQL standard says nothing about that case (@srielau Am I right?). @dongjoon-hyun @yaooqinn @cloud-fan WDYT, should we disallow such day intervals?

Member

As I see it, DATETIME_OVERFLOW is simple and sufficient

Contributor

OK, let's keep it a runtime error then.
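
To make the discussed case concrete, two sketched queries (assuming the runtime overflow check described in this PR; not copied from its tests):

```sql
SELECT TIME'12:30' + INTERVAL '0' DAY;   -- valid: 12:30, a zero day component cannot push the result out of range
SELECT TIME'12:30' + INTERVAL '1' DAY;   -- fails at runtime with DATETIME_OVERFLOW; DAY intervals are not rejected at analysis time
```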

@@ -181,7 +181,7 @@ private[spark] object AnsiIntervalType extends AbstractDataType {
*/
private[sql] abstract class AnyTimeType extends DatetimeType

-private[spark] object AnyTimeType extends AbstractDataType {
+private[spark] object AnyTimeType extends AbstractDataType with Serializable {

@dongjoon-hyun (Member) commented Jul 7, 2025

Just a question. Is this Serializable required for this PR?

Member Author

Yes, otherwise StaticInvoke fails with:

java.io.NotSerializableException: org.apache.spark.sql.types.AnyTimeType$
Serialization stack:
	- object not serializable (class: org.apache.spark.sql.types.AnyTimeType$, value: org.apache.spark.sql.types.AnyTimeType$@1929425f)
	- writeObject data (class: scala.collection.generic.DefaultSerializationProxy)
	- object (class scala.collection.generic.DefaultSerializationProxy, scala.collection.generic.DefaultSerializationProxy@29d37757)
	- writeReplace data (class: scala.collection.generic.DefaultSerializationProxy)
	- object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.sql.types.AnyTimeType$@1929425f, IntegerType, DayTimeIntervalType, ByteType, IntegerType))
	- field (class: org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke, name: inputTypes, type: interface scala.collection.immutable.Seq)
	- object (class org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke, static_invoke(DateTimeUtils.timeAddInterval(null, 6, INTERVAL '0 01:00:00' DAY TO SECOND, 3, 6)))
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:43)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:50)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:122)

MaxGekk commented Jul 8, 2025

Merging to master. Thank you, @yaooqinn, @cloud-fan, @dongjoon-hyun, for the review.

@MaxGekk closed this in 62becf4 Jul 8, 2025
asl3 pushed a commit to asl3/spark that referenced this pull request Jul 14, 2025
ksbeyer pushed a commit to ksbeyer/spark that referenced this pull request Jul 14, 2025
haoyangeng-db pushed a commit to haoyangeng-db/apache-spark that referenced this pull request Jul 22, 2025