Skip to content

Conversation

wForget
Copy link
Member

@wForget wForget commented Sep 12, 2024

🔍 Description

Issue References 🔗

DataFrame.isEmpty may trigger execution again, we should avoid it.

Describe Your Solution 🔧

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklist 📝

Be nice. Be informative.

@wForget wForget self-assigned this Sep 12, 2024
@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (7198c72) to head (265f0ec).
Report is 19 commits behind head on master.

Files with missing lines Patch % Lines
...kyuubi/engine/spark/operation/SparkOperation.scala 0.00% 5 Missing ⚠️
.../kyuubi/engine/spark/operation/ExecutePython.scala 0.00% 2 Missing ⚠️
...e/kyuubi/engine/spark/operation/ExecuteScala.scala 0.00% 2 Missing ⚠️
...ubi/engine/spark/operation/PlanOnlyStatement.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #6688   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         684     684           
  Lines       42237   42277   +40     
  Branches     5755    5765   +10     
======================================
- Misses      42237   42277   +40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pan3793
Copy link
Member

pan3793 commented Sep 13, 2024

cc @iodone

if (result == null) {
new StructType().add("plan", "string")
} else if (result.isEmpty) {
} else if (result.schema.isEmpty) {
Copy link
Contributor

@ulysses-you ulysses-you Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get it correctly, when we go into thie else if branch, that means the query is in planExcludes ? We return the result rather than the plan. Can we add a new flag to distinguish them ? I'm not sure if the result schema of planExcludes is always empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to add a flag to distinguish them, because we handle all cases:

if (result == null) {
    ...
else if (result.schema.isEmpty) {
    ...
} else {
    result.schema
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it is consistent with ExecuteStatement

override protected def resultSchema: StructType = {
if (result == null || result.schema.isEmpty) {
new StructType().add("Result", "string")
} else {
result.schema
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see. How about reduce the else if and else by using super.resultSchema() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks, I will change that

.add("traceback", "array<string>")
} else {
result.schema
super.resultSchema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is no different from before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks no different, result is never assigned in ExecutePython.

@bowenliang123 bowenliang123 added this to the v1.10.0 milestone Oct 16, 2024
@bowenliang123 bowenliang123 changed the title Avoid trigger execution when getting result schema [SPARK] Avoid trigger execution when getting result schema Oct 16, 2024
@bowenliang123 bowenliang123 modified the milestones: v1.10.0, v1.9.3 Oct 16, 2024
bowenliang123 pushed a commit that referenced this pull request Oct 16, 2024
…hema

# 🔍 Description
## Issue References 🔗

`DataFrame.isEmpty` may trigger execution again, we should avoid it.

## Describe Your Solution 🔧

## Types of changes 🔖

- [X] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6688 from wForget/planonly_schema.

Closes #6688

265f0ec [wforget] fix style
d71cc4a [wforget] refactor resultSchema for spark operation
0c36b3d [wforget] Avoid trigger execution when getting result schema

Authored-by: wforget <643348094@qq.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
(cherry picked from commit da2401c)
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
@bowenliang123
Copy link
Contributor

Thanks, merged to master (1.10.0) and branch-1.9 (1.9.3).

turboFei pushed a commit to turboFei/kyuubi that referenced this pull request Aug 27, 2025
…ult schema

`DataFrame.isEmpty` may trigger execution again, we should avoid it.

- [X] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

---

- [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6688 from wForget/planonly_schema.

Closes apache#6688

265f0ec [wforget] fix style
d71cc4a [wforget] refactor resultSchema for spark operation
0c36b3d [wforget] Avoid trigger execution when getting result schema

Authored-by: wforget <643348094@qq.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants