-
Notifications
You must be signed in to change notification settings - Fork 965
[SPARK] Avoid trigger execution when getting result schema #6688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6688 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 684 684
Lines 42237 42277 +40
Branches 5755 5765 +10
======================================
- Misses 42237 42277 +40 ☔ View full report in Codecov by Sentry. |
cc @iodone |
if (result == null) { | ||
new StructType().add("plan", "string") | ||
} else if (result.isEmpty) { | ||
} else if (result.schema.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get it correctly, when we go into thie else if branch, that means the query is in planExcludes
? We return the result rather than the plan. Can we add a new flag to distinguish them ? I'm not sure if the result schema of planExcludes
is always empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to add a flag to distinguish them, because we handle all cases:
if (result == null) {
...
else if (result.schema.isEmpty) {
...
} else {
result.schema
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it is consistent with ExecuteStatement
Lines 53 to 58 in 635c793
override protected def resultSchema: StructType = { | |
if (result == null || result.schema.isEmpty) { | |
new StructType().add("Result", "string") | |
} else { | |
result.schema | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. How about reduce the else if and else by using super.resultSchema()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks, I will change that
.add("traceback", "array<string>") | ||
} else { | ||
result.schema | ||
super.resultSchema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is no different from before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks no different, result
is never assigned in ExecutePython
.
…hema # 🔍 Description ## Issue References 🔗 `DataFrame.isEmpty` may trigger execution again, we should avoid it. ## Describe Your Solution 🔧 ## Types of changes 🔖 - [X] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes #6688 from wForget/planonly_schema. Closes #6688 265f0ec [wforget] fix style d71cc4a [wforget] refactor resultSchema for spark operation 0c36b3d [wforget] Avoid trigger execution when getting result schema Authored-by: wforget <643348094@qq.com> Signed-off-by: Bowen Liang <liangbowen@gf.com.cn> (cherry picked from commit da2401c) Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
Thanks, merged to master (1.10.0) and branch-1.9 (1.9.3). |
…ult schema `DataFrame.isEmpty` may trigger execution again, we should avoid it. - [X] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) --- - [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes apache#6688 from wForget/planonly_schema. Closes apache#6688 265f0ec [wforget] fix style d71cc4a [wforget] refactor resultSchema for spark operation 0c36b3d [wforget] Avoid trigger execution when getting result schema Authored-by: wforget <643348094@qq.com> Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
🔍 Description
Issue References 🔗
DataFrame.isEmpty
may trigger execution again, we should avoid it.Describe Your Solution 🔧
Types of changes 🔖
Test Plan 🧪
Behavior Without This Pull Request ⚰️
Behavior With This Pull Request 🎉
Related Unit Tests
Checklist 📝
Be nice. Be informative.