pritishpai
Contributor

@pritishpai pritishpai commented Oct 6, 2025

Changes

The current linter does not collect DataFrame write calls such as df.write[.mode()].saveAsTable(). This PR adds a method to detect them.

Also includes a small fix for the recent blueprint change for prompts.

@pritishpai pritishpai requested a review from a team as a code owner October 6, 2025 19:11
@pritishpai pritishpai requested a review from asnare October 6, 2025 19:11

github-actions bot commented Oct 6, 2025

✅ 115/115 passed, 10 skipped, 1h26m34s total

Running from acceptance #8923

Contributor

@asnare asnare left a comment

This is an improvement, thanks. I'm happy to merge this.

Something to think about: the detection currently misses a lot of common cases, because it assumes there will be only a single mode/format/option/partitionBy/bucketBy call between .write and .saveAsTable(), whereas in practice several of them are normally chained.

(Random thought: I think for .saveAsTable() only mode and option can be used?)

You can solve that with recursion: once you have found the .write node, check the next node:

  • If the node is .saveAsTable(), return True.
  • If the node is one of {"mode", "format", "option", "partitionBy", "bucketBy"}, then return what these rules say about the next node (via recursion).
  • Otherwise, return False.
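
A minimal sketch of those rules, for illustration only. It uses the standard-library ast module rather than whatever tree the linter actually walks, and the names CHAINABLE, attribute_chain, reaches_save_as_table and is_write_save_as_table are hypothetical, not taken from this PR. It also only looks at method names, so it does not distinguish a call from a plain attribute access.

```python
import ast

# Builder methods allowed between .write and .saveAsTable(), per the rules above.
CHAINABLE = {"mode", "format", "option", "partitionBy", "bucketBy"}


def attribute_chain(node: ast.expr) -> list[str]:
    """Flatten an expression like a.b.c(...).d(...) into ["a", "b", "c", "d"]."""
    names: list[str] = []
    while True:
        if isinstance(node, ast.Call):
            node = node.func  # step from the call to the thing being called
        elif isinstance(node, ast.Attribute):
            names.append(node.attr)
            node = node.value
        elif isinstance(node, ast.Name):
            names.append(node.id)
            break
        else:
            break  # subscripts, literals, etc.: stop flattening here
    names.reverse()
    return names


def reaches_save_as_table(names: list[str]) -> bool:
    """Apply the three rules to the names that follow .write."""
    if not names:
        return False
    head, rest = names[0], names[1:]
    if head == "saveAsTable":
        return True
    if head in CHAINABLE:
        return reaches_save_as_table(rest)  # same rules, next node
    return False


def is_write_save_as_table(source: str) -> bool:
    """Check a single expression for .write[...chained calls...].saveAsTable."""
    chain = attribute_chain(ast.parse(source, mode="eval").body)
    if "write" not in chain:
        return False
    return reaches_save_as_table(chain[chain.index("write") + 1:])
```

Because the recursion consumes one chained name at a time, any number of mode/format/option/partitionBy/bucketBy calls is accepted, while a chain that ends in something else (for example .parquet()) is rejected.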

@pritishpai
Contributor Author

> Something to think about is that currently the detection misses a lot of common cases, because it assumes there will only be a single mode/format/option/partitionBy/bucketBy call between .write and .saveAsTable() whereas in practice a few of them are normally chained.

I've added a few more test cases, and it appears that multiple chained calls are already handled.

@pritishpai pritishpai added this pull request to the merge queue Oct 8, 2025
Merged via the queue into main with commit 5f6bfaa Oct 8, 2025
8 checks passed
@pritishpai pritishpai deleted the fix/dataframe_pyspark_calls branch October 8, 2025 18:57