Add a dataframe call matcher for linting tables #4664

pritishpai · 2025-10-06T19:11:10Z

Changes

The current linting does not collect DataFrame calls such as df.write[.mode()].saveAsTable(). This PR adds a matcher method that collects them.

It also includes a small fix for the recent blueprint change to prompts.
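For readers unfamiliar with how a call chain such as df.write.mode(...).saveAsTable(...) appears to a linter, here is a minimal sketch using only Python's standard `ast` module. It is not the matcher added in this PR; the helper name `attribute_chain` is hypothetical and only illustrates that the outermost call is `saveAsTable`, with `.write` nested further inside.

```python
import ast


def attribute_chain(source: str) -> list[str]:
    """Return the names in a dotted call chain, in source order.

    Hypothetical helper for illustration only: it walks from the
    outermost call inward, collecting attribute names as it goes.
    """
    node = ast.parse(source, mode="eval").body
    names: list[str] = []
    while True:
        if isinstance(node, ast.Call):
            # A call wraps the attribute (or name) being called.
            node = node.func
        elif isinstance(node, ast.Attribute):
            names.append(node.attr)
            node = node.value
        elif isinstance(node, ast.Name):
            names.append(node.id)
            break
        else:
            # Anything else (subscripts, literals, ...) ends the walk.
            break
    return list(reversed(names))


print(attribute_chain('df.write.mode("overwrite").saveAsTable("a.b")'))
```

Because the tree is nested with `saveAsTable` outermost, a matcher can either start at `saveAsTable` and walk inward to `.write`, or flatten the chain first as above and scan it left to right.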

github-actions · 2025-10-06T19:30:06Z

✅ 115/115 passed, 10 skipped, 1h26m34s total

Running from acceptance #8923

asnare

This is an improvement, thanks. I'm happy to merge this.

Something to think about is that currently the detection misses a lot of common cases, because it assumes there will only be a single mode/format/option/partitionBy/bucketBy call between .write and .saveAsTable() whereas in practice a few of them are normally chained.

(Random thought: I think for .saveAsTable() only mode and option can be used?)

You can solve that with recursion. Once you have found the .write node, check the next node:

1. If the node is .saveAsTable(), return True.
2. If the node is one of {"mode", "format", "option", "partitionBy", "bucketBy"}, return whatever these rules say about the next node (via recursion).
3. Otherwise, return False.
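The rules above can be sketched as a short recursive function. This is only an illustration under a simplifying assumption: the chain after `.write` has already been flattened into a list of method names in source order, whereas the real linter works on syntax tree nodes. The names `CHAINABLE` and `leads_to_save_as_table` are hypothetical and not taken from the PR.

```python
# Intermediate writer methods that may appear between .write and .saveAsTable().
CHAINABLE = frozenset({"mode", "format", "option", "partitionBy", "bucketBy"})


def leads_to_save_as_table(calls: list[str]) -> bool:
    """Apply the three rules to the method names that follow `.write`.

    `calls` is an assumed, pre-flattened representation of the chain,
    e.g. ["mode", "option", "saveAsTable"].
    """
    if not calls:
        # The chain ended without reaching saveAsTable.
        return False
    head, *rest = calls
    if head == "saveAsTable":
        return True
    if head in CHAINABLE:
        # Defer to whatever the rules say about the next node.
        return leads_to_save_as_table(rest)
    return False


print(leads_to_save_as_table(["mode", "format", "option", "saveAsTable"]))  # True
print(leads_to_save_as_table(["mode", "parquet"]))  # False
```

Since each step only looks at one name and then recurses on the remainder, any number of chained mode/format/option/partitionBy/bucketBy calls is accepted, which is the case the single-call assumption would miss.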

pritishpai · 2025-10-08T18:20:41Z

> Something to think about is that currently the detection misses a lot of common cases, because it assumes there will only be a single mode/format/option/partitionBy/bucketBy call between .write and .saveAsTable() whereas in practice a few of them are normally chained.

I've added a few more test cases, and it seems multiple chained calls are already handled.

pritishpai added 2 commits October 6, 2025 15:07

Fix for prompts change in blueprint

72d32ca

Add matcher for dataframe call

767ae1b

pritishpai requested a review from a team as a code owner October 6, 2025 19:11

pritishpai temporarily deployed to account-admin October 6, 2025 19:11 — with GitHub Actions Inactive

pritishpai requested a review from asnare October 6, 2025 19:11

Add unit test

7ca49de

pritishpai had a problem deploying to account-admin October 6, 2025 21:12 — with GitHub Actions Error

Fmt changes

1b69aa2

pritishpai temporarily deployed to account-admin October 6, 2025 21:24 — with GitHub Actions Inactive

asnare approved these changes Oct 7, 2025

View reviewed changes

Add more test cases

acf3a32

pritishpai temporarily deployed to account-admin October 7, 2025 19:17 — with GitHub Actions Inactive

Fix test source_code

cafb4a4

pritishpai temporarily deployed to account-admin October 8, 2025 18:18 — with GitHub Actions Inactive

pritishpai added this pull request to the merge queue Oct 8, 2025

Merged via the queue into main with commit 5f6bfaa Oct 8, 2025
8 checks passed

pritishpai deleted the fix/dataframe_pyspark_calls branch October 8, 2025 18:57
