-
Notifications
You must be signed in to change notification settings - Fork 108
data explorer: "Copy as Code" comms #8536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Dhruvi Sompura <dhruvisompura@users.noreply.github.com>
E2E Tests 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for exporting the current data explorer view as code by fetching available syntaxes and translating applied filters/sorts into user-copyable code.
- Introduce new RPC methods
GetCodeSyntaxes
andTranslateToCode
in the DuckDB backend, comm layer, and client instances. - Add TypeScript interfaces and Python stubs for the new code export functionality.
- Extend the OpenRPC specification to include the new methods.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/vs/workbench/services/positronDataExplorer/common/positronDataExplorerDuckDBBackend.ts | Added backend RPC methods for code syntax retrieval and code translation |
src/vs/workbench/services/positronDataExplorer/browser/positronDataExplorerInstance.ts | Implemented instance methods and UI logic for code export |
src/vs/workbench/services/positronDataExplorer/browser/interfaces/positronDataExplorerInstance.ts | Extended the public instance interface with code export methods |
src/vs/workbench/services/languageRuntime/common/positronDataExplorerComm.ts | Defined new comm interfaces and request types for code export |
src/vs/workbench/services/languageRuntime/common/languageRuntimeDataExplorerClient.ts | Added client‐side wrappers for the new RPCs |
positron/comms/data_explorer-backend-openrpc.json | Extended the OpenRPC spec with translate_to_code and get_code_syntaxes |
extensions/positron-python/python_files/posit/positron/data_explorer_comm.py | Added Python Pydantic models and request types for code export |
extensions/positron-python/python_files/posit/positron/data_explorer.py | Added abstract stubs and example implementations for code export in Python |
Comments suppressed due to low confidence (4)
src/vs/workbench/services/positronDataExplorer/browser/positronDataExplorerInstance.ts:434
- [nitpick] The JSDoc summary for
getCodeSyntaxes
is incorrect. Update it to describe fetching available syntaxes rather than exporting filters to clipboard.
* Export filters as code.
src/vs/workbench/services/positronDataExplorer/browser/positronDataExplorerInstance.ts:447
- [nitpick] The JSDoc for
translateToCode
refers to clipboard copying. It should describe returning generated code for a desired syntax.
* Export filters as code.
src/vs/workbench/services/languageRuntime/common/positronDataExplorerComm.ts:1402
- The
supported_features
array should list the actual RPC method key (translate_to_code
), notexport_as_code
, to correctly reflect available backend capabilities.
*/
positron/comms/data_explorer-backend-openrpc.json:262
- The
code_syntax
parameter is defined as a string, so it should not include anitems
schema. Remove theitems
block here.
"items": {
src/vs/workbench/services/positronDataExplorer/common/positronDataExplorerDuckDBBackend.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got some early comments and questions to help me get better oriented to the surroundings. I am very new to the data explorer neighborhood.
Two issues I raised inline:
- Should we return the code as a single string or an array of strings?
- The type of the object being explored is not explicit over on this side. Which is perhaps to be expected (?), but I could use some help understanding this better.
A couple of other high-level thoughts:
- It feels like the code syntax should be more of an enum than a free string. Maybe namespaced within language? Something like
python:polars
,r:tidyverse
, etc. - The current PR treats available code syntaxes as a matter to be determined every time that a user asks for code. But this is really a static capability that the kernel has when it fires up and it never changes during the session. Then the syntaxes available to a specific data explorer instance are always a subset of that.
- When I imagine the syntaxes being displayed in the modal as a dropdown or similar: it feels like the set of choices should be fixed (e.g. for R, "base", "tidyverse", "data.table") and appear in a fixed order. And then the default selection could be influenced by the type of data in the explorer (best match). And a syntax that is truly impossible/unsupported might be greyed out (*). Instead of the backend returning a set of possibilities (such as
['base-r', 'dplyr']
), it feels more like we should pass back aTRUE/FALSE
for each syntax (can we do it?) and a default (the syntax that seems like the best match).
(*) That would never really be the case for R, because we have as.whatever()
functions to coerce things back and forth. I gather the situation for Python might be different?
], | ||
"properties": { | ||
"code": { | ||
"type": "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How sure are we that we want a string? As opposed to an array of strings, i.e. one per line. I can't say exactly if/why this matters, but somehow sending lines of code back to the front end feels more proper than a single collapsed string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I like this idea! It does feel more proper, somehow. An array would probably be easier for formatting anyway; I'll move over to this structure. 👍
{ | ||
"name": "get_code_syntaxes", | ||
"summary": "Get code syntaxes supported for code translation", | ||
"description": "Get all available code syntaxes supported for translation for a data view", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there's some notion that the backend is expected to reply using knowledge of the type of data being viewed. In the R world, i.e. that the backend will know the class(es) of the object, in this sense:
class(palmerpenguins::penguins)
#> [1] "tbl_df" "tbl" "data.frame"
which would effect which syntaxes are supported or which syntax is the default.
I don't see this information being sent over. Does the backend obtain that info another way? I do understand that the object in question must already exist in the runtime and, therefore, the backend already knows (or can know) this. And I even have some sense that all of this is unfolding inside a data explorer instance and therefore the target data object is implicitly available. I just don't understand the information flow yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table lives on the RDataExplorer
, which is created when the data explorer is opened. We should be able to use that table to get information like class without explicitly passing it over the comm.
This could be done on the language pack side, but I do want to stay away from storing any knowledge on the language pack side. Theoretically, if another, non-R/Python language pack wanted to implement this, it would be ideal to not have to make any changes to Positron core.
It's true that the full scope of options would be available when the kernel fires up. At the moment when a data explorer instance is created, we could get these options before the user asks for code. That would at least allow us to make one less async call in the modal, which might make it feel a bit snappier.
With a dropdown, I think it makes the most sense to just omit the options that are not available, rather than greying them out and adding prose that these are the options available for this data type. I'm not sure I understand the benefit of a true/false vs listing what is possible in order. It would add another comm call with essentially the same information, since this would be created per data explorer instance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a big step forward. I'm definitely learning a lot about this neck of the woods just by reviewing this!
In my inline comments, there are 2 big themes:
- Vocabulary questions. On one hand, this is an internal matter and is not a hill I will die on. And yet I do really think "translate" and "guess" don't quite have the right vibes.
- Type choices. It feels like we still need to adjust some of the interfaces. Ideally we'd use an enum for the syntaxes, but I do understand why we can't, if we want that to be completely owned by the backends. The way we're handling "no syntax" doesn't feel quite right yet, i.e. that shouldn't just be another arbitrary syntax string. Hopefully my suggestions will get us moving in a good direction.
"name": "translate_to_code", | ||
"summary": "Translates the current data view into a code snippet.", | ||
"description": "Translate filters and sort keys as code in different syntaxes like pandas, polars, data.table, dplyr", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "translate" is not a great verb here, because it could have so many meanings. Translate from English to another language? Translate from R to Python? I realize there's a lot of context to help someone clarify this, but I suspect it would be even better to use a different verb.
For example, I think "convert" or "generate" are better.
This choice obviously affects a lot of locations (every instance of "translate", "translation", "translated", etc.), so I won't repeat the comment elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had started with generate and moved away from it since I felt it was too AI-y, but I'm on board with convert
instead 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you re: "generate" and "AI-y" in terms of the UI. But internally, I think "generate" is a very viable option. I'm thinking about how nice "generated code" feels versus "converted code".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, both "generate" and "convert" might be handy for actual names and then also for docstrings.
} | ||
}, | ||
{ | ||
"name": "guess_code_syntax", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "suggest" fits the intent better. I feel like "guess" implies there is a right answer and we're trying our best to guess it. Whereas "suggest" seems to fit the situation better. When I see "guessing", I think about "type guessing". Whereas we're suggesting a syntax based on the type.
I feel less strongly about this than about "translate".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I like this idea. Will change from guess
-> suggest
this.guessCodeSyntax().then(syntax => { | ||
this.guessedSyntax = syntax; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly paranoid, but should this check that the suggested (guessed, in the current vocabulary) syntax is among the ones that this data explorer instance claims to support? Given that syntax can't be an actual enum on this side, it feels like we should be a bit pedantic re: checking that the backend is internally consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added early erroring/returns if 1) the backend has stated it does not support any syntaxes to convert or 2) if the desired syntax is not one of the options available. This definitely feels a lot safer!
* Guess the code syntax for export. | ||
* @returns A promise that resolves to the available code syntaxes. | ||
*/ | ||
async guessCodeSyntax(): Promise<CodeSyntax> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this is a better place to check that the returned syntax is among those that the backend claims to support.
/** | ||
* Best guess of syntax to use for code translation | ||
*/ | ||
code_syntax?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this being nullable creates problems elsewhere and is, I'm guessing, why we don't then use CodeSyntax
in all of the other places I'd expect later in this file. Something feels slightly off about the current situation, where it feels like 'No available syntaxes'
could be an actual syntax as far as the frontend knows. Whereas you'd really like to make undefined
be the signal for that situation.
I chatted with copilot about my discomfort and this is an interesting suggestion. At the very least, it starts to get at my concern:
/**
* Represents a code syntax name supported by the backend
* This is a string at runtime, but using a distinct type for type safety
*/
export type CodeSyntaxName = string;
/**
* Best guess of syntax to use for code translation
*/
export interface CodeSyntax {
/**
* Best guess of syntax to use for code translation
*/
code_syntax?: CodeSyntaxName;
}
// Update the SupportedFeatures interface
export interface SupportedFeatures {
// ...other properties
/**
* List of supported code syntax names
*/
code_syntaxes: Array<CodeSyntaxName>;
}
// Update TranslateToCodeParams
export interface TranslateToCodeParams {
/**
* Zero or more column filters to apply
*/
column_filters: Array<ColumnFilter>;
/**
* Zero or more row filters to apply
*/
row_filters: Array<RowFilter>;
/**
* Zero or more sort keys to apply
*/
sort_keys: Array<ColumnSortKey>;
/**
* The code syntax to use for translation
*/
code_syntax: CodeSyntaxName;
}
// Update the translateToCode method
translateToCode(columnFilters: Array<ColumnFilter>, rowFilters: Array<RowFilter>, sortKeys: Array<ColumnSortKey>, codeSyntax: CodeSyntaxName): Promise<TranslatedCode> {
return super.performRpc('translate_to_code', ['column_filters', 'row_filters', 'sort_keys', 'code_syntax'], [columnFilters, rowFilters, sortKeys, codeSyntax]);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something feels slightly off about the current situation, where it feels like 'No available syntaxes' could be an actual syntax as far as the frontend knows. Whereas you'd really like to make undefined be the signal for that situation.
Ah, yeah that is not ideal 😬 I think the only time we really want a string to designate that there are no syntaxes is in the actual modal itself, not anywhere in this PR. Updated to reflect this!
* Translates the data explorer view to code in the desired syntax. | ||
* @returns A Promise<string> that resolves when generated code is returned. | ||
*/ | ||
async translateToCode(desiredSyntax: string): Promise<string> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placing this here for lack of another concrete location: it feels like somewhere there should be an early exit for this entire feature if the backend state has an empty array of supported syntaxes. AFAICT that would be the expected way for the frontend to learn that the backend does not support "export to code".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a isSyntaxSupported()
function, as well as a feature support flag (much like the other supported_features
on the comm) to give some flexibility as we build out different syntax support! We can now early exit when we're converting code, or perhaps even earlier when trying to get a suggestion.
/** | ||
* The guessed code syntax for the data explorer instance, if we generate code. | ||
*/ | ||
public guessedSyntax: CodeSyntax = { code_syntax: 'No available syntaxes' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say more about this below, but having a special string indicate that this piece of information is unavailable doesn't feel right.
Co-authored-by: Jennifer (Jenny) Bryan <jenny.f.bryan@gmail.com> Signed-off-by: Isabel Zimmerman <54685329+isabelizimm@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isabelizimm and I just did another round of live review on zoom. I'm going to approve this now, with the expectation that some more tweaks are coming based on that conversation. It would be great to update the main PR description to reflect the current PR state.
This feels really close to an MVP we can merge and then start to refine. It's a good time to get a 3rd set of 👀 on this.
Signed-off-by: Isabel Zimmerman <54685329+isabelizimm@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jumping in a little late, and without a ton of "comms" definition experiences, it looks like a great start to me. I have some thoughts on how the UI of this could work, when that time comes.
related issue: #8531
related pr for Ark: posit-dev/ark#873
I'm not exactly sure of the flow of these PRs with Ark, but I believe we'll want to merge that PR, release Ark, and then pick up the new release in this PR before this PR is merged.
This PR implements makes 3 major changes to the comms:
convert_to_code
as a supported feature: there are other supported features, such as row filters and column filters. This structure includes if it is supported as well as supported syntaxes, as an array of values.suggest_code_syntax
message: for a particular data explorer instance, decide what options a user has for generating code. I would expect the return value of this to bebase-r
or something to that effect.convert_to_code
message: for a particular data explorer instance, this message sends over the filters and sorts applied to the data and the desired syntax. The expectation is that this will return a code snippet string. The front end will (eventually) apply code highlighting and formatting.Below is a diagram of the general flow through comms. Note that these comm messages will not call each other; they will be triggered by a modal in Positron core.
There is an open PR that builds off this branch to use these comms, so you can actually fire them off/interact with them #8537
Release Notes
New Features
Bug Fixes
QA Notes
It doesn't look like we have much testing for comms, but open to ideas! Currently I plan on adding unit tests on each language runtime and then e2e tests for the UI pieces.