Skip to content

Add symbols imported in packages to workspace #872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: feature/static-imports
Choose a base branch
from

Conversation

lionel-
Copy link
Contributor

@lionel- lionel- commented Jul 15, 2025

Addresses posit-dev/positron#2252.
Addresses posit-dev/positron#8549.
Addresses posit-dev/positron#8550.
Progress towards posit-dev/positron#2321.

Branched from #870. It was rather easy to implement based on the infrastructure provided in that PR.

This fixes diagnostics for imported symbols but I was still seeing some weirdness with local definitions because we didn't synchronise the indexer and the diagnostics properly:

// FIXME: The initial indexer is currently racing against our state notification
// handlers. The indexer is synchronised through a mutex but we might end up in
// a weird state. Eventually the index should be moved to WorldState and created
// on demand with Salsa instrumenting and cancellation.

This is now fixed. I've also made a change to take into account objects assigned globally. We were detecting global functions but not other kinds of objects.

I've hacked in testthat imports inside testthat/ files. Should be good enough for now. Will fail when people edit their testthat.R file with additional library loading.

QA Notes

You should now be able to open a package like ellmer and not see diagnostics. This won't be 100% proof for all packages, but I've checked with rlang and ellmer.

See also posit-dev/positron#8549 and posit-dev/positron#8550 for reprexes for adjacent fixes.

@lionel- lionel- marked this pull request as draft July 15, 2025 14:41
@lionel- lionel- marked this pull request as ready for review July 16, 2025 14:27
}
}

async fn process_diagnostics_batch(batch: Vec<RefreshDiagnosticsTask>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconsider batching? Based on zoom call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back, the interleaved indexer/diagnostics processing will arise only if the queue tasks are processed faster than they arrive so the current setup is fine. If we get a bunch of tasks very rapidly, we will collect them, split them by type, and process the indexer tasks first. So the batching is still useful.

This setup will be entirely replaced by Salsa dependencies. Diagnostics tasks will be cancelled automatically as document updates arrive. So we shouldn't worry about this temporary setup too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted the loop so that we check back for more indexer tasks once we have finished a round of indexing. We do that at most 10 times so diagnostics get refreshed once in a while if the user is writing too fast to keep up. I think I'm happy with the queue setup now!

@lionel-
Copy link
Contributor Author

lionel- commented Jul 17, 2025

@DavisVaughan I'm out of the day but this should be ready for review. When I come back I'll add some diagnostics tests for packages.

Copy link
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work fairly well with dplyr and vctrs

Comment on lines +830 to +838
let mut doc = document.clone();
if Path::new(uri.path())
.components()
.any(|c| c.as_os_str() == "testthat")
{
doc.testthat = true;
};

let diagnostics = generate_diagnostics(doc, state.clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels pretty gross to leak testthat hacks out into Document

Is there any way we can avoid this? Maybe have

pub(crate) fn generate_diagnostics_opt(doc: Document, state: WorldState, testthat: bool) -> Vec<Diagnostic> {}

pub(crate) fn generate_diagnostics(doc: Document, state: WorldState) -> Vec<Diagnostic> {
  generate_diagnostics_opt(doc, state, false)
}

And you'd use generate_diagnostics_opt() here just to determine testthat but we wouldn't have to have it in Document itself, and we don't need document.clone(), which doesn't seem nice from a performance perspective (cloning the Document very often like this seems bad)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds particularly nice to me because it looks like testthat: bool would not have to extend past generate_diagnostics(), you take care of it right there, which really limits the leaking of this hack

@@ -7,15 +7,20 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat reasonable place this fails - vec_order_radix() usage in dplyr, brought in from vctrs via

https://github.com/tidyverse/dplyr/blob/be3e3a05fd0081cb53168d6aedb417d62139b75d/R/zzz.R#L17-L20

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar with clock

clock_init_weekday_utils <- function(env) {
  assign("clock_empty_weekday", weekday(integer()), envir = env)

  invisible(NULL)
}

called from onLoad

let package_path = lib_path.join(name);

/// Load a package from a given path.
pub fn load(package_path: &std::path::Path) -> anyhow::Result<Option<Self>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn load(package_path: &std::path::Path) -> anyhow::Result<Option<Self>> {
pub fn load_from_folder(package_path: &std::path::Path) -> anyhow::Result<Option<Self>> {

?

For a second I thought there was a single file that package_path points to, but it looks like it is intended to point to a folder. So this plus a name of just path: &Path seems good imo

Comment on lines -44 to -48
if description.name != name {
return Err(anyhow::anyhow!(
"`Package` field in `DESCRIPTION` doesn't match folder name '{name}'"
));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What violates this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see you moved this. Why would non-library-path packages be any different?

Comment on lines +93 to +96
// Try to load package from this workspace folder and set as
// root if found. This means we're dealing with a package
// source.
if state.root.is_none() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess implicitly this means we only work with the first workspace, in the case of multi-root workspaces

Comment on lines +851 to +853
while let Some(Ok(Some(result))) = futures.next().await {
publish_diagnostics(result.uri, result.diagnostics, result.version);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it weird that we are already on the indexer thread, and we are calling publish_diagnostics(), which then sends to the auxiliary thread? I guess not?

Comment on lines +856 to +884
pub(crate) fn index_start(folders: Vec<String>, state: WorldState) {
INDEXER_QUEUE
.send(IndexerQueueTask::Indexer(IndexerTask::Start { folders }))
.unwrap_or_else(|err| lsp::log_error!("Failed to queue initial indexing: {err}"));

diagnostics_refresh_all(state);
}

pub(crate) fn index_update(uri: Url, document: Document, state: WorldState) {
INDEXER_QUEUE
.send(IndexerQueueTask::Indexer(IndexerTask::Update {
document,
uri: uri.clone(),
}))
.unwrap_or_else(|err| lsp::log_error!("Failed to queue index update: {err}"));

// Refresh all diagnostics since the indexer results for one file may affect
// other files
diagnostics_refresh_all(state);
}

pub(crate) fn diagnostics_refresh_all(state: WorldState) {
for (uri, _document) in state.documents.iter() {
INDEXER_QUEUE
.send(IndexerQueueTask::Diagnostics(RefreshDiagnosticsTask {
uri: uri.clone(),
state: state.clone(),
}))
.unwrap_or_else(|err| lsp::log_error!("Failed to queue diagnostics refresh: {err}"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of this was there before, but there is more clone()ing of the WorldState than I expected in this code

It feels like we are cloning like crazy:

  • Call site of index_start()
  • Call site of index_update(), every change, really??
  • Each document in diagnostics_refresh_all() gets its own copy
  • Call site of generate_diagnostics(), so basically every change, I think this was here before

I know the intention of WorldState is that we can send it to other threads but it just seems like it's happening a lot here, is that fully intentional?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants