Add symbols imported in packages to workspace #872

lionel- · 2025-07-15T14:41:38Z

Addresses posit-dev/positron#2252.
Addresses posit-dev/positron#8549.
Addresses posit-dev/positron#8550.
Progress towards posit-dev/positron#2321.

Branched from #870. It was rather easy to implement based on the infrastructure provided in that PR.

This fixes diagnostics for imported symbols but I was still seeing some weirdness with local definitions because we didn't synchronise the indexer and the diagnostics properly:

ark/crates/ark/src/lsp/state_handlers.rs

Lines 414 to 417 in 7175d83

    
    // FIXME: The initial indexer is currently racing against our state notification
    // handlers. The indexer is synchronised through a mutex but we might end up in
    // a weird state. Eventually the index should be moved to WorldState and created
    // on demand with Salsa instrumenting and cancellation.

This is now fixed. I've also made a change to take into account objects assigned globally. We were detecting global functions but not other kinds of objects.

I've hacked in testthat imports inside testthat/ files. Should be good enough for now. Will fail when people edit their testthat.R file with additional library loading.

QA Notes

You should now be able to open a package like ellmer and not see spurious diagnostics. This won't be foolproof for all packages, but I've checked with rlang and ellmer.

See also posit-dev/positron#8549 and posit-dev/positron#8550 for reprexes for adjacent fixes.

DavisVaughan · 2025-07-16T16:09:33Z

crates/ark/src/lsp/main_loop.rs

+    }
+}
+
+async fn process_diagnostics_batch(batch: Vec<RefreshDiagnosticsTask>) {


Reconsider batching? Based on zoom call

I take it back, the interleaved indexer/diagnostics processing will arise only if the queue tasks are processed faster than they arrive so the current setup is fine. If we get a bunch of tasks very rapidly, we will collect them, split them by type, and process the indexer tasks first. So the batching is still useful.

This setup will be entirely replaced by Salsa dependencies. Diagnostics tasks will be cancelled automatically as document updates arrive. So we shouldn't worry about this temporary setup too much.

I adapted the loop so that we check back for more indexer tasks once we have finished a round of indexing. We do that at most 10 times so diagnostics get refreshed once in a while if the user is writing too fast to keep up. I think I'm happy with the queue setup now!
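To make the loop described above concrete, here is a minimal sketch of the idea, not the actual code in main_loop.rs. `Task`, `process_batch`, and the `refill` callback (which simulates tasks arriving while a round of indexing runs) are hypothetical names; only the cap of 10 rounds comes from the description above.

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for the two kinds of task on the queue.
#[derive(Debug, Clone, PartialEq)]
enum Task {
    Index(String),
    Diagnostics(String),
}

// Upper bound on consecutive indexing rounds before diagnostics get a turn.
const MAX_INDEX_ROUNDS: usize = 10;

// Drains the queue, runs indexer tasks first, and returns the order in which
// tasks were processed. `refill` simulates tasks that arrive during a round.
fn process_batch(
    queue: &mut VecDeque<Task>,
    mut refill: impl FnMut(usize) -> Vec<Task>,
) -> Vec<Task> {
    let mut processed = Vec::new();
    let mut diagnostics = Vec::new();

    for round in 0..MAX_INDEX_ROUNDS {
        // Collect everything currently queued and split it by type.
        let mut indexer = Vec::new();
        for task in queue.drain(..) {
            match task {
                Task::Index(_) => indexer.push(task),
                Task::Diagnostics(_) => diagnostics.push(task),
            }
        }

        // No indexing left to do: move on to diagnostics.
        if indexer.is_empty() {
            break;
        }
        processed.extend(indexer);

        // Check back for tasks that arrived while we were indexing.
        queue.extend(refill(round));
    }

    // Diagnostics only run once indexing has settled or the cap is reached.
    // Anything still in `queue` is left for the next batch.
    processed.extend(diagnostics);
    processed
}
```

With this shape, a diagnostics task queued before an indexer task still runs after it, and a user typing continuously can delay diagnostics by at most `MAX_INDEX_ROUNDS` rounds of indexing.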


lionel- · 2025-07-17T08:23:24Z

@DavisVaughan I'm out for the day but this should be ready for review. When I come back I'll add some diagnostics tests for packages.

DavisVaughan

Seems to work fairly well with dplyr and vctrs

DavisVaughan · 2025-07-21T19:20:20Z

crates/ark/src/lsp/main_loop.rs

+                let mut doc = document.clone();
+                if Path::new(uri.path())
+                    .components()
+                    .any(|c| c.as_os_str() == "testthat")
+                {
+                    doc.testthat = true;
+                };
+
+                let diagnostics = generate_diagnostics(doc, state.clone());


It feels pretty gross to leak testthat hacks out into Document

Is there any way we can avoid this? Maybe have

    pub(crate) fn generate_diagnostics_opt(doc: Document, state: WorldState, testthat: bool) -> Vec<Diagnostic> {}

    pub(crate) fn generate_diagnostics(doc: Document, state: WorldState) -> Vec<Diagnostic> {
        generate_diagnostics_opt(doc, state, false)
    }

And you'd use generate_diagnostics_opt() here just to pass in testthat, so we wouldn't need the flag in Document itself and we wouldn't need document.clone(), which doesn't seem nice from a performance perspective (cloning the Document this often seems bad)

This sounds particularly nice to me because it looks like testthat: bool would not have to extend past generate_diagnostics(), you take care of it right there, which really limits the leaking of this hack
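A rough sketch of how that split could look, under some loud assumptions: `Document`, `WorldState`, and `Diagnostic` below are toy stand-ins rather than ark's real types, `is_testthat_path()` is a hypothetical helper wrapping the path check from the diff, the functions borrow their arguments instead of taking them by value, and the `expect_equal` rule is a placeholder for the real diagnostics logic.

```rust
use std::path::Path;

// Hypothetical stand-ins for ark's `Document`, `WorldState`, and `Diagnostic`.
struct Document {
    contents: String,
}
struct WorldState;
#[derive(Debug, PartialEq)]
struct Diagnostic(String);

// Returns true if any component of the path is exactly `testthat`.
fn is_testthat_path(path: &Path) -> bool {
    path.components().any(|c| c.as_os_str() == "testthat")
}

// The `testthat` flag stays an argument and never reaches `Document`.
fn generate_diagnostics_opt(
    doc: &Document,
    _state: &WorldState,
    testthat: bool,
) -> Vec<Diagnostic> {
    let mut diagnostics = Vec::new();
    // Toy rule: `expect_equal` is only considered in scope when testthat
    // is treated as imported.
    if !testthat && doc.contents.contains("expect_equal") {
        diagnostics.push(Diagnostic(
            "No symbol named 'expect_equal' in scope".to_string(),
        ));
    }
    diagnostics
}

// Default entry point for callers that don't care about testthat.
fn generate_diagnostics(doc: &Document, state: &WorldState) -> Vec<Diagnostic> {
    generate_diagnostics_opt(doc, state, false)
}
```

The call site in the diagnostics task would then be something like `generate_diagnostics_opt(&doc, &state, is_testthat_path(Path::new(uri.path())))`, with no clone and no extra field on the document.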

DavisVaughan · 2025-07-21T19:25:12Z

crates/ark/src/lsp/main_loop.rs

@@ -7,15 +7,20 @@



One somewhat reasonable place where this fails: vec_order_radix() usage in dplyr, which is brought in from vctrs via

https://github.com/tidyverse/dplyr/blob/be3e3a05fd0081cb53168d6aedb417d62139b75d/R/zzz.R#L17-L20

Similar with clock

    clock_init_weekday_utils <- function(env) {
      assign("clock_empty_weekday", weekday(integer()), envir = env)
      invisible(NULL)
    }

called from onLoad

DavisVaughan · 2025-07-21T19:32:58Z

crates/ark/src/lsp/inputs/package.rs

-        let package_path = lib_path.join(name);
-
+    /// Load a package from a given path.
+    pub fn load(package_path: &std::path::Path) -> anyhow::Result<Option<Self>> {


Suggested change

-    pub fn load(package_path: &std::path::Path) -> anyhow::Result<Option<Self>> {
+    pub fn load_from_folder(package_path: &std::path::Path) -> anyhow::Result<Option<Self>> {

?

For a second I thought there was a single file that package_path points to, but it looks like it is intended to point to a folder. So this plus a name of just path: &Path seems good imo

DavisVaughan · 2025-07-21T19:33:51Z

crates/ark/src/lsp/inputs/package.rs

-        if description.name != name {
-            return Err(anyhow::anyhow!(
-                "`Package` field in `DESCRIPTION` doesn't match folder name '{name}'"
-            ));
-        }


What violates this?

Oh I see you moved this. Why would non-library-path packages be any different?

DavisVaughan · 2025-07-21T19:37:57Z

crates/ark/src/lsp/state_handlers.rs

+                // Try to load package from this workspace folder and set as
+                // root if found. This means we're dealing with a package
+                // source.
+                if state.root.is_none() {


I guess implicitly this means we only work with the first workspace, in the case of multi-root workspaces
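To illustrate the "first workspace wins" behaviour, here is a small sketch. `State`, `load_package()`, and `add_workspace_folders()` are hypothetical stand-ins, and the "-pkg" suffix check is a placeholder for whatever actually decides that a folder is a package source.

```rust
// Hypothetical stand-in for the part of `WorldState` that tracks the root.
#[derive(Default)]
struct State {
    root: Option<String>,
}

// Hypothetical loader: pretends that any folder whose name ends in "-pkg"
// is a package source. The real check is not modelled here.
fn load_package(folder: &str) -> Option<String> {
    folder.ends_with("-pkg").then(|| folder.to_string())
}

fn add_workspace_folders(state: &mut State, folders: &[&str]) {
    for folder in folders {
        // Only try to set the root while none has been found, so the first
        // folder that loads as a package wins and later ones are ignored.
        if state.root.is_none() {
            state.root = load_package(folder);
        }
    }
}
```

Given the folders `scripts`, `a-pkg`, and `b-pkg`, the root ends up as `a-pkg` and `b-pkg` is never considered.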

DavisVaughan · 2025-07-21T19:53:25Z

crates/ark/src/lsp/main_loop.rs

+    while let Some(Ok(Some(result))) = futures.next().await {
+        publish_diagnostics(result.uri, result.diagnostics, result.version);
+    }


Is it weird that we are already on the indexer thread, and we are calling publish_diagnostics(), which then sends to the auxiliary thread? I guess not?

DavisVaughan · 2025-07-21T20:03:15Z

crates/ark/src/lsp/main_loop.rs

+pub(crate) fn index_start(folders: Vec<String>, state: WorldState) {
+    INDEXER_QUEUE
+        .send(IndexerQueueTask::Indexer(IndexerTask::Start { folders }))
+        .unwrap_or_else(|err| lsp::log_error!("Failed to queue initial indexing: {err}"));
+
+    diagnostics_refresh_all(state);
+}
+
+pub(crate) fn index_update(uri: Url, document: Document, state: WorldState) {
+    INDEXER_QUEUE
+        .send(IndexerQueueTask::Indexer(IndexerTask::Update {
+            document,
+            uri: uri.clone(),
+        }))
+        .unwrap_or_else(|err| lsp::log_error!("Failed to queue index update: {err}"));
+
+    // Refresh all diagnostics since the indexer results for one file may affect
+    // other files
+    diagnostics_refresh_all(state);
+}
+
+pub(crate) fn diagnostics_refresh_all(state: WorldState) {
+    for (uri, _document) in state.documents.iter() {
+        INDEXER_QUEUE
+            .send(IndexerQueueTask::Diagnostics(RefreshDiagnosticsTask {
+                uri: uri.clone(),
+                state: state.clone(),
+            }))
+            .unwrap_or_else(|err| lsp::log_error!("Failed to queue diagnostics refresh: {err}"));


I think some of this was there before, but there is more clone()ing of the WorldState than I expected in this code

It feels like we are cloning like crazy:

Call site of index_start()

Call site of index_update(), every change, really??

Each document in diagnostics_refresh_all() gets its own copy

Call site of generate_diagnostics(), so basically every change, I think this was here before

I know the intention of WorldState is that we can send it to other threads but it just seems like it's happening a lot here, is that fully intentional?
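To put a number on the per-document copies, here is a toy model that counts clones. The `WorldState` and `RefreshDiagnosticsTask` below are simplified stand-ins with a hypothetical `clones` counter added for instrumentation; how expensive a real `WorldState` clone is depends on its fields and is not modelled here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Simplified stand-in for `WorldState`. The shared `clones` counter is a
// hypothetical addition used only to count how often the state is cloned.
struct WorldState {
    documents: Vec<String>,
    clones: Arc<AtomicUsize>,
}

impl Clone for WorldState {
    fn clone(&self) -> Self {
        // Record the clone on the counter shared by all copies.
        self.clones.fetch_add(1, Ordering::SeqCst);
        Self {
            documents: self.documents.clone(),
            clones: Arc::clone(&self.clones),
        }
    }
}

// Simplified stand-in for the task sent to the queue.
struct RefreshDiagnosticsTask {
    uri: String,
    state: WorldState,
}

// Mirrors the shape of `diagnostics_refresh_all()` in the diff: one task,
// and therefore one state clone, per open document.
fn diagnostics_refresh_all(state: &WorldState) -> Vec<RefreshDiagnosticsTask> {
    state
        .documents
        .iter()
        .map(|uri| RefreshDiagnosticsTask {
            uri: uri.clone(),
            state: state.clone(),
        })
        .collect()
}
```

In this model a single refresh with three open documents produces three clones, so the count grows with the number of open documents on every update.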

Add symbols imported in packages to workspace

6c6f328

lionel- marked this pull request as draft July 15, 2025 14:41

lionel- added 3 commits July 16, 2025 14:50

Add top-level variables to diagnostics context

d5afdcc

Fix race conditions between indexer and diagnostics

49ff949

Import testthat in test files

1180194

lionel- marked this pull request as ready for review July 16, 2025 14:27

DavisVaughan reviewed Jul 16, 2025


lionel- added 2 commits July 17, 2025 10:20

Improve batching of indexer tasks

3971ebf

Refresh all diagnostics when a single file is updated

3949507

DavisVaughan approved these changes Jul 21, 2025
