Major differences in cluster assignment between 1.5.1 and 1.16.0

Hi BayesSpace team,

While reproducing results from a 2024 DLPFC paper, I’m seeing large differences in clustering assignments between **BayesSpace v1.5.1** and the **Bioconductor release v1.16.0**.

Using the same public dataset via [spatialLIBD](https://bioconductor.org/packages/release/data/experiment/html/spatialLIBD.html)'s `spatialLIBD::fetch_data()` the Adjusted Rand Index (ARI) between the paper’s assignments and my run with v1.16.0 ranges  **~0.17–0.43**  across samples (poor alignment). In contrast, using v1.5.1 (the version used in the paper), ARIs against the published assignments are a **perfect 1.0 for all samples**. 

In my (relatively inexperienced and qualitative) opinion, the clustering results from 1.5.1 were cleaner and more biologically grounded. 

I’m opening this as a discussion to ask whether these differences are expected or indicative of a regression/behavior change that isn’t obvious from the changelog ([NEWS.md](https://github.com/edward130603/BayesSpace/blob/devel/NEWS.md))

### Reproducible script
This script:
- Pulls the data with `spatialLIBD::fetch_data()`
- Sets a seed (same as paper)
- Runs clustering
- Computes and prints ARI for each sample vs. the paper’s `BayesSpace_harmony_09` column
- Prints reproducibility information (`OS`, `BiocManager` version, `R` version, and package versions via `session_info()`)

```r
library(BayesSpace)
library(mclust)
library(spatialLIBD)
library(sessioninfo)
library(glue)

# load("/data/zusers/kresgeb/psych_encode/spatialLIBD_fetch_data/2024.RData", verbose = TRUE)

spe <- spatialLIBD::fetch_data(type = "spatialDLPFC_Visium")

bayesspace_version <- packageVersion("BayesSpace")

output_csv <- glue("/zata/zippy/kresgeb/clustering_comparison/results/{bayesspace_version}_output_cluster_assignments.csv")

seed <- 030122

set.seed(seed)

message(paste("BayesSpace Version:", packageVersion("BayesSpace")))

message((paste("Seed:", seed)))

message(paste("BayesSpace clustering started at:", Sys.time()))

# Run BayesSpace
message("Running BayesSpace spatialCluster...")
spe <- spatialCluster(
    spe,
    use.dimred = "HARMONY", # Use "PCA" if HARMONY is not available
    q = 9,
    nrep = 10000,
)

# Write cluster assignments to CSV
message("Saving cluster assignments...")
df <- data.frame(
    barcode = colnames(spe),
    sample_id = colData(spe)$sample_id,
    cluster = colData(spe)$spatial.cluster
)
write.csv(df, file = output_csv, row.names = FALSE)

message(paste("BayesSpace clustering finished at:", Sys.time()))

# Get sample IDs
samples <- unique(colData(spe)$sample_id)

# Compute ARI for each sample
ari_per_sample <- sapply(samples, function(sid) {
    idx <- colData(spe)$sample_id == sid
    cl1 <- colData(spe)$spatial.cluster[idx]
    cl2 <- colData(spe)$BayesSpace_harmony_09[idx]

    # Check for at least 2 unique labels in each clustering
    if (length(unique(cl1)) > 1 && length(unique(cl2)) > 1) {
        adjustedRandIndex(cl1, cl2)
    } else {
        NA # Not defined if only one cluster
    }
})

# Convert to data.frame for easier use
ari_df <- data.frame(
    sample_id = names(ari_per_sample),
    ARI = unname(ari_per_sample)
)

print(ari_df)

## Reproducibility information
print("Reproducibility information:")
Sys.time()
proc.time()
options(width = 120)
session_info()
```

### Runs performed
I ran **exactly the same script** twice, changing only the BayesSpace version:
- **v1.5.1** (as used in the paper) installed via `devtools::install_github("edward130603/BayesSpace", ref = "v1.5.1")`
  - [1.5.1_std_out.txt](https://github.com/user-attachments/files/21799176/1.5.1_std_out.txt)
  - [1.5.1_output_cluster_assignments.csv](https://github.com/user-attachments/files/21799190/1.5.1_output_cluster_assignments.csv)
- **v1.16.0** (installed via Bioconductor at time of testing using `BiocManager::install("BayesSpace")`) 
  - [1.16.0_std_out.txt](https://github.com/user-attachments/files/21799198/1.16.0_std_out.txt)
  - [1.16.0_output_cluster_assignments.csv](https://github.com/user-attachments/files/21799208/1.16.0_output_cluster_assignments.csv)


Happy to run further checks or variations if that would help narrow this down.

Best,
Ben


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Major differences in cluster assignment between 1.5.1 and 1.16.0 #150

Reproducible script

Runs performed

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Major differences in cluster assignment between 1.5.1 and 1.16.0 #150

Description

Reproducible script

Runs performed

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions