Precompute clustering #18

Generated from openproblems-bio/task_template.

Status: Merged

Commits (22)
- 026e765 add clustering data frame to the solution (rcannood)
- 9392ab6 update script (rcannood)
- 013b54c add comments (rcannood)
- b075b3c add clustering key prefix for cluster-based metrics (mumichae)
- 0f99a8d add resolutions parameters to metrics to make use of precomputed clus… (mumichae)
- a4404ca fix clustering key for nmi and ari (mumichae)
- 54c0fd9 set correct version of scib to make using precomputed clusters possible (mumichae)
- f80d939 add resolutions argument to cluster-based metrics (mumichae)
- 5234d3c use igraph for clustering on CPU (mumichae)
- 17d436c use partial reading for clustering (mumichae)
- e77ad55 rename cluster keys to be consistent with scib metrics (mumichae)
- 810507f fix import and reading missing slot (mumichae)
- 391e4b2 get clustering from obsm (mumichae)
- 4b95c18 Add config to create test resources script (lazappi)
- 6c56070 Add clustering to benchmark workflow (lazappi)
- f3dc116 Remove clustering from process dataset workflow (lazappi)
- b285dbe Move output processing to subworkflow (lazappi)
- 81f9649 Update API with processing subworkflow (lazappi)
- 3ff4797 Re-enable all methods/metrics (lazappi)
- 5ee87fd Remove clustering from fil_solution.yaml API file (lazappi)
- 19b6e52 Add processing to test resources script (lazappi)
- 38552af update readme (rcannood)
Component config for the `process_integration` data processor (new file, +45 lines):

```yaml
namespace: data_processors
info:
  type: process_integration
  type_info:
    label: Process integration
    summary: Process output from an integration method to the format expected by metrics
    description: |
      This component will:

      - Perform transformations of the integration output
      - Cluster the integrated data at different resolutions

argument_groups:
  - name: Inputs
    arguments:
      - name: "--input_dataset"
        __merge__: /src/api/file_dataset.yaml
        type: file
        direction: input
        required: true
      - name: "--input_integrated"
        __merge__: /src/api/file_integrated.yaml
        type: file
        direction: input
        required: true
      - name: --expected_method_types
        type: string
        direction: input
        required: true
        multiple: true
        description: |
          The expected output types of the batch integration method.
        choices: [ feature, embedding, graph ]
  - name: Outputs
    arguments:
      - name: "--output"
        __merge__: file_integrated_processed.yaml
        direction: output
        required: true

test_resources:
  - type: python_script
    path: /common/component_tests/run_and_check_output.py
  - path: /resources_test/task_batch_integration/cxg_immune_cell_atlas
    dest: resources_test/task_batch_integration/cxg_immune_cell_atlas
```
One file was deleted in this pull request (its contents are not shown in this view).

src/data_processors/precompute_clustering_merge/config.vsh.yaml (new file, +30 lines)
@@ -0,0 +1,30 @@ | ||
name: precompute_clustering_merge | ||
namespace: data_processors | ||
label: Merge clustering precomputations | ||
summary: Merge the precompute results of clustering on the input dataset | ||
arguments: | ||
- name: --input | ||
type: file | ||
direction: input | ||
required: true | ||
- name: --output | ||
type: file | ||
direction: output | ||
required: true | ||
- name: --clusterings | ||
type: file | ||
description: Clustering results to merge | ||
direction: input | ||
required: true | ||
multiple: true | ||
resources: | ||
- type: python_script | ||
path: script.py | ||
engines: | ||
- type: docker | ||
image: openproblems/base_python:1.0.0 | ||
runners: | ||
- type: executable | ||
- type: nextflow | ||
directives: | ||
label: [midtime, midmem, lowcpu] |
script.py for `precompute_clustering_merge` (new file, +28 lines)
```python
import anndata as ad
import pandas as pd

## VIASH START
par = {
    "input": "resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad",
    "clusterings": ["output.h5ad", "output2.h5ad"],
    "output": "output3.h5ad",
}
## VIASH END

print("Read clusterings", flush=True)
clusterings = []
for clus_file in par["clusterings"]:
    adata = ad.read_h5ad(clus_file)
    # keep only the precomputed Leiden columns, e.g. "leiden_0.8"
    obs_filt = adata.obs.filter(regex='leiden_[0-9.]+')
    clusterings.append(obs_filt)

print("Merge clusterings", flush=True)
merged = pd.concat(clusterings, axis=1)

print("Read input", flush=True)
input = ad.read_h5ad(par["input"])

input.obsm["clustering"] = merged

print("Store outputs", flush=True)
input.write_h5ad(par["output"], compression="gzip")
```
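To make the filter/concat pattern concrete, here is a minimal standalone sketch with toy `obs` tables (the column values and cell names are invented for illustration): only columns matching `leiden_[0-9.]+` survive the filter, and `pd.concat(..., axis=1)` aligns the per-resolution results on the cell index.

```python
import pandas as pd

# toy per-resolution outputs, shaped like the obs tables the run component produces
obs1 = pd.DataFrame({"leiden_0.8": ["0", "1"], "batch": ["a", "b"]}, index=["cell1", "cell2"])
obs2 = pd.DataFrame({"leiden_1.0": ["1", "1"]}, index=["cell1", "cell2"])

# same pattern as the script: keep only leiden_* columns, then join on the cell index
clusterings = [df.filter(regex=r"leiden_[0-9.]+") for df in (obs1, obs2)]
merged = pd.concat(clusterings, axis=1)
print(list(merged.columns))  # ['leiden_0.8', 'leiden_1.0']
```

Note that non-clustering columns such as `batch` are dropped before the merge, so the resulting `obsm["clustering"]` frame contains one column per resolution and nothing else.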
src/data_processors/precompute_clustering_run/config.vsh.yaml (new file, +35 lines)
```yaml
name: precompute_clustering_run
namespace: data_processors
label: Run clustering precomputations
summary: Run clustering on the input dataset
arguments:
  - name: --input
    __merge__: /src/api/file_common_dataset.yaml
    direction: input
    required: true
  - name: --output
    __merge__: /src/api/file_dataset.yaml
    direction: output
    required: true
  - type: double
    name: resolution
    default: 0.8
    description: Resolution parameter for clustering
resources:
  - type: python_script
    path: script.py
  - path: /src/utils/read_anndata_partial.py
engines:
  - type: docker
    image: openproblems/base_python:1.0.0
    setup:
      - type: python
        pypi:
          - scanpy
          - igraph
          - leidenalg
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [midtime, midmem, lowcpu]
```
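The component accepts a single `resolution` (default 0.8), so the workflow presumably invokes it once per resolution in the sweep. The sketch below illustrates the resulting key scheme; the specific resolution values are assumptions, but the `leiden_<resolution>` naming and the `leiden_[0-9.]+` regex that later collects these columns both come from the scripts in this PR.

```python
import re

# assumed sweep values; the config itself only fixes the default of 0.8
resolutions = [0.2, 0.5, 0.8, 1.0, 2.0]
keys = [f"leiden_{r}" for r in resolutions]

# the merge component later collects exactly these obs columns with this regex
pattern = re.compile(r"leiden_[0-9.]+")
print([k for k in keys if pattern.fullmatch(k)])
```

Every generated key matches the merge regex, so each per-resolution run contributes one column to the merged `obsm["clustering"]` frame.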
script.py for `precompute_clustering_run` (new file, +50 lines)
```python
import sys
import anndata as ad

# check if we can use GPU
USE_GPU = False
try:
    import subprocess
    assert subprocess.run('nvidia-smi', shell=True, stdout=subprocess.DEVNULL).returncode == 0
    from rapids_singlecell.tl import leiden
    USE_GPU = True
except Exception:
    # no GPU available: fall back to the CPU implementation
    from scanpy.tl import leiden

## VIASH START
par = {
    "input": "resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad",
    "output": "output.h5ad",
    "resolution": 0.8,
}
## VIASH END

sys.path.append(meta["resources_dir"])
from read_anndata_partial import read_anndata

n_cell_cpu = 300_000

print("Read input", flush=True)
input = read_anndata(par["input"], obs='obs', obsp='obsp', uns='uns')

key = f'leiden_{par["resolution"]}'
kwargs = dict()
if not USE_GPU:
    # settings for scanpy's igraph-based flavour on CPU
    kwargs |= dict(
        flavor='igraph',
        n_iterations=2,
    )

print(f"Run Leiden clustering with {kwargs}", flush=True)
leiden(
    input,
    resolution=par["resolution"],
    key_added=key,
    **kwargs,
)

print("Store outputs", flush=True)
output = ad.AnnData(
    obs=input.obs[[key]],
)
output.write_h5ad(par["output"], compression="gzip")
```

(Several review threads on this file were marked as resolved.)
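Why precompute clusterings at several resolutions? Cluster-based metrics such as NMI and ARI in scib score a clustering against ground-truth cell-type labels and keep the best-scoring resolution. The sketch below illustrates that selection step on toy data; the `nmi` helper is a hypothetical stand-in (an arithmetic-mean-normalised mutual information written from its textbook definition), not the scib implementation.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Arithmetic-mean-normalised mutual information (illustrative helper)."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * log(c * n / (ca[a] * cb[b])) for (a, b), c in joint.items())
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    return mi / ((ha + hb) / 2) if (ha + hb) else 1.0

# toy ground-truth cell types and two precomputed clusterings
cell_types = ["T", "T", "B", "B", "NK", "NK"]
clusterings = {
    "leiden_0.8": ["0", "0", "1", "1", "1", "1"],  # merges B and NK
    "leiden_1.0": ["0", "0", "1", "1", "2", "2"],  # matches the cell types
}

# score every precomputed resolution, keep the best
scores = {key: nmi(cell_types, labels) for key, labels in clusterings.items()}
best_key = max(scores, key=scores.get)
print(best_key)  # leiden_1.0
```

Because the clusterings are precomputed once and stored in `obsm["clustering"]`, each metric can run this sweep without re-clustering the data.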