Commit be77b8c

update tutorial notebook and readme
1 parent 4fabaa6 commit be77b8c

2 files changed: +594 -216 lines changed

README.md

Lines changed: 49 additions & 28 deletions
@@ -25,23 +25,23 @@ $ conda config --add channels bioconda
 $ conda config --add channels conda-forge
 ```
 
-3) Recommended: Create an environment named `myenv` and activate it with the following commands:
+3) Create an environment named `myenv`, install **stream** and **jupyter**, and activate it with the following commands:
 
 ```sh
-$ conda create -n myenv python=3.6
+$ conda create -n myenv python=3.6 stream jupyter
 $ conda activate myenv
 ```
 
-4) Install the bioconda STREAM package within the environment `myenv` with the following command:
+**Note: For single cell atac-seq analysis, please run the following commands:**
 
 ```sh
-$ conda install stream
+$ conda create -n myenv python=3.6 stream stream_atac jupyter
+$ conda activate myenv
 ```
 
-5) To perform STREAM analysis in Jupyter Notebook as shown in **Tutorial**, run the following commands within `myenv`:
+4) To perform STREAM analysis in Jupyter Notebook as shown in **Tutorial**, type `jupyter notebook` within `myenv`:
 
 ```sh
-$ conda install jupyter
 $ jupyter notebook
 ```
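Once the environment from the installation steps above has been created and activated, it can be useful to confirm that the expected executables are on `PATH` before launching the notebook. The following is a minimal sketch, assuming the Bioconda packages expose the commands `stream`, `stream_atac`, and `jupyter` as used in this README; `check_tools` is a hypothetical helper name and not part of STREAM.

```sh
# Report which of the given commands are missing from PATH.
# Returns 0 when all of them are found, 1 otherwise.
# check_tools is an illustrative helper, not something STREAM provides.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use inside the activated environment:
#   check_tools stream jupyter               # scRNA-seq setup
#   check_tools stream stream_atac jupyter   # scATAC-seq setup
```

If a command is reported as missing, the most likely causes are that `myenv` is not activated or that the package was left out of the `conda create` command.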

@@ -129,17 +129,7 @@ perform log2 transformation
 --norm
 normalize data based on library size
 --atac
-indicate scATAC-seq data
---atac_counts
-scATAC-seq counts file name in .tsv or .tsv.gz format. Counts file is a compressed sparse matrix that contains three columns including region indices, sample indices and the number of reads(default: None)
---atac_regions
-scATAC-seq regions file name in .tsv or .tsv.gz format. Regions file contains three columns including chromosome names, start and end positions of regions (default: None)
---atac_samples
-scATAC-seq samples file name in .tsv or tsv.gz. Samples file contains one column of cell names (default: None)
---atac_k
-specify k-mers length for scATAC-seq analysis (default: 7)
---atac_zscore
-Indicate precomputed atac zscore matrix file
+indicate scATAC-seq data
 --n_processes
 Specify the number of processes to use. (default, all the available cores).
 --loess_frac
@@ -338,30 +328,55 @@ Please note that for large dataset analysis it'll be necessary to increase the d
 
 Here we take a single cell RNA-seq dataset as an example, including data_Nestorowa.tsv.gz, cell_label.tsv.gz and cell_label_color.tsv.gz (Nestorowa, S. et al., 2016), and assuming that **they are in the current folder**, to perform trajectory inference analysis, users can simply run a single command:
 
+*Using Bioconda:*
+```sh
+$ stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz
 ```
 
 If cell labels are not available or no customized cell label color file is available, **-l** or **-c** can also be omitted
 
+*Using Bioconda:*
+```sh
+$ stream -m data_Nestorowa.tsv.gz
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream -m data_Nestorowa.tsv.gz
 ```
 
 To visualize genes of interest, users can provide a gene list file by adding **-g**, for example: gene_list.tsv.gz. Meanwhile, by adding the flag **-p**, STREAM will use the precomputed file obtained from the first run (in this way, STREAM will import the precomputed pkl file, so the analysis will skip the structure learning part and only execute the step of visualizing genes):
 
+*Using Bioconda:*
+```sh
+$ stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz -g gene_list.tsv.gz -p
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz -g gene_list.tsv.gz -p
 ```
 
 Users can also provide a set of gene names separated by comma or specify the root by adding **-r**:
 
+*Using Bioconda:*
+```sh
+$ stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz -g Gata1,Mpo -r S1 -p
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz -g Gata1,Mpo -r S1 -p
 ```
 
 To explore potential marker genes, it is possible to add the flags **--DE**, **--TG**, or **--LG** to detect DE (differentially expressed) genes, transition genes, and leaf genes respectively:
 
+*Using Bioconda:*
+```sh
+$ stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --DE --TG --LG -p
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --DE --TG --LG -p
 ```
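In each pair of examples in this part of the README, the Bioconda and Docker commands take identical arguments and differ only in their prefix: the Docker form mounts the current folder at `/data` and uses it as the working directory. The sketch below makes that relationship explicit by printing either form without running it. `stream_cmd` is a hypothetical helper introduced here for illustration; it is not part of STREAM.

```sh
# Print the command line for a given backend without running it.
#   conda  : the stream executable installed from Bioconda
#   docker : the pinellolab/stream image with the current folder mounted at /data
# stream_cmd is an illustrative helper, not something STREAM provides.
stream_cmd() {
    backend=$1
    shift
    case "$backend" in
        conda)  printf '%s\n' "stream $*" ;;
        docker) printf '%s\n' "docker run -v ${PWD}:/data -w /data pinellolab/stream $*" ;;
        *)      echo "unknown backend: $backend" >&2; return 1 ;;
    esac
}

# The same arguments produce both documented forms.
stream_cmd conda  -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz
stream_cmd docker -m data_Nestorowa.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz
```

Because the Docker container only sees the mounted folder, the input files need to sit in the current folder for the Docker form, which is the same assumption the README states for these examples.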
@@ -372,12 +387,22 @@ To explore the feature **mapping**, users need to provide two dataset, one is us
 
 Users first need to run the following command to get initial inferred trajectories from wild-type cells:
 
+*Using Bioconda:*
+```sh
+$ stream -m data_Olsson.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --lle_components 4 --EPG_shift
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream -m data_Olsson.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --lle_components 4 --EPG_shift
 ```
 
 To map the genetically perturbed cells to the inferred trajectories, users can execute the following command:
 
+*Using Bioconda:*
+```sh
+$ stream --new data_perturbation.tsv.gz --new_l cell_perturbation_label.tsv.gz --new_c cell_perturbation_label_color.tsv.gz
+```
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream --new data_perturbation.tsv.gz --new_l cell_perturbation_label.tsv.gz --new_c cell_perturbation_label_color.tsv.gz
 ```
@@ -388,24 +413,20 @@ After running this command, a folder named **'mapping_result'** will be created
 
 To perform scATAC-seq trajectory inference analysis, three files are necessary: a .tsv file of counts in compressed sparse format, a sample file in .tsv format, and a region file in .bed format (Buenrostro, J.D. et al., 2018). We assume that **they are in the current folder**.
 
-Using these three files, users can run STREAM with the following command (note the flag **--atac** ):
+Using these three files, users can run `stream_atac` with the following command to preprocess sc-atac-seq data and get a z_score matrix file named **'zscore.tsv.gz'** (this step may take a couple of hours on a modest machine):
 
+*Using Bioconda:*
 ```sh
-$ docker run -v ${PWD}:/data -w /data pinellolab/stream --atac --atac_counts count_file.tsv.gz --atac_samples sample_file.tsv.gz --atac_regions region_file.bed.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --lle_components 4
+$ stream_atac -c count_file.tsv.gz -s sample_file.tsv.gz -r region_file.bed.gz
 ```
 
-**The above command may take a couple of hours with a modest machine because the conversion from counts to k-mer z-score is time-consuming.** Therefore STREAM also provides the option to take as input a precomputed z-score file.
-
-First, the z-score file can be obtained with the following command (add **--atac_zscore**):
+Then, take the z-score file as input to infer trajectories using `stream`:
 
+*Using Bioconda:*
 ```sh
-$ docker run -v ${PWD}:/data -w /data pinellolab/stream --atac --atac_counts count_file.tsv.gz --atac_samples sample_file.tsv.gz --atac_regions region_file.bed.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --atac_zscore
+$ stream --atac -m zscore.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --lle_components 4
 ```
-
-The above command will generate a file named **'zscore.tsv'**. It’s a tab-delimited z-score matrix with k-mers in row and cells in column. Each entry is a scaled z-score of the accessibility of each k-mer across cells.
-
-Second, take z-score file as input to infer trajectories:
-
+*Using Docker:*
 ```sh
 $ docker run -v ${PWD}:/data -w /data pinellolab/stream --atac -m zscore.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --lle_components 4
 ```
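Since the preprocessing step can take hours, a quick sanity check of the counts file and of what still needs to run may save time. The sketch below assumes the counts file keeps the three-column layout described in the earlier README text (region indices, sample indices, number of reads) and that `stream_atac` writes `zscore.tsv.gz` into the current folder, as stated above. The function names `has_three_columns` and `next_atac_step` are hypothetical and not part of STREAM.

```sh
# Succeed only if a gzip-compressed, tab-delimited file is readable,
# non-empty, and has exactly three fields on every line.
# Assumption: a header line, if present, also has three fields.
has_three_columns() {
    [ -r "$1" ] || return 1
    gzip -dc "$1" | awk -F '\t' 'NF != 3 { bad = 1 } END { exit (bad || NR == 0) }'
}

# Print the next command of the two-step scATAC-seq workflow for a folder.
# If zscore.tsv.gz already exists, preprocessing is skipped.
next_atac_step() {
    dir=${1:-.}
    if [ -f "$dir/zscore.tsv.gz" ]; then
        echo "stream --atac -m zscore.tsv.gz -l cell_label.tsv.gz -c cell_label_color.tsv.gz --lle_components 4"
    elif has_three_columns "$dir/count_file.tsv.gz"; then
        echo "stream_atac -c count_file.tsv.gz -s sample_file.tsv.gz -r region_file.bed.gz"
    else
        echo "count_file.tsv.gz is missing or is not a three-column table" >&2
        return 1
    fi
}

# Example: next_atac_step .
```

The check only inspects the shape of the counts file; it does not verify that the indices match the sample and region files.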

tutorial/1.STREAM_scRNA-seq.ipynb

Lines changed: 545 additions & 188 deletions
Large diffs are not rendered by default.
