GLM_supervised
To run the GLM supervised test, pass the metadata file to SPLASH with the --supervised_test_samplesheet flag.
Here is a simple example of how to run it on 10X data:
git clone --recurse-submodules https://github.com/refresh-bio/SPLASH/
cd SPLASH
make -j
cd example
./download_10X.py
../bin/splash --technology 10x --supervised_test_samplesheet test_data_cell_barcode_samplesheet_GLM_supervised_test.tsv input-10X.txt
If compiling is not feasible, you can use a precompiled release instead. Adjust the download link to the appropriate platform and version (check the releases page):
wget https://github.com/refresh-bio/SPLASH/releases/download/v2.11.6/splash-2.11.6.linux.x64.tar.gz
tar -xvf splash-2.11.6.linux.x64.tar.gz
cd example
./download_10X.py
../splash --technology 10x --supervised_test_samplesheet test_data_cell_barcode_samplesheet_GLM_supervised_test.tsv input-10X.txt
The metadata file has only a few requirements. The first column must contain the sample names, and the second column must contain the class/group/category/cell type assigned to each sample. The first row must contain column names (these can be anything). The script then renames the two columns to "sample_name" and "group". Below is an example metadata file:
sample_name type
SRR8788980 carcinoma
SRR8788981 melanoma
SRR8788982 lymphoma
SRR8788983 carcinoma
SRR8657060 carcinoma
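As a minimal sketch, the shell commands below write the example metadata above to a file and check the stated requirements (a header row, with sample name and group in the first two columns). They assume the file is tab-separated, which is inferred from the .tsv extension of the example samplesheet rather than stated explicitly; the file name metadata.tsv is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the example (non-10x) metadata file and sanity-check it.
# Assumptions: tab-separated columns; "metadata.tsv" is a hypothetical name.
set -euo pipefail

# Header row first (column names can be anything), then one row per sample.
printf 'sample_name\ttype\n' > metadata.tsv
printf '%s\t%s\n' \
  SRR8788980 carcinoma \
  SRR8788981 melanoma \
  SRR8788982 lymphoma \
  SRR8788983 carcinoma \
  SRR8657060 carcinoma >> metadata.tsv

# Every row, header included, needs at least two columns:
# column 1 = sample name, column 2 = class/group.
awk -F'\t' 'NF < 2 { printf "line %d: fewer than 2 columns\n", NR; bad = 1 }
            END { exit bad }' metadata.tsv

# Count samples per group (skipping the header row).
tail -n +2 metadata.tsv | cut -f2 | sort | uniq -c
```

For the example data, the final command reports three carcinoma samples and one each of lymphoma and melanoma; a per-group count like this is a quick way to catch typos in group labels before launching a run.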
For 10x data, the metadata file must have an extra column (the second column) containing cell barcodes. The other requirements are the same as for non-10x data:
sample_name cell cell_type
S2 AAACCCAAGACACACG AT2
S1 AAACCCAAGCCATGCC NK
S1 AAACCCAAGGGAACAA Mac
S4 AAACCCAAGGGTTGCA NK
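A similar sketch for the 10x layout: because the barcode occupies the second column, the group label moves to the third column, so the check requires at least three columns. The barcode check (only A, C, G, T) is an extra assumption based on the example rows, not a documented requirement; drop it if your barcodes carry a suffix such as -1. Tab separation is again assumed, and the file name metadata_10x.tsv is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the example 10x metadata file and sanity-check it.
# Assumptions: tab-separated; "metadata_10x.tsv" is a hypothetical name;
# the A/C/G/T barcode check is inferred from the example, not documented.
set -euo pipefail

printf 'sample_name\tcell\tcell_type\n' > metadata_10x.tsv
printf '%s\t%s\t%s\n' \
  S2 AAACCCAAGACACACG AT2 \
  S1 AAACCCAAGCCATGCC NK \
  S1 AAACCCAAGGGAACAA Mac \
  S4 AAACCCAAGGGTTGCA NK >> metadata_10x.tsv

# Column 1 = sample, column 2 = cell barcode, column 3 = group (cell type).
awk -F'\t' '
  NF < 3 { printf "line %d: fewer than 3 columns\n", NR; bad = 1; next }
  NR > 1 && $2 !~ /^[ACGT]+$/ { printf "line %d: unexpected barcode %s\n", NR, $2; bad = 1 }
  END { exit bad }' metadata_10x.tsv

# Count cells per sample and cell type (skipping the header row).
tail -n +2 metadata_10x.tsv | cut -f1,3 | sort | uniq -c
```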
The GLM test writes its output to directory/run_name_supervised_metadata/, which contains:
- GLM_supervised_anchors.tsv: anchors with non-zero GLM coefficients whose largest GLM coefficient is greater than 1.
- plots: a subfolder with plots visualizing the anchors whose target fractions vary with the metadata. For each anchor, a PDF named after the anchor sequence contains two plots: a box plot of the fraction of target1 per sample, grouped by category, and a scatter plot of the target1 and target2 counts per sample, color-coded by metadata category.
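After a run, one quick consistency check is to compare the number of reported anchors with the number of per-anchor PDFs in plots. The helper below is a hypothetical sketch, not part of SPLASH: it assumes GLM_supervised_anchors.tsv has exactly one header line and that each PDF sits directly inside plots. The path directory/run_name_supervised_metadata is the placeholder used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a GLM supervised output directory.
# Assumes GLM_supervised_anchors.tsv has exactly one header line and that
# plots/ holds one PDF per anchor, named after the anchor sequence.
set -euo pipefail

summarize_glm_output() {
  local out="$1"
  local n_anchors n_plots
  # Data rows = total lines minus the (assumed) header line.
  n_anchors=$(( $(wc -l < "$out/GLM_supervised_anchors.tsv") - 1 ))
  # Count PDFs; $(( )) strips the padding some wc versions add.
  n_plots=$(( $(find "$out/plots" -maxdepth 1 -name '*.pdf' | wc -l) ))
  echo "anchors: $n_anchors, plot PDFs: $n_plots"
  if [ "$n_anchors" -ne "$n_plots" ]; then
    echo "warning: anchor and plot counts differ" >&2
  fi
}
```

Usage, with the placeholder path replaced by your own output directory: summarize_glm_output directory/run_name_supervised_metadata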
Please contact Roozbeh Dehghannasiri (rdehghan@stanford.edu).