
GLM_supervised

marekkokot edited this page Aug 4, 2025 · 2 revisions

How to run it

To run the GLM supervised test, pass the --supervised_test_samplesheet flag followed by the path to the metadata file. Here is a simple snippet showing how to run it for 10X data:

git clone --recurse-submodules https://github.com/refresh-bio/SPLASH/
cd SPLASH
make -j
cd example
./download_10X.py
../bin/splash --technology 10x --supervised_test_samplesheet test_data_cell_barcode_samplesheet_GLM_supervised_test.tsv input-10X.txt

If compilation is not feasible, the steps above may be replaced with a precompiled release (adjust the download link to the appropriate platform and version; check the releases page):

wget https://github.com/refresh-bio/SPLASH/releases/download/v2.11.6/splash-2.11.6.linux.x64.tar.gz
tar -xvf splash-2.11.6.linux.x64.tar.gz
cd example
./download_10X.py
../splash --technology 10x --supervised_test_samplesheet test_data_cell_barcode_samplesheet_GLM_supervised_test.tsv input-10X.txt

Metadata file format:

non-10x data:

The only requirement for the metadata file is that the first and second columns contain, respectively, the sample names and the class/group/category/cell type assigned to each sample. The first row must contain column names (any names are accepted). The two columns are subsequently renamed to "sample_name" and "group" in the script. Below is an example metadata file:

sample_name	type
SRR8788980	carcinoma
SRR8788981	melanoma
SRR8788982	lymphoma
SRR8788983	carcinoma
SRR8657060	carcinoma
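
The layout described above can be checked before launching a run. The following is a minimal sketch, not SPLASH's own code: the function name `load_sample_metadata` is hypothetical, and the file is assumed to be tab-separated, as in the example. It ignores the header text and treats the first two columns as sample name and group, mirroring the renaming described above.

```python
import csv
import io


def load_sample_metadata(handle):
    """Read a tab-separated metadata file and return (sample_name, group) pairs.

    Hypothetical helper for illustration; it is not part of SPLASH.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)  # first row holds column names; the names themselves are ignored
    if header is None or len(header) < 2:
        raise ValueError("metadata needs a header row with at least two columns")
    pairs = []
    for line_no, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            raise ValueError(f"line {line_no}: expected a sample name and a group")
        # Column 1 is the sample name, column 2 is the group, whatever the header says.
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


example = (
    "sample_name\ttype\n"
    "SRR8788980\tcarcinoma\n"
    "SRR8788981\tmelanoma\n"
    "SRR8788982\tlymphoma\n"
)
pairs = load_sample_metadata(io.StringIO(example))
print(pairs)
```

To check a real file, open it with `open(path, newline="")` and pass the handle instead of the `io.StringIO` object.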

10x data:

The metadata file for 10x data must have an extra column (the second column) for cell barcodes, so the assigned group moves to the third column. All other requirements are the same as for non-10x data:

sample_name	cell	cell_type
S2	AAACCCAAGACACACG	AT2
S1	AAACCCAAGCCATGCC	NK
S1	AAACCCAAGGGAACAA	Mac
S4	AAACCCAAGGGTTGCA	NK
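
A similar check can be sketched for the 10x layout. This is an illustration under stated assumptions rather than SPLASH's own validation: `load_10x_metadata` is a hypothetical name, the file is assumed to be tab-separated, and rejecting a repeated (sample, barcode) pair is an extra sanity check added here, not a documented SPLASH requirement.

```python
import csv
import io
from collections import Counter


def load_10x_metadata(handle):
    """Map (sample_name, cell_barcode) to the assigned group.

    Hypothetical helper for illustration; it is not part of SPLASH.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)  # column names can be anything, so only the count is checked
    if header is None or len(header) < 3:
        raise ValueError("10x metadata needs a header row with at least three columns")
    groups = {}
    for line_no, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) < 3:
            raise ValueError(f"line {line_no}: expected sample, barcode and group")
        sample, barcode, group = (cell.strip() for cell in row[:3])
        key = (sample, barcode)
        if key in groups:
            # Assumption made for this sketch: each cell should be listed only once.
            raise ValueError(f"line {line_no}: duplicate entry for {key}")
        groups[key] = group
    return groups


example = (
    "sample_name\tcell\tcell_type\n"
    "S2\tAAACCCAAGACACACG\tAT2\n"
    "S1\tAAACCCAAGCCATGCC\tNK\n"
    "S1\tAAACCCAAGGGAACAA\tMac\n"
    "S4\tAAACCCAAGGGTTGCA\tNK\n"
)
groups = load_10x_metadata(io.StringIO(example))
cells_per_group = Counter(groups.values())
print(cells_per_group)
```

Counting cells per group is a quick way to spot categories with very few cells before running the test.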

Output files:

The output directory for the GLM test is: directory/run_name_supervised_metadata/.

  • GLM_supervised_anchors.tsv: the output file containing anchors with non-zero GLM_coefficients whose largest GLM coefficient is greater than 1.
  • plots: this subfolder contains plots visualizing the anchors with metadata-dependent target fraction variation. For each anchor, two plots are generated in a PDF named after the anchor sequence: a box plot showing the fraction of target1 per sample, grouped by category, and a scatter plot showing the counts of target1 and target2 per sample, color-coded by metadata category.
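
After a run, the two outputs listed above can be inspected with a short script. This is a rough sketch with a hypothetical function name, `summarize_glm_output`. It assumes that GLM_supervised_anchors.tsv is tab-separated with a header row and that the plot files carry a .pdf extension. The columns of the anchors file are not documented here, so the sketch only reports whatever header it finds.

```python
import csv
from pathlib import Path


def summarize_glm_output(output_dir):
    """Return (header, number_of_anchor_rows, plotted_anchor_sequences).

    Hypothetical helper for illustration; it is not part of SPLASH.
    """
    output_dir = Path(output_dir)
    anchors_file = output_dir / "GLM_supervised_anchors.tsv"
    with anchors_file.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # assumption: the first row is a header
        n_rows = sum(1 for row in reader if row)  # count non-empty data rows
    # Each PDF is named after its anchor sequence, so the file stem is the anchor.
    plotted = sorted(p.stem for p in (output_dir / "plots").glob("*.pdf"))
    return header, n_rows, plotted


# Example call, using the output directory pattern given above:
# header, n_rows, plotted = summarize_glm_output("directory/run_name_supervised_metadata")
```

Comparing `n_rows` with `len(plotted)` gives a quick indication of whether a plot exists for every reported anchor.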

Contact

Please contact Roozbeh Dehghannasiri (rdehghan@stanford.edu).
