Module : DGE with SCDE

This module calculate for each gene the probability of differential expression.

Internal name : scde
Avalaible : local mode
Input Ports :
- matrix : filtered count matrix (tsv)
- cells : filtered cells metadata (tsv)
- genes : genes metadata (tsv)
Output Ports :
- dgeoutput : differential expression result (tsv)
Optional parameters :
- Main Parameters

Parameter	Type	Description	Default Value
n.cores	int	Number of cores to use	1
model.group.col	string	Name of the column if cells must be grouped for model fitting (NULL if no grouping is necessary)	NULL
prior.length	int	Number of points for prior calculation	400
batch.col	string	Name of the column indicating batches, if batch correction is required (else NULL)	NULL
n.randomizations	int	Number of randomization for testing	150

Parameters for model fitting

Parameter	Type	Description	Default Value
min.observation	int	Minimal number of observations for a gene to be used for model fitting	3
min.genes	int	Minimum number of genes for model fitting	2000
threshold.segmentation	boolean	Use or not threshold segmentation to accelerate failure estimation	TRUE
failure.threshold	int	Number of reads indicating a gene failed amplification	4
max.pairs	int	Maximum number of comparisons that should be performed per group for estimation of dropout rate	5000
min.pairs	int	Minimum number of comparisons that should be performed per group for estimation of dropout rate	10
poisson.param	float	Parameter of the Poisson distribution used to model failures	0.1
linear.fit	boolean	Weither to use linear fit for model fitting (highly recommanded)	TRUE
min.theta	float	Minimum for the dispersion parameter of the negative binomial	0.01
max.theta	float	Maximum for the dispersion parameter of the negative binomial	100

Parameters for prior calculation

Parameter	Type	Description	Default Value
save.prior.plot	boolean	Weither to save or not prior plot	TRUE
pseudocount	int	Pseudocount to add to observation before log transforming them	1
quantile	float	Quantile used to set maximum expression value	0,999
max.value	float	Alternative to quantile, maximum expression value	NULL

Parameters for test

Parameter	Type	Description	Default Value
return.posteriors	boolean	Weither to return or not the posteriors	TRUE

Configuration example

<step id="DGE" skip="false">
	<module>scde</module>
	<parameters>
		<parameter>
			<name>prior.length</name>
			<value>400</value>	
		</parameter>
		<parameter>
			<name>n.randomizations</name>
			<value>200</value>	
		</parameter>
		<parameter>
			<name>n.cores</name>
			<value>12</value>	
		</parameter>
	  </parameters>
</step>

Interpreting output files

Introduction to scde

SCDE uses a mixture model : one distribution is for amplification failures, the other is for success ("real gene expression").

First step of the algorithm is to fit the model. Briefly, this steps estimate failure by comparing gene expression between cells, and then use success to fit a negative binomial (thus setting mean and dispersion parameter).

Second step calculates a prior distribution on mean expression, using all genes and all cells.

Third and last step realize a log fold change Bayesian testing. Its results are gathered in the output file. In this file :

lb column is the lower bound of the 95% HDI (analogous to 95% CI)
mle is the maximum likelihood estimate of the fold change
ub is the upper bound of the 95% HDI (analogous to 95% CI)
ce is the bound closer to 0, thus it is a kind of "corrected" estimate
Z is the quantile in a norm Gaussian distribution associated with the probability of the ratio being 0
cZ is the same as Z after correction by benjamini-hochberg procedure
cProbability is the probability associated to cZ

Evaluation of model fitting and misfit removal

In order to evaluate goodness of fit of the model, the module produces several plot for each cell :

FittingPlot

From left to rigth plots are :

scatter plot of observed count as a function of expected count
same plot with correlation and failures according to the model and 95% CI as a dashed line
probability of failure as a function of expected count
fit of the dispersion as a function of count

The easier way to find a badly fitted model is without a doubt to use the probability of failure plot :

fail

One of the model hypothesis is that failure is more likely to occur with few transcripts. So we expect to see high probabilities on the left of the plot, and low probabilities on its right. The left plot shows this kind of distribution, but the right one is nearly uniform indicating a problem with model fitting.

Design file format

This module requires a v2 design file. The design file must contain experiments with the columns attribute, indicating required columns for grouping the cells, and a comparisons attribute, listing and naming required comparisons.

The module can handle model attribute as for DESeq2, however it will not build a GLM, therefore it will uniquely test factors separately or one interaction of factors.

[Header]
DesignFormatVersion=2
Project=designExample
ProjectDescription=Example of design file for single cell DGE analysis
Owner=Geoffray Brelurut

GenomeFile=genome://mm10
GffFile=gff://mm10


[Experiments]
Exp.1.name=exp1
Exp.1.skip=False
Exp.1.columns=type+day
Exp.1.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\
WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2

Exp.2.name=exp2
Exp.2.skip=false
Exp.2.model=~type+day+type:day
Exp.2.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\
WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2

[Columns]
SampleId	SampleName	Reads		Condition	Reference	Exp.1.RepTechGroup Exp.2.type		Exp.2.day	Exp.2.RepTechGroup	Exp.3.Condition	Exp.3.RepTechGroup
s1a		Sample1a	sample1a.fastq	WT-day1		1		WT-day1			WT			day1		WT-day1			WT		WT-day1
s1b		Sample1b	sample1a.fastq	WT-day1		0		WT-day1			WT			day1		WT-day1			WT		WT-day1
s2a		Sample2a	sample2a.fastq	WT-day2		2		WT-day2			WT			day2		WT-day2			WT		WT-day2
s2b		Sample2b	sample2b.fastq	WT-day2		0		WT-day2			WT			day2		WT-day2			WT		WT-day2
s3a		Sample3a	sample3a.fastq	KO-day1		3		KO-day1			KO			day1		KO-day1			KO		KO-day1
s3b		Sample3b	sample3b.fastq	KO-day1		0		KO-day1			KO			day1		KO-day1			KO		KO-day1
s4a		Sample4a	sample4a.fastq	KO-day2		0		KO-day2			KO			day2		KO-day2			KO		KO-day2
s4b		Sample4b	sample4b.fastq	KO-day2		0		KO-day2			KO			day2		KO-day2			KO		KO-day2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Module : DGE with SCDE

Module : DGE with SCDE

Interpreting output files

Introduction to scde

Evaluation of model fitting and misfit removal

Design file format

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally