Module : DGE with SCDE
This module calculates, for each gene, the probability of differential expression.
- Internal name : scde
- Available : local mode
- Input Ports :
  - matrix : filtered count matrix (tsv)
  - cells : filtered cells metadata (tsv)
  - genes : genes metadata (tsv)
- Output Ports :
  - dgeoutput : differential expression result (tsv)
- Optional parameters :
- Main Parameters
Parameter | Type | Description | Default Value |
---|---|---|---|
n.cores | int | Number of cores to use | 1 |
model.group.col | string | Name of the column if cells must be grouped for model fitting (NULL if no grouping is necessary) | NULL |
prior.length | int | Number of points for prior calculation | 400 |
batch.col | string | Name of the column indicating batches, if batch correction is required (else NULL) | NULL |
n.randomizations | int | Number of randomizations for testing | 150 |
- Parameters for model fitting
Parameter | Type | Description | Default Value |
---|---|---|---|
min.observation | int | Minimum number of observations for a gene to be used for model fitting | 3 |
min.genes | int | Minimum number of genes for model fitting | 2000 |
threshold.segmentation | boolean | Whether to use threshold segmentation to speed up failure estimation | TRUE |
failure.threshold | int | Number of reads indicating a gene failed amplification | 4 |
max.pairs | int | Maximum number of comparisons that should be performed per group for estimation of dropout rate | 5000 |
min.pairs | int | Minimum number of comparisons that should be performed per group for estimation of dropout rate | 10 |
poisson.param | float | Parameter of the Poisson distribution used to model failures | 0.1 |
linear.fit | boolean | Whether to use a linear fit for model fitting (highly recommended) | TRUE |
min.theta | float | Minimum for the dispersion parameter of the negative binomial | 0.01 |
max.theta | float | Maximum for the dispersion parameter of the negative binomial | 100 |
- Parameters for prior calculation
Parameter | Type | Description | Default Value |
---|---|---|---|
save.prior.plot | boolean | Whether to save the prior plot | TRUE |
pseudocount | int | Pseudocount added to observations before log-transforming them | 1 |
quantile | float | Quantile used to set the maximum expression value | 0.999 |
max.value | float | Alternative to quantile, maximum expression value | NULL |
- Parameters for test
Parameter | Type | Description | Default Value |
---|---|---|---|
return.posteriors | boolean | Whether to return the posteriors | TRUE |
- Configuration example
    <step id="DGE" skip="false">
        <module>scde</module>
        <parameters>
            <parameter>
                <name>prior.length</name>
                <value>400</value>
            </parameter>
            <parameter>
                <name>n.randomizations</name>
                <value>200</value>
            </parameter>
            <parameter>
                <name>n.cores</name>
                <value>12</value>
            </parameter>
        </parameters>
    </step>
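
To check which values a step actually overrides, the configuration can be read back with a few lines of Python. This is only an illustrative sketch, not how the pipeline itself reads its configuration: the helper names `read_step_parameters` and `effective_parameters` are hypothetical, and the defaults are copied by hand from the Main Parameters table above.

```python
import xml.etree.ElementTree as ET

# The step from the configuration example above.
STEP_XML = """
<step id="DGE" skip="false">
  <module>scde</module>
  <parameters>
    <parameter><name>prior.length</name><value>400</value></parameter>
    <parameter><name>n.randomizations</name><value>200</value></parameter>
    <parameter><name>n.cores</name><value>12</value></parameter>
  </parameters>
</step>
"""

# Defaults copied by hand from the "Main Parameters" table (kept as strings).
MAIN_DEFAULTS = {
    "n.cores": "1",
    "model.group.col": "NULL",
    "prior.length": "400",
    "batch.col": "NULL",
    "n.randomizations": "150",
}


def read_step_parameters(step_xml):
    """Return (module name, {parameter name: value}) for one <step> element."""
    step = ET.fromstring(step_xml.strip())
    module = step.findtext("module")
    params = {
        p.findtext("name"): p.findtext("value")
        for p in step.findall("./parameters/parameter")
    }
    return module, params


def effective_parameters(step_xml, defaults=MAIN_DEFAULTS):
    """Overlay the values set in the step on top of the documented defaults."""
    _, params = read_step_parameters(step_xml)
    merged = dict(defaults)
    merged.update(params)
    return merged


if __name__ == "__main__":
    module, overrides = read_step_parameters(STEP_XML)
    print(module, overrides)
    print(effective_parameters(STEP_XML))
```

In this example, `n.randomizations` and `n.cores` differ from their defaults, while `prior.length` is set explicitly to its default value.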
To evaluate the goodness of fit of the model, the module calculates the proportion of variance in the measured values that is explained by the model (i.e. the r-squared of a linear model in which the predictor is the measured value and the response is the model value). The model is plotted on a goodness-of-fit plot.
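
As a rough illustration of this score, the r-squared of a simple linear regression with one predictor equals the squared Pearson correlation between the two series. The sketch below computes it in plain Python; the function name `r_squared` and the toy values are assumptions made for the example, not the module's actual code.

```python
def r_squared(measured, fitted):
    """r-squared of a simple linear regression of `fitted` on `measured`.

    With a single predictor, this equals the squared Pearson correlation.
    """
    n = len(measured)
    if n != len(fitted) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(measured) / n
    mean_y = sum(fitted) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    syy = sum((y - mean_y) ** 2 for y in fitted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, fitted))
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Toy values for one cell: measured values vs. values given by the model.
    print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

A cell whose model values track its measured values closely gets a score near 1; a poorly fitted cell gets a lower score.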
Assuming that the model fits the majority of cells, we expect the distribution of these values to be "nearly Gaussian".
Misfits are expected to be outliers with lower values (see distribution plot below). Cells with the lowest values are therefore removed one by one, until Shapiro's test returns a sufficiently high p-value under the Gaussian assumption.
Shapiro's p-values increase by several orders of magnitude after some removals (see p-value as a function of the number of removals below). This increase indicates a "nearly Gaussian" distribution.
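
The removal loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the function name `prune_misfits`, the `alpha` cutoff of 0.05 and the `min_cells` floor are all hypothetical (the text only says "sufficient probability"), and the normality test is passed in as a callable so the sketch does not depend on a specific statistics library.

```python
def prune_misfits(score_by_cell, pvalue_fn, alpha=0.05, min_cells=3):
    """Remove the lowest-scoring cell until the normality test passes.

    score_by_cell: dict mapping cell name to goodness-of-fit score.
    pvalue_fn: callable taking a list of scores and returning a p-value
               (a Shapiro test is one option).
    alpha: hypothetical cutoff; stop once the p-value reaches it.
    min_cells: stop early rather than test on too few cells.

    Returns (kept cell names, removed cell names, p-value after each step).
    """
    # Sort ascending so the worst-fitted cell is always first.
    kept = sorted(score_by_cell.items(), key=lambda item: item[1])
    removed = []
    pvalues = []
    while True:
        p = pvalue_fn([score for _, score in kept])
        pvalues.append(p)
        if p >= alpha or len(kept) <= min_cells:
            break
        cell, _ = kept.pop(0)  # drop the cell with the lowest score
        removed.append(cell)
    return [cell for cell, _ in kept], removed, pvalues
```

If SciPy is installed, `lambda scores: scipy.stats.shapiro(scores)[1]` is one possible `pvalue_fn`. The returned list of p-values corresponds to the curve of p-value against number of removals mentioned above.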
The module also plots goodness of fit as a function of the Michaelis-Menten model maximum, before and after outlier depletion. This allows visual inspection of the process.
After cleaning the data, the module produces two scatter plots showing cells in terms of number of features (y-axis) and number of reads (x-axis).
The first one shows all cells; those in red are the ones being eliminated.
The second one shows the cells remaining after filtering. At the end of the filtering, cells should behave like a mixture of Gaussians, i.e. you can enclose them in a given number of ellipses.