Filtering genes suitable for DEA should not use values adjusted for library size #1397

@arteymix

The filtering that we do for DEA is very minimal and stringent, and aims mainly at excluding genes with repetitive values.

We exclude:

  • genes that have a variance of less than 1e-5
  • genes whose values are repeated in >30% of the assays analyzed

Adjusting values for library size introduces variance, which lets genes with repetitive values evade our filters: identical counts are normalized to different values in assays with different library sizes.

log2(1e6 * (X + 0.5) / (librarySize + 1))
=> log2(1e6) + log2(X + 0.5) - log2(librarySize + 1)

Simply adding log2(librarySize + 1) back to the normalized values (leaving log2(1e6) + log2(X + 0.5)) would eliminate the variance contribution from the library size and allow the filters to work properly.
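
To illustrate, here is a minimal Python sketch of the effect using only the formula and thresholds quoted above. It is not the actual filtering code: the function names, the use of population variance, and the reading of "repeated" as "the most common value covers more than 30% of assays" are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import pvariance


def log2cpm(count, library_size):
    """log2 counts-per-million, exactly as written in this issue."""
    return math.log2(1e6 * (count + 0.5) / (library_size + 1))


def undo_library_size(value, library_size):
    """Add log2(librarySize + 1) back, leaving log2(1e6) + log2(X + 0.5)."""
    return value + math.log2(library_size + 1)


def passes_filters(values, min_variance=1e-5, max_repeated_fraction=0.3):
    """Hypothetical stand-in for the two DEA filters.

    Assumptions: population variance is used, and a gene is "repeated"
    when its most common value (rounded to absorb floating-point noise)
    occurs in more than 30% of the assays.
    """
    if pvariance(values) < min_variance:
        return False
    most_common_count = Counter(round(v, 9) for v in values).most_common(1)[0][1]
    return most_common_count / len(values) <= max_repeated_fraction


# A gene with zero counts in every assay, but assays of different depth.
counts = [0, 0, 0, 0]
library_sizes = [1_000_000, 2_000_000, 4_000_000, 8_000_000]

normalized = [log2cpm(x, n) for x, n in zip(counts, library_sizes)]
restored = [undo_library_size(v, n) for v, n in zip(normalized, library_sizes)]

print(passes_filters(normalized))  # the all-zero gene evades both filters
print(passes_filters(restored))    # the all-zero gene is filtered out
```

With the library-size term removed, every zero maps to the same value, so the variance filter catches the gene again.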

This is very likely to address some of the P-value peaks we see in single-cell RNA-Seq, which are caused by zeroes being normalized to different values.

Labels

single cell (issues related to single-cell data support)
