ctDNAmer

Reference- and alignment-free method for ctDNA detection from whole-genome sequencing (WGS) data.

Description

ctDNAmer is a reference-free approach for ctDNA detection that finds tumor-specific somatic variation directly from unaligned sequencing data by identifying k-mers unique to the primary tumor sample. These k-mers are then used to detect ctDNA within raw cfDNA sequencing reads.

ctDNAmer leverages genome-wide information and is not limited to SNVs. Probabilistic modeling is used to estimate the circulating tumor fraction.

The method is built as a customizable Snakemake workflow [1]. K-mer counting and k-mer set operations are done by KMC3 [2], probabilistic models are implemented in Stan [3], and sampling is performed with the rstan package.

A detailed description of ctDNAmer can be found here.

Requirements

Software

ctDNAmer requires Snakemake 8.0.0 or above and uses conda for package management.

Input data

A detailed description of the required input data can be found here.

Patient data

For each patient, primary tumor WGS data (~30x), matched germline WGS data (~30x) and cfDNA WGS data (~30x) are required. Each patient should have at least a pre-treatment/baseline cfDNA sample available. Sample paths are specified in the samples.tsv and units.tsv configuration files.

Germline union

ctDNAmer applies a union of germline k-mer sets for enhanced separation of tumor and germline sequences. The germline samples to be combined into a representative union can be specified in the samples_glu.tsv configuration file. These can be the germline samples of the current target patient cohort; optionally, additional germline WGS data can be included for better representation of the germline information.

Empirical noise

A set of unmatched cfDNA samples is required for the estimation of the empirical noise distribution. The cfDNA samples can be specified in the configuration file donors.tsv. These samples can be cfDNA samples from healthy individuals or cfDNA samples from other patients not included in the current target patient cohort.

Usage

Step 1: Install snakemake

Snakemake is best installed via the Mamba package manager (a drop-in replacement for conda). If you have neither Conda nor Mamba, Mamba can be installed via Mambaforge. For other options see here.

Given that Mamba is installed, run:

mamba create -c conda-forge -c bioconda --name snakemake 'snakemake>=8'

to install Snakemake in an isolated environment. If you need to use conda instead of mamba, add the --conda-frontend conda flag to the snakemake commands given below.

Activate the environment via:

conda activate snakemake
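Since ctDNAmer requires Snakemake 8.0.0 or above, it can be worth confirming the installed version once the environment is active. The following is a minimal sketch, not part of the ctDNAmer repository: the helper name version_ge is hypothetical, and it assumes a sort implementation that supports the -V (version sort) option.

```shell
# Return success if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query Snakemake if it is on PATH, so the snippet is safe to run anywhere.
if command -v snakemake >/dev/null 2>&1; then
    if version_ge "$(snakemake --version)" 8.0.0; then
        echo "Snakemake version OK"
    else
        echo "Snakemake 8.0.0 or above is required" >&2
    fi
fi
```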

Step 2: Clone this repo

Clone the repository and change into its directory:

git clone https://github.com/BesenbacherLab/ctDNAmer.git && cd ctDNAmer

Step 3: Configure workflow

Workflow configuration

To specify the parameters for running ctDNAmer and the sample paths, modify the configuration files config.yaml, samples.tsv, units.tsv, samples_glu.tsv and donors.tsv according to your needs, following the explanations provided here.

Step 4: Run workflow

Cluster execution

For cluster execution of the workflow, the Snakemake slurm executor plugin must be installed with pip install snakemake-executor-plugin-slurm. If the slurm plugin is not installed, the -e (--executor) flag needs to be specified for the snakemake commands listed below.

The specifics for cluster execution should be defined in the workflow profile configuration file. An example workflow profile for slurm is provided here. To use the example profile, adjust the snakemake command line parameters to your needs. Importantly, a cluster account is specified in the example profile as an environment variable. To set the account name as an environment variable, run export ACCOUNT_NAME=<your_account_name>, or modify the profile config file to include your account name directly.

After you have activated the conda environment with snakemake, installed the slurm executor plugin and set the account name as an environment variable, you can test remote execution of the workflow by performing a dry-run:

snakemake -n
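Because the example profile reads the cluster account from the ACCOUNT_NAME environment variable, a dry-run can fail in a confusing way when the variable is unset. The guard below is a small sketch of a pre-flight check; the function name require_account_name is hypothetical and not part of ctDNAmer.

```shell
# Fail early with a hint if the cluster account variable is missing or empty.
require_account_name() {
    if [ -z "${ACCOUNT_NAME:-}" ]; then
        echo "ACCOUNT_NAME is not set; run: export ACCOUNT_NAME=<your_account_name>" >&2
        return 1
    fi
    return 0
}
```

It can be chained in front of the dry-run, for example require_account_name && snakemake -n.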

To run the workflow for a new data set, use the --directory flag, which specifies the path to the directory where the pipeline will be executed. The target directory needs to include a config folder with config.yaml and samples.tsv files, which specify the ctDNAmer parameters and the paths to the sample files that will be used during execution. You can execute the workflow with:

snakemake --directory "path/to/new/directory/"
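Before launching a run in a new directory, it may help to confirm that the expected configuration files are in place. This is a minimal sketch under the assumption stated above, that the files live in a config folder inside the target directory; the helper name check_config_dir is hypothetical, and the list of files to check is passed explicitly so it can be extended to the other configuration files you use.

```shell
# Check that the listed files exist under <run_dir>/config.
# Usage: check_config_dir <run_dir> <file>...
# The body runs in a subshell so its variables do not leak into the caller.
check_config_dir() (
    run_dir="$1"
    shift
    missing=0
    for f in "$@"; do
        if [ ! -f "$run_dir/config/$f" ]; then
            echo "missing: $run_dir/config/$f" >&2
            missing=1
        fi
    done
    # Exit status is success only when every file was found.
    [ "$missing" -eq 0 ]
)
```

For example, check_config_dir "path/to/new/directory" config.yaml samples.tsv reports each missing file on standard error and returns a non-zero status if any is absent.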

The workflow profile that specifies the details for the cluster execution will still be automatically detected from the pipeline directory (workflow/profiles/default/config.yaml) even when the execution directory is changed. If you want to specify a new cluster execution profile as well, use the --workflow-profile flag:

snakemake --workflow-profile "path/to/workflow_profile/config.yaml"

For further options for local, cluster and cloud execution, see the snakemake docs.

Testing optimal unique tumor set size

We have found that a minimum unique tumor set size of 20 000 is needed for reliable TF estimation from samples with ~30x coverage, and this is set as the default parameter value in ctDNAmer. Optionally, a count filter test can be performed to confirm how large a unique tumor set is required for reliable TF estimation in the user-defined patient cohort.

An additional set of patients or a subset of the patients in the target cohort can be used for testing the optimal size of the unique tumor set. The configuration file config_count_filter_test.yaml can be used to specify the parameters of the test. The patient data used for testing can be specified in the samples_count_filter_test.tsv and units_count_filter_test.tsv configuration files.

At least two cfDNA samples are required per patient for testing: a ctDNA-positive pre-treatment/baseline cfDNA sample and a ctDNA-negative/post-treatment cfDNA sample. The count filter test runs TF estimation for unique tumor sets of different sizes, and the minimum required unique tumor set size can be determined from the difference between the TF estimates of the ctDNA-positive and ctDNA-negative samples.
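One way to read the test results is to pick the smallest set size at which the ctDNA-positive estimate exceeds the ctDNA-negative estimate by a chosen margin. The sketch below illustrates that idea only. It is not how ctDNAmer reports its results: the input layout (a tab-separated file with a header and the columns set size, positive-sample TF, negative-sample TF), the margin threshold and the function name min_set_size are all hypothetical.

```shell
# Print the smallest set size whose TF difference reaches the margin.
# $1: TSV file with a header line and columns
#     set_size, tf_positive, tf_negative (hypothetical layout)
# $2: minimum required TF difference (hypothetical threshold)
# Returns non-zero if no set size reaches the margin.
min_set_size() {
    awk -F '\t' -v margin="$2" '
        NR > 1 && ($2 - $3) >= margin && (best == "" || $1 + 0 < best + 0) { best = $1 }
        END { if (best != "") print best; else exit 1 }
    ' "$1"
}
```

A stricter rule could additionally require that all larger set sizes also pass the margin; the simple version above does not check this.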

To run the count-filter test, indicate the respective Snakefile that implements it with the -s flag from the command line:

snakemake -s workflow/Snakefile_count_filter_test -n

Comparison with alignment-based tumor fraction estimates: calculating the mean allele frequency of clonal SNVs

The subworkflow clonalSNVs_tracking implements ctDNA detection and TF estimation based on aligned WGS data. Tumor fraction is estimated as the mean cfDNA allele frequency of clonal SNVs identified from aligned primary tumor data. See more here.
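Written out, the mean allele frequency described above could look as follows. This is a sketch under assumptions: the symbols are introduced here for illustration, with N the number of clonal SNVs, a_i the number of cfDNA reads supporting the alternative allele at SNV i, and d_i the total cfDNA read depth at that position. Whether the subworkflow averages per-site frequencies (as below) or pools reads across sites is not stated here.

$$
\widehat{\mathrm{TF}} = \frac{1}{N} \sum_{i=1}^{N} \frac{a_i}{d_i}
$$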

References

[1] F. Mölder et al., “Sustainable data analysis with Snakemake,” F1000Research, vol. 10, p. 33, Apr. 2021, doi: 10.12688/f1000research.29032.2.

[2] M. Kokot, M. Długosz, and S. Deorowicz, “KMC 3: counting and manipulating k-mer statistics,” Bioinformatics, vol. 33, no. 17, pp. 2759–2761, Sep. 2017, doi: 10.1093/bioinformatics/btx304.

[3] Stan Development Team, “Stan Modeling Language Users Guide and Reference Manual.” Accessed: Oct. 24, 2024. [Online]. Available: https://mc-stan.org
