About

The pipeline gets samples in fasta format and provides whole genome alignment of desired samples on specified reference.

Obtained alignment could demonstrate interesting rearrangements on the chromosome level.

The resulting plot of alignment you can find in data_output folder.

Used tools:

Configure environment

git clone https://github.com/rutaolta/chromodifpipe.git

cd <pipeline_working_dir>

It is recommended to create a fresh conda environment using mamba or conda.

mamba env create --name chromodifpipe --file ./environment.yaml
# or:
# conda env create --name chromodifpipe --file ./environment.yaml

Activate conda environment with snakemake:

conda activate chromodifpipe

Before run

Before running the pipeline you should add whitelist of scaffolds you are interested in.

Scaffold length report

To check scaffold length please use following command. That can be useful when choosing scaffolds length boundary to generate whitelists. The generated reports will be put in data_input/reports folder.

snakemake -pr --use-conda --cores 1 scaffold_length

Generate whitelists

To generate whitelists of scaffolds please use following command. This step is required for alignment plot.

Whitelists would be generated for each sample using boundary. Boundaries should be added into config/default.yaml parameter boundary.

The generated whitelists you can find in data_input/[name of your reference]/whitelists folder.

snakemake -pr --use-conda --cores 1 generate_whitelists

Scaffolds of sample will appear on plot in order that they appear in the .whitelist.

You could also add synonyms of scaffolds in the second column using Tab between columns.

The upper directory is named by the reference for you to be able to change the order of scaffolds if you will mind to use another reference.

Test

There are 2 yeast samples in data_input/samples folder for pipeline test. For test data config/default.yaml has been modified as following:

samples_dir defined as data_input/samples
reference specified as a cerevisiae
species for RepeatMasker specified as saccharomyces (The species name must be a valid NCBI Taxonomy Database species name and be contained in the RepeatMasker repeat database)
boundary for whitelist generation is set to 1000000
filter_range for filtering "noise" is set to 1
plot_original is true to generate also plot without filtering

To check the output you can run:

snakemake --cores 8 --configfile config/default.yaml --use-conda --profile profile/slurm/ --printshellcmds --latency-wait 60

and find results of test run in data_input folder

Run

REMINDER: before run pipeline on your data your should specify settings in config/default.yaml corresponding to your samples. As an example you can have a look at test parameters above.

After all settings were specified you can run the pipeline with following command:

snakemake --cores 8 --configfile config/default.yaml --use-conda --profile profile/slurm/ --printshellcmds --latency-wait 60

Additional information

Typically output plots with alignment should be filtered from "noise" that could interfere analysis results. Filter boundary could be defined in config/default.yaml parameter filter_range. Moreover if non-filtered plots look good and you only need to redraw filtered plots, you can skip drawing plot with originals to speed up. If you set parameter plot_original to false the pipe will redraw only plots based on filtered data.

And if you need to rerun plot step, for example if you added synonyms or changed the order of scaffolds/chromosomes in whitelist.txt, then you should remove or rename somehow following files with old results data_output/mavr/filtered_<name of your reference>.png.

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
cluster_logs		cluster_logs
config		config
data_input		data_input
data_output		data_output
logs		logs
profile/slurm		profile/slurm
workflow		workflow
.gitignore		.gitignore
README.md		README.md
Snakefile		Snakefile
dag.png		dag.png
environment.yaml		environment.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Configure environment

Before run

Scaffold length report

Generate whitelists

Test

Run

Additional information

About

Uh oh!

Releases 1

Packages

Contributors 2

Uh oh!

Languages

rutaolta/chromodifpipe

Folders and files

Latest commit

History

Repository files navigation

About

Configure environment

Before run

Scaffold length report

Generate whitelists

Test

Run

Additional information

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Uh oh!

Languages

Packages