This repository hosts scripts used to generate and analyze the shared polymorphism dataset of Daphnia species.
Connor S. Murray
Email: csm6hg@virginia.edu
This directory contains the primary figures along with data and scripts used for plotting.
This directory contains supplemental figures, tables, data, and associated scripts.
This section includes the pipeline for generating Variant Call Format (VCF) files:
- VCF Creation:
  - Downloading short-read Illumina FASTQ files.
  - Mapping to the European Daphnia pulex genome (D84A; GenBank assembly: GCA_023526725.1).
  - BAM generation and merging.
  - SNP calling using GATK HaplotypeCaller.
  - VCF and genomic data structure (GDS) generation.
- VCF Filtration:
  - Filtering to Benchmarking Universal Single-Copy Orthologs (BUSCO) genes.
  - Multi-locus genotype (MLG) classification.
  - SNP filtering based on recommendations for species lacking reference SNP panels.
- Quality Control:
  - Assessing missingness, quality scores, and coverage.
  - MultiQC on BAMs.
- Read-Based Phasing:
  - WhatsHap phasing of samples and phased VCF creation.
- LiftOver:
  - Lifting over coordinates between the European (D84A) and North American (KAP4; RefSeq assembly: GCF_021134715.1) Daphnia pulex genomes.
  - Repeat masking, single-copy orthologous genes, and genotype concordance.
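As an illustration of the missingness check listed under Quality Control, here is a minimal Python sketch that computes the fraction of missing genotype calls per sample from uncompressed VCF text. It is not the repository's own QC script: the function name `per_sample_missingness` and the two-sample example records are hypothetical, and the sketch assumes `GT` is the first FORMAT field, as the VCF specification requires when it is present.

```python
from collections import defaultdict


def per_sample_missingness(vcf_lines):
    """Return {sample: fraction of sites whose genotype call is missing}."""
    samples = []
    missing = defaultdict(int)
    n_sites = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        n_sites += 1
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]  # assumes GT is the first FORMAT key
            if "." in genotype:  # './.', '.|.', or a half-call such as './1'
                missing[sample] += 1
    return {s: (missing[s] / n_sites if n_sites else 0.0) for s in samples}


# Hypothetical two-site, two-sample VCF used only for illustration.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0",
    "chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/0:9\t1/1:7",
]
print(per_sample_missingness(example))
```

For real, compressed VCFs a dedicated parser would be a better fit; the plain-text version is used here only to keep the example dependency-free.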
To improve reproducibility and ease of deployment, we have added an Apptainer container that includes the necessary dependencies for running R and Python scripts in this repository.
If you do not have Apptainer installed, follow these steps to install it:
- Install dependencies:
    sudo apt-get update
    sudo apt-get install -y build-essential libseccomp-dev pkg-config squashfs-tools cryptsetup
- Download and install Go:
    wget https://dl.google.com/go/go1.16.5.linux-amd64.tar.gz
    sudo tar -C /usr/local -xzf go1.16.5.linux-amd64.tar.gz
    export PATH=$PATH:/usr/local/go/bin
- Download the Apptainer source code:
    git clone https://github.com/apptainer/apptainer.git
    cd apptainer
- Compile and install Apptainer:
    ./mconfig
    make -C builddir
    sudo make -C builddir install
- Clone this repository to your local machine:
    git clone https://github.com/connor122721/SharedPolymorphismsDaphnia.git
    cd SharedPolymorphismsDaphnia
- Build the Apptainer image:
    apptainer build daphnia-container.sif Apptainer.def
- Start an R session inside the container:
    apptainer exec daphnia-container.sif R
- I would love to turn the identification of trans-specific polymorphisms into an R package or standalone software. I am currently reworking this feature to make it more user-friendly and would appreciate any collaboration or tips!
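To give a rough sense of the first step such a tool might take, the Python sketch below flags a site as a shared polymorphism when two species both segregate for at least two of the same alleles. This is an illustrative assumption and not the method implemented in this repository: the function names and the genotype lists are hypothetical, and classifying a site as trans-specific would additionally require evidence that the alleles are shared by ancestry and not by recurrent mutation.

```python
def segregating_alleles(genotypes):
    """Collect the alleles seen among called genotypes such as '0/1' or '1|1'."""
    alleles = set()
    for genotype in genotypes:
        for allele in genotype.replace("|", "/").split("/"):
            if allele != ".":  # ignore missing calls
                alleles.add(allele)
    return alleles


def is_shared_polymorphism(species_a, species_b):
    """True if both species carry at least two of the same alleles at a site."""
    shared = segregating_alleles(species_a) & segregating_alleles(species_b)
    return len(shared) >= 2


# Hypothetical genotype calls at single sites, for illustration only.
print(is_shared_polymorphism(["0/1", "0/0"], ["1/1", "0/1"]))  # both segregate 0 and 1
print(is_shared_polymorphism(["0/0", "0/0"], ["0/1", "./."]))  # fixed in the first species
```

Keeping the per-site test as a small pure function would make it straightforward to wrap in either an R package or a command-line tool later.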
This project is licensed under the MIT License.