awesome-evomics

curated list of awesome resources for evolutionary and population genomics [work in progress]

data sets

human

InternationalGenome.org - data portal for the 1000 Genomes and other projects
- data collections
- map of populations
Genome Projects at Max Planck Institute for Evolutionary Anthropology - sequencing data for the Neanderthals and Denisovans
- http://cdna.eva.mpg.de/neandertal/ - direct downloads
Simons Genome Diversity Project (SGDP) - WGS data from 142 populations around the world
The Allen Ancient DNA Resource (AADR) - ancient and modern human samples sequenced using the 1240K SNP panel

zoology

Ultraconserved elements (UCEs) - resources for UCEs, a useful set of genome-wide markers, especially for non-model taxa without reference genomes. The combination of conserved sequences with variable flanking regions offers markers to study evolution at different levels, from populations to phylogenomics at higher taxonomic ranks.
The Vertebrate Genomes Project - aims to sequence genomes for all known vertebrate species.

parasites

VEuPath - database of eukaryotic pathogen, vector, and host informatics
TriTryp - database of trypanosomatid parasites

learning

Helpful tutorials, blogs, and books on topics in evomics, bioinformatics, and data science.

genomics

Speciation genomics - tutorials covering around 70% of my PhD; too bad I found the page after my defense
- their GitHub includes example data, code, presentations, and other material
Evomics.org - portal with materials from years of summer schools on evolutionary genomics
The G-cat - genetic theory in nice digestible articles
Introduction to the Command Line for Genomics - a course by Data Carpentry
Population genetics and genomics in R - especially great for non-model taxa

data skills

Bioinformatics Data Skills - awesome book by Vince Buffalo
Data Science at the Command Line - great free book by Jeroen Janssens
Ad Hoc Data Analysis From The Unix Command Line - free book at Wikibooks

software tools

software repositories

Bioconda - channel of bioinformatic software, for the conda / mamba package managers
Conda-forge - channel of scientific software, for the conda / mamba package managers
Homebrew Bio - repository of bioinformatic software for the Homebrew / Linuxbrew package managers
Bioconductor - bioinformatic packages and versioned data in R

population & evolutionary genomics

MethodsPopGen.com - overview of software tools for population and evolutionary genomics, described in a review paper
PLINK2 - toolkit for population genomics and GWAS
EIGENSOFT - tools for analysis of populations, including population stratification and SmartPCA
ADMIXTOOLS2 - R package reimplementing the original ADMIXTOOLS, with higher performance, an easy scripting interface, and a GUI web app
ADMIXTOOLS - the original ADMIXTOOLS package

simulations

msprime - coalescent simulator
SLiM - forward-time simulator of evolution, with support for spatial models
slendr - R interface to msprime and SLiM simulators, with support for spatial and non-spatial models
stdpopsim - library of standard population genetic simulation models

bioinformatic formats

While genotype matrices are the dominant data type in evomics, other data types and formats appear as well - from FASTA reference sequences or alignments, to genomic features and annotations.

HTSlib - umbrella project for Samtools and related packages
- Samtools for SAM/BAM alignments
- Bcftools for variant data in VCF/BCF formats
- Bgzip - block gzip compression that allows fast random access to compressed files
- Tabix - indexing of files compressed with Bgzip
SeqKit - for efficient manipulation of FASTA/FASTQ formats
SeqTK - for efficient manipulation of FASTA/FASTQ formats
bioawk - extension of the AWK language with support for common bioinformatic formats and compressed data
Seqmagick - a kickass little utility built in the spirit of imagemagick to expose the file format conversion in Biopython in a convenient way. Instead of having a big mess of scripts, there is one that takes arguments.

tabular data

There is plenty of tabular data in bioinformatics, from the well-known formats to all kinds of metadata. Many tools have been developed to process generic tabular data.

structured text tools - overview of tools for processing structured text
Miller - like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed data
csvtk - fast CSV/TSV toolkit in Go, with many features and simple plot functions
xsv - fast CSV/TSV toolkit in Rust
visidata - terminal spreadsheet app
grabix - like tabix but for non-bio data (indexing by line numbers instead of genomic positions); fast slicing / random sampling of large compressed tabular data
