janxkoci/awesome-evomics

awesome-evomics

curated list of awesome resources for evolutionary and population genomics [work in progress]

data sets

human

zoology

  • Ultraconserved elements (UCEs) - resources for UCEs, a useful set of genome-wide markers, especially for non-model taxa without reference genomes. Each marker combines a conserved core sequence with variable flanking regions, which makes UCEs suitable for studying evolution at different levels, from populations to phylogenomics at higher taxonomic ranks.
  • The Vertebrate Genomes Project - aims to sequence genomes for all known vertebrate species.

parasites

learning

Helpful tutorials, blogs, and books on topics in evomics, bioinformatics, and data science.

genomics

data skills

software tools

software repositories

population & evolutionary genomics

  • MethodsPopGen.com - overview of software tools for population and evolutionary genomics, described in a review paper
  • PLINK2 - toolkit for population genomics and GWAS
  • EIGENSOFT - tools for analysis of populations, including population stratification and SmartPCA
  • ADMIXTOOLS2 - R package reimplementing the original ADMIXTOOLS, with higher performance, an easy scripting interface, and a GUI web app
  • ADMIXTOOLS - the original ADMIXTOOLS package

simulations

  • msprime - coalescent simulator
  • SLiM - forward-time simulator for evolutionary models, including spatial ones
  • slendr - R interface to msprime and SLiM simulators, with support for spatial and non-spatial models
  • stdpopsim - library of standard population genetic simulation models
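
To give a feel for what a coalescent simulator computes, here is a minimal sketch in plain Python. It is not msprime's API and ignores recombination, demography, and mutation; it only draws the waiting times of the standard Kingman coalescent for a sample of `n` lineages and compares the simulated time to the most recent common ancestor (TMRCA) with the textbook expectation 2(1 - 1/n) in coalescent units. The function name `simulate_tmrca` and all parameter values are made up for illustration.

```python
import random
from statistics import mean


def simulate_tmrca(n_samples: int, rng: random.Random) -> float:
    """Draw one TMRCA (in coalescent units) for n_samples lineages."""
    total = 0.0
    for k in range(n_samples, 1, -1):
        # With k lineages, any of the k*(k-1)/2 pairs can coalesce,
        # so the waiting time is exponential with that rate.
        rate = k * (k - 1) / 2
        total += rng.expovariate(rate)
    return total


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 10                   # illustrative sample size
replicates = [simulate_tmrca(n, rng) for _ in range(20000)]
expected = 2 * (1 - 1 / n)  # theoretical mean TMRCA for n lineages

print(f"simulated mean TMRCA: {mean(replicates):.3f}, theory: {expected:.3f}")
```

The tools listed above go much further (full genealogies, recombination, realistic demographic models), so treat this only as an illustration of the underlying idea.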

bioinformatic formats

While genotype matrices are the dominant data type in evomics, other data types and formats appear as well - from FASTA reference sequences or alignments, to genomic features and annotations.

  • HTSlib - Umbrella project for Samtools and related packages
    • Samtools for SAM/BAM alignments
    • Bcftools for variant data in VCF/BCF formats
    • Bgzip - block gzip compression, which allows fast random access to compressed files
    • Tabix - indexing of files compressed with Bgzip
  • SeqKit - for efficient manipulation of FASTA/FASTQ formats
  • SeqTK - for efficient manipulation of FASTA/FASTQ formats
  • bioawk - extension of the AWK language with support for common bioinformatic formats and compressed data
  • Seqmagick - a kickass little utility built in the spirit of imagemagick to expose the file format conversion in Biopython in a convenient way. Instead of having a big mess of scripts, there is one that takes arguments.
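
The random access mentioned for Bgzip above comes from compressing data in independent blocks. The sketch below illustrates that idea with Python's standard `gzip` module only: it is not the real BGZF format (which stores block sizes in a gzip extra field and limits each block to 64 KiB), and the record layout, `block_size`, and `read_block` helper are assumptions made for this example.

```python
import gzip

# Hypothetical variant-like records, one per line.
records = [f"chr1\t{pos}\tA\tG\n".encode() for pos in range(1, 1001)]
block_size = 100  # records per block; an arbitrary choice for this sketch

blocks = []
offsets = []  # byte offset where each compressed block starts
position = 0
for start in range(0, len(records), block_size):
    # Each block is a complete, independent gzip member.
    member = gzip.compress(b"".join(records[start:start + block_size]))
    offsets.append(position)
    blocks.append(member)
    position += len(member)
compressed = b"".join(blocks)

# Concatenated gzip members still form a valid gzip stream.
assert gzip.decompress(compressed) == b"".join(records)


def read_block(data: bytes, block_offsets: list, i: int) -> bytes:
    """Decompress only block i, without touching earlier blocks."""
    end = block_offsets[i + 1] if i + 1 < len(block_offsets) else len(data)
    return gzip.decompress(data[block_offsets[i]:end])


first_line = read_block(compressed, offsets, 7).splitlines()[0].decode()
print(first_line)  # first record of block 7, i.e. position 701
```

An index over such blocks, keyed by genomic position, is roughly the job that Tabix does for Bgzip-compressed files.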

tabular data

There is plenty of tabular data in bioinformatics, from the well-known formats to all kinds of metadata. Many tools were developed to process generic tabular data.

  • structured text tools - overview of tools for processing structured text
  • Miller - like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed data
  • csvtk - fast CSV/TSV toolkit in Go, with many features and simple plot functions
  • xsv - fast CSV/TSV toolkit in Rust
  • visidata - terminal spreadsheet app
  • grabix - like tabix but for non-bio data (indexing by line numbers instead of genomic positions); fast slicing / random sampling of large compressed tabular data
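
Indexing by line number, as described for grabix above, can be sketched in a few lines of Python. This is a simplified illustration on an uncompressed in-memory table, not grabix's implementation or file format; the helper names `build_line_index` and `fetch_lines`, the 1-based inclusive range convention, and the sample data are all assumptions for this example.

```python
import io


def build_line_index(handle) -> list:
    """Return the byte offset at which each line starts."""
    offsets = []
    handle.seek(0)
    while True:
        pos = handle.tell()
        if not handle.readline():
            break
        offsets.append(pos)
    return offsets


def fetch_lines(handle, offsets: list, first: int, last: int) -> list:
    """Return lines first..last (1-based, inclusive) by seeking directly."""
    handle.seek(offsets[first - 1])
    count = last - first + 1
    return [handle.readline().decode().rstrip("\n") for _ in range(count)]


# Hypothetical metadata table with 10,000 rows.
table = io.BytesIO(
    b"".join(f"sample{i}\tpop{i % 3}\n".encode() for i in range(1, 10001))
)
index = build_line_index(table)
rows = fetch_lines(table, index, 5000, 5002)
print(rows)
```

Once the index exists, slicing or randomly sampling rows costs one seek per request instead of a scan from the start of the file.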
