EPIK is a program for rapid alignment-free phylogenetic placement, the successor of RAPPAS.
We advise installing the package in a new environment, because our C++ dependencies are strict (libboost in particular) and may clash with other packages.
We also recommend using mamba, which solves environment dependencies faster.
conda create -n epik
conda activate epik
conda config --set channel_priority strict
# If you use mamba:
# conda config set channel_priority strict
# note that we install both ipk (database creation) and epik (phylogenetic placement)
mamba install ipk epik
If you find conda slow and clumsy, consider the wonderful pixi manager:
pixi init -c conda-forge -c bioconda
pixi add epik ipk
pixi shell
And you're good to go.
If you want to get your hands dirty, you can build EPIK from source. The following dependencies are required:
- Boost Libraries >=1.6
- CMake >= 3.10
- GCC with C++17 support
- zlib
- rapidjson
- click
On Debian-like systems they can be installed with:
sudo apt install build-essential cmake libboost-dev libboost-serialization-dev libboost-filesystem-dev libboost-iostreams-dev libboost-program-options-dev zlib1g-dev rapidjson-dev libquadmath0 python3-pip
pip3 install click
git clone --recursive https://github.com/phylo42/EPIK epik
cd epik && mkdir -p bin && cd bin
cmake ..
make -j4
You can run epik.py from the directory where it was built, or install it (system-wide or for a single user) to make epik.py visible from any directory.
For a system-wide installation (requires elevated permissions):
sudo cmake --install .
Alternatively, to install for the current user, choose a directory where you want to install the tool. For instance, you might choose /home/$USER/opt or any other directory that you prefer. Replace DIRECTORY in the commands below with your chosen directory path:
cmake --install . --prefix DIRECTORY
export PATH=DIRECTORY/bin:$PATH
Remember to add DIRECTORY/bin to your PATH. You can do this manually in each new shell session or add the export command to your shell initialization script (e.g., .bashrc).
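To confirm that the installation is visible from any directory, you can check whether the scripts resolve on your PATH. The snippet below is a minimal sketch using only the Python standard library; the helper name `find_tools` is hypothetical and not part of EPIK.

```python
import shutil


def find_tools(names=("epik.py", "ipk.py")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print where each script was found, if anywhere.
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If a script is reported as not found, check that the export line uses the same DIRECTORY you passed to `--prefix`.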
Once you have installed EPIK and activated your virtual environment with conda activate epik or pixi shell, run:
# get some test alignment and tree
wget https://github.com/phylo42/IPK/raw/refs/heads/main/tests/data/D652/reference.fasta
wget https://github.com/phylo42/IPK/raw/refs/heads/main/tests/data/D652/tree.rooted.newick
# build database with IPK : using 1 CPU and default phylogenetic model parameters
# a better approach would be to use appropriate parameters, see documentation
ipk.py build --refalign reference.fasta --reftree tree.rooted.newick --states nucl --workdir . --model GTR
# place with EPIK
epik.py place -i DB.ipk -s nucl -o . reference.fasta
# jplace results
cat placements_reference.fasta.jplace
# you can do post-analyses with the excellent 'gappa' package
# (available in bioconda too, see https://github.com/lczech/gappa)
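For a quick look at the results without extra tools, the jplace file can be read as plain JSON. The sketch below assumes the standard jplace layout (a `fields` list naming the columns, and a `placements` list whose entries carry rows in `p` and query names in `n` or `nm`). The helper name `best_placements` and the inline example document are illustrative only and are not real EPIK output.

```python
import json


def best_placements(jplace):
    """Return {query name: best placement row as a dict keyed by field name}."""
    fields = jplace["fields"]
    best = {}
    for entry in jplace["placements"]:
        rows = [dict(zip(fields, row)) for row in entry["p"]]
        # Rank by like_weight_ratio when that column exists; otherwise keep file order.
        if "like_weight_ratio" in fields:
            top = max(rows, key=lambda r: r["like_weight_ratio"])
        else:
            top = rows[0]
        # Names are stored in "n" (string or list) or, with multiplicities, in "nm".
        names = entry.get("n", [])
        if isinstance(names, str):
            names = [names]
        names = list(names) + [pair[0] for pair in entry.get("nm", [])]
        for name in names:
            best[name] = top
    return best


# Tiny hand-written document for illustration only.
example = json.loads("""
{
  "version": 3,
  "tree": "((A:1{0},B:1{1}):1{2},C:1{3}){4};",
  "fields": ["edge_num", "like_weight_ratio"],
  "placements": [
    {"p": [[0, 0.7], [2, 0.3]], "n": ["query1"]},
    {"p": [[3, 0.4], [1, 0.6]], "nm": [["query2", 2]]}
  ]
}
""")

for name, placement in best_placements(example).items():
    print(name, placement["edge_num"], placement["like_weight_ratio"])
```

To apply it to the file produced above, load it with `json.load(open("placements_reference.fasta.jplace"))` and pass the result to `best_placements`. For anything beyond a quick inspection, gappa remains the better choice.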
To place queries on a phylogenetic tree, you first need to preprocess the tree with IPK and build a phylo-k-mer database (see the IPK documentation for details). Queries should be in uncompressed FASTA format. An example placement command (see below for possible parameter values):
epik.py place -i DATABASE -s [nucl|amino] -o OUTPUT_DIR INPUT_FASTA
If EPIK is not installed, run ./epik.py from the EPIK directory instead.
| Option | Meaning | Default |
|---|---|---|
| -i | The path to the phylo-k-mer database to use for placement. | |
| -s | States: nucl for DNA, amino for proteins. | nucl |
| --omega | The user-defined threshold. Can be set higher than the one used when the database was created. (If you are not sure, ignore this parameter.) | 1.5 |
| --mu | The proportion of the database to keep when filtering. Mutually exclusive with --max-ram. Should be a value in (0.0, 1.0]. | 1.0 |
| --max-ram | The maximum amount of memory used to keep the database content. Mutually exclusive with --mu. Sets an approximate limit to EPIK's RAM consumption (i.e. the given limit might be exceeded but EPIK will consider it). Examples: 512, 256K, 42M, 4.2G. | |
| --threads | Number of parallel threads used for placement. EPIK should be compiled with OpenMP support enabled, i.e. EPIK_OMP=ON. (If you compile as we recommend, it is enabled.) | 1 |
See also epik.py place --help for more information.
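If you launch placements from a script, it can help to check the option constraints before starting a run. The sketch below assembles an argument list using only the options documented above and rejects combinations the table rules out (--mu together with --max-ram, or --mu outside (0.0, 1.0]). The function name `build_place_command` is hypothetical; EPIK performs its own validation, and this is only a convenience wrapper.

```python
def build_place_command(database, states, output_dir, input_fasta,
                        omega=None, mu=None, max_ram=None, threads=None):
    """Build an `epik.py place` argument list from the documented options."""
    if states not in ("nucl", "amino"):
        raise ValueError("states must be 'nucl' or 'amino'")
    if mu is not None and max_ram is not None:
        raise ValueError("--mu and --max-ram are mutually exclusive")
    if mu is not None and not (0.0 < mu <= 1.0):
        raise ValueError("--mu must be in (0.0, 1.0]")

    cmd = ["epik.py", "place", "-i", database, "-s", states, "-o", output_dir]
    # Optional flags are added only when set, so EPIK's defaults apply otherwise.
    if omega is not None:
        cmd += ["--omega", str(omega)]
    if mu is not None:
        cmd += ["--mu", str(mu)]
    if max_ram is not None:
        cmd += ["--max-ram", str(max_ram)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    cmd.append(input_fasta)
    return cmd


print(" ".join(build_place_command("DB.ipk", "nucl", ".", "reference.fasta",
                                   max_ram="4.2G", threads=4)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once EPIK is installed and on your PATH.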
Code quality evaluation with softwipe [2]:
softwipe --cmake --cpp -x third-party,i2l/third-party,i2l/tests/catch2,i2l/examples --no-execution .
[1] Romashchenko, N., Linard, B., Pardi, F., & Rivals, E. (2023). EPIK: precise and scalable evolutionary placement with informative k-mers. Bioinformatics, 39(12), btad692. https://doi.org/10.1093/bioinformatics/btad692
[2] Zapletal, A., Höhler, D., Sinz, C., & Stamatakis, A. (2021). The SoftWipe tool and benchmark for assessing coding standards adherence of scientific software. Scientific Reports, 11(1), 10015. https://doi.org/10.1038/s41598-021-89495-8