
Installation and configuration

Bogdan Kirilenko edited this page Oct 4, 2023 · 4 revisions

TOGA runs on both Linux and macOS, including M1-based machines. Access to a compute cluster is highly recommended for larger tasks, but small or partial genomes with short genes can be processed on a desktop PC.

TOGA operates properly with Python versions 3.9 and above.

The required Python packages are listed in the requirements.txt file and are also provided below:

  • twobitreader==3.1.7
  • networkx==3.1
  • pandas==2.0.2
  • numpy==1.24.3
  • xgboost==1.7.5
  • scikit-learn==1.2.2
  • joblib==1.2.0
  • h5py==3.8.0

Typically, these packages can be installed without any issues. However, if you encounter any difficulties with XGBoost, please refer to the Troubleshooting section (still under development; currently explained in the README.md).
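The Python version requirement and the pinned packages above can be checked before running TOGA. The following is a minimal sketch, not part of TOGA itself: the names `PINNED`, `python_ok`, and `check_packages` are hypothetical, and the version pins are copied from the list above. It relies only on the standard-library `importlib.metadata` module.

```python
import sys
from importlib import metadata

# Version pins copied from the package list above.
PINNED = {
    "twobitreader": "3.1.7",
    "networkx": "3.1",
    "pandas": "2.0.2",
    "numpy": "1.24.3",
    "xgboost": "1.7.5",
    "scikit-learn": "1.2.2",
    "joblib": "1.2.0",
    "h5py": "3.8.0",
}


def python_ok(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def check_packages(pinned=PINNED, get_version=metadata.version):
    """Return {name: (expected, found)} for packages that are missing or differ.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    for name, (expected, found) in check_packages().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

A version that differs from the pin is reported but is not necessarily a problem; the sketch only shows where the environment deviates from requirements.txt.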

TOGA also requires Nextflow to run jobs in parallel.

You can install Nextflow using one of the following methods:

curl -fsSL https://get.nextflow.io | bash
# OR
conda install -c bioconda nextflow

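If you installed Nextflow with curl, move the nextflow executable to a directory listed in your $PATH variable. Note that Nextflow requires Java: version 8 or later for older Nextflow releases, while recent releases need a newer Java, so check the Nextflow documentation for the version you install.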
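To confirm that both executables are reachable through $PATH, a small check along these lines may help. It is a sketch under assumptions: `find_tools` and `missing_tools` are hypothetical helper names, and the check only locates the executables with the standard-library `shutil.which`; it does not verify the Java version.

```python
import shutil


def find_tools(names=("nextflow", "java"), which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found):
    """Return the sorted names of tools that were not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If `nextflow` is reported as not found after a curl install, the executable is most likely still in the directory where the installer was run.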

TOGA uses CESAR2.0 to generate structural annotations of orthologs. You do not need to install it manually; the configure.sh script handles this.

The same script also trains the models for chain classification and compiles all required binaries written in C.

Nextflow

To be written: custom Nextflow configurations, and a custom strategy class for cases where Nextflow does not fit (will be relevant for 1.1.5).

Relevant issues: probably solved.

Issues that cannot be addressed directly: limitations of the user's system.

System resource requirements - draft

Although TOGA may run on a machine with quite limited resources, it is recommended to use a relatively powerful cluster to perform computations in a reasonable amount of time.

From issue #96 (https://github.com/hillerlab/TOGA/issues/96):

  • Question: Can you please tell me what it would take computationally (capacity and time) to process the human reference genome and one bat genome through the TOGA pipeline, including Make Lastz Chains?
  • Answer: We need about 1000 CPU hours for the whole procedure. It is best to have a compute cluster with a few hundred cores.
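The figure of about 1000 CPU hours quoted above can be turned into a rough wall-clock estimate for a given number of cores. The sketch below assumes perfect parallel scaling, which is an idealization: scheduling overhead and uneven job lengths will make real runs slower. The function name `wall_clock_hours` is hypothetical.

```python
def wall_clock_hours(cpu_hours, cores):
    """Ideal wall-clock time in hours, assuming work splits evenly across cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cpu_hours / cores


if __name__ == "__main__":
    # 1000 CPU hours is the estimate quoted from the issue above.
    for cores in (8, 200):
        hours = wall_clock_hours(1000, cores)
        print(f"{cores} cores: about {hours:.1f} hours")
```

Under this idealization, an 8-core desktop would need about 125 hours (roughly five days), while 200 cores would need about 5 hours, which is consistent with the recommendation to use a cluster.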