tahiri-lab/set-completion-4-supertrees

Phylogenetic tree set completion in the context of supertrees

This project introduces a method for completing a set of phylogenetic trees that adds as few new taxa as possible while preserving key evolutionary signals. It builds on existing tree completion techniques and incorporates consensus maximal completion subtrees and combined branch length data, so that the completion remains biologically relevant. Each tree in the set is completed iteratively: its missing taxa are added using consensus subtrees derived from the overlapping information in the other trees. The result is a collection of completed trees that serves as the basis for building multiple comprehensive consensus trees, which provide a cluster-based depiction of alternative supertrees.
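To make the notion of "missing taxa" concrete, here is a minimal sketch that lists, for each tree in a set, the taxa that appear in other trees but not in that one. It is not the project's completion algorithm. It only illustrates the first step described above, and the helper names (`leaf_names`, `missing_taxa`) and the regex-based leaf extraction are assumptions made for this example; the regex handles simple, unquoted Newick labels only.

```python
import re

# Matches a leaf label: a name that directly follows "(" or ",".
# Internal-node labels follow ")" and are therefore not matched.
# Assumption: labels are unquoted and contain no Newick punctuation.
LEAF_PATTERN = re.compile(r"[(,]\s*([^():,;\s]+)")


def leaf_names(newick):
    """Return the set of leaf labels found in a Newick string."""
    return set(LEAF_PATTERN.findall(newick))


def missing_taxa(trees):
    """For each tree, list the taxa present elsewhere in the set but absent from it."""
    leaf_sets = [leaf_names(tree) for tree in trees]
    all_taxa = set().union(*leaf_sets)
    return [sorted(all_taxa - leaves) for leaves in leaf_sets]


# Three partially overlapping trees (hypothetical example data).
trees = [
    "((A:1,B:1):1,(C:1,D:1):1);",
    "((A:1,B:1):1,E:2);",
    "((C:1,E:1):1,D:2);",
]

missing = missing_taxa(trees)
for newick, absent in zip(trees, missing):
    print(newick, "-> missing:", absent)
```

In this example the first tree lacks E, the second lacks C and D, and the third lacks A and B. These are the taxa a completion step would need to place in each tree.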

Please note that this project is under development. Features and functionalities are subject to change, and the tool may undergo significant updates. Use it at your own discretion, and feel free to contribute or provide feedback to help improve its capabilities.

Features

  • Preserves evolutionary signals: Maintains key evolutionary relationships and branch lengths.
  • Consensus-based approach: Utilizes consensus maximal completion subtrees and aggregated branch lengths.
  • Biologically meaningful: Aims for completions that are biologically relevant while keeping the computation efficient.
  • Completed trees: Generates completed trees as intermediates for constructing multiple consensus supertrees.

Usage

Input Format

  • Input folder: Place your input phylogenetic tree multisets in the input_multisets directory.
  • File naming: Each multiset should be in a separate text file named multiset_X.txt, where X is a unique identifier (e.g., multiset_1.txt).
  • Tree format: Each line in the input file should contain a single phylogenetic tree in Newick format.
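As an illustration of the input conventions above, the following sketch writes a small example file and runs a lightweight sanity check on it. The helper names (`looks_like_newick`, `check_multiset_file`) are hypothetical and not part of this repository. The check is deliberately shallow (trailing semicolon, balanced parentheses) and does not replace a real Newick parser such as ETE3. Treating the identifier X as a run of letters, digits, or underscores is also an assumption.

```python
import re
import tempfile
from pathlib import Path

# Assumption: the identifier X consists of letters, digits, or underscores.
NAME_PATTERN = re.compile(r"^multiset_\w+\.txt$")


def looks_like_newick(line):
    """Shallow check: the line ends with ';' and its parentheses are balanced in count."""
    text = line.strip()
    return text.endswith(";") and text.count("(") == text.count(")")


def check_multiset_file(path):
    """Return a list of human-readable problems found in one input file."""
    path = Path(path)
    problems = []
    if not NAME_PATTERN.match(path.name):
        problems.append(f"unexpected file name: {path.name}")
    for number, line in enumerate(path.read_text().splitlines(), start=1):
        # Blank lines are ignored; every other line should hold one tree.
        if line.strip() and not looks_like_newick(line):
            problems.append(f"line {number} does not look like a Newick tree")
    return problems


# Build an example input file in a temporary folder and check it.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "input_multisets"
    folder.mkdir()
    example = folder / "multiset_1.txt"
    example.write_text("((A:1,B:1):1,C:2);\n((A:1,C:1):1,D:2);\n")
    problems = check_multiset_file(example)

print(problems if problems else "no problems found")
```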

Running the Script

The main script is tree_set_completion.py. To execute the script:

  1. Ensure input files are in place

    Place all your multiset files in the input_multisets directory.

  2. Run the script

    python tree_set_completion.py

    Note: The script is configured to process files named multiset_X.txt in the input_multisets folder and output completed trees to the completed_multisets folder as completed_multiset_X.txt.
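The naming convention in the note above can be sketched as a small path mapping. This is an illustration of the convention only, not code taken from tree_set_completion.py, and the function name `output_path_for` is an assumption.

```python
from pathlib import Path


def output_path_for(input_path, output_dir="completed_multisets"):
    """Map input_multisets/multiset_X.txt to completed_multisets/completed_multiset_X.txt."""
    input_path = Path(input_path)
    return Path(output_dir) / f"completed_{input_path.name}"


example = output_path_for("input_multisets/multiset_1.txt")
print(example.as_posix())
```

A mapping like this makes it easy to check, before or after a run, which output file belongs to which input file.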

Output Format

  • Output folder: Completed phylogenetic tree multisets are saved in the completed_multisets directory.
  • File naming: Each completed multiset file corresponds to its input, named completed_multiset_X.txt.
  • Tree format: Each line in the output file contains a completed phylogenetic tree in Newick format.

Dependencies

  • Python 3.6 or higher
  • ETE3: For phylogenetic tree manipulation.
  • NumPy: For numerical computations.

Install dependencies using:

pip install ete3 numpy

Additional dependencies may be required based on future updates.
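To confirm that the environment meets the requirements listed above before running the script, a quick check along these lines may help. It is a convenience sketch using only the standard library, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 6)
absent = missing_modules(["ete3", "numpy"])

print("Python version OK:", python_ok)
print("Missing packages:", absent if absent else "none")
```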

License

This project is licensed under the MIT License.
