SIGNAL: A high-throughput pipeline for large-scale analysis of microbial signal transduction systems


🚧 This project is under active development. Expect frequent changes.

SIGNAL (Systematic Investigation of Genomic Networks for Analysis of Logic-based signaling) is a pipeline for high-throughput analysis of bacterial and archaeal genomes. It uncovers patterns in signal transduction systems across genomes, taxonomic groups, and functional architectures. The current dataset includes 26,221 representative genomes.


System Requirements

Hardware

A standard computer with sufficient RAM for in-memory processing should be adequate.

Software

  • Python 3.6 or higher

Tested Operating Systems

  • Ubuntu 20.04
  • Linux Mint 20.2

Installation

Clone the repository using Git:

git clone https://github.com/ToshkaDev/signal-transduction.git

This downloads the repository; the pipeline is run directly from the cloned directory.

Usage

To launch the pipeline, use the master script:

cd signal-transduction
./analyze.sh

The script first checks whether the initial long-running step has already completed by looking for files in results/obtain_and_process_st/. Depending on the result, it either runs the entire pipeline or skips the completed steps.
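The checkpoint logic described above can be sketched in Python as follows. This is only an illustration of the idea, not the code in analyze.sh: the function name `first_step_completed` is hypothetical, and treating "at least one regular file in the directory" as the completion criterion is an assumption.

```python
from pathlib import Path


def first_step_completed(results_dir: str = "results/obtain_and_process_st") -> bool:
    """Return True if results_dir exists and holds at least one regular file.

    Assumption: any file in the directory counts as evidence that the
    long-running step finished; the real script may apply a stricter test.
    """
    path = Path(results_dir)
    if not path.is_dir():
        return False
    return any(entry.is_file() for entry in path.iterdir())


if __name__ == "__main__":
    if first_step_completed():
        print("ST extraction results found: skipping the completed step.")
    else:
        print("No results found: running the entire pipeline.")
```

A check like this makes reruns cheap, because the database-fetching step only repeats when its output directory is empty or missing.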

Pipeline Steps

1. Preparation

  • Unpacks archived input files
  • Extracts genome lists (bacterial and archaeal)
  • Assigns genome sources (MiST genomes or MiST MAGs databases)
  • Creates all necessary input and output directories

Input files include:

2. ST Extraction and Processing (obtain_and_process_st.py)

  • Fetches signal transduction systems (two-component and one-component) from the MiST database using its API
  • Analyzes protein domain compositions and architectures
  • Outputs tabulated results listing:
    • Genomes
    • Histidine kinases (HKs), response regulators (RRs), and one-component systems (OCPs)
    • Their protein domain compositions and architectures
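To make "domain composition" and "domain architecture" concrete, here is a minimal sketch. It assumes each domain hit is a `(name, start, end)` tuple and that an architecture is written as domain names joined by hyphens in N- to C-terminal order. The tuple layout, the separator, and the function names are assumptions for illustration; the format that obtain_and_process_st.py actually writes may differ. The coordinates are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A domain hit: (domain name, start position, end position) on the protein.
DomainHit = Tuple[str, int, int]


def domain_architecture(hits: List[DomainHit]) -> str:
    """Join domain names in N- to C-terminal order (sorted by start position)."""
    ordered = sorted(hits, key=lambda hit: hit[1])
    return "-".join(name for name, _start, _end in ordered)


def domain_composition(hits: List[DomainHit]) -> Dict[str, int]:
    """Count how many times each domain occurs, ignoring order."""
    return dict(Counter(name for name, _start, _end in hits))


# Illustrative histidine kinase with hits listed out of order.
hk_hits = [("HATPase_c", 330, 440), ("PAS", 20, 120), ("HisKA", 210, 275)]

print(domain_architecture(hk_hits))  # PAS-HisKA-HATPase_c
print(domain_composition(hk_hits))
```

Composition discards order and keeps counts, while architecture keeps order, so two proteins can share a composition but differ in architecture.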

3. Genome-Level Analysis (analyze_st_per_genome.py)

  • Analyzes and reports domain composition statistics for HKs, RRs, and OCPs per genome
  • Reports:
    • Number and type of input domains in HKs and OCPs
    • Additional domains in RRs
  • Normalizes statistics by genome size and total number of encoded proteins
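The normalization step can be illustrated with a small sketch. The units used here (counts per megabase and per 1,000 encoded proteins) are assumptions chosen for readability, as are the function name and the example numbers; analyze_st_per_genome.py may scale its values differently.

```python
from typing import Dict


def normalize_count(count: int, genome_size_bp: int, total_proteins: int) -> Dict[str, float]:
    """Scale a raw per-genome count by genome size and by proteome size."""
    if genome_size_bp <= 0 or total_proteins <= 0:
        raise ValueError("genome size and protein count must be positive")
    return {
        # Assumed unit: count per megabase (1,000,000 bp) of genome.
        "per_mbp": count / (genome_size_bp / 1_000_000),
        # Assumed unit: count per 1,000 encoded proteins.
        "per_1000_proteins": count / total_proteins * 1000,
    }


# Made-up example: 30 HKs in a 5 Mbp genome encoding 4,500 proteins.
print(normalize_count(30, 5_000_000, 4_500))
```

Normalizing this way lets genomes of very different sizes be compared, since larger genomes tend to encode more signaling proteins in absolute terms.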

4. Taxonomy-Level Analysis (analyze_st_per_taxon.py)

  • Analyzes domain composition statistics for HKs, RRs, and OCPs at each taxonomic level:
    • Species
    • Genus
    • Family
    • Order
    • Class
    • Phylum
    • Kingdom
  • Normalizes results by the number of genomes per taxonomic level

The GTDB taxonomy is used.
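As a rough sketch of per-taxon normalization, the code below parses GTDB-style lineage strings and averages a per-genome count over the genomes in each taxon. GTDB lineages use rank prefixes such as `d__`, `p__`, and `s__`; the top rank is labeled `d__` (domain), which presumably corresponds to the "Kingdom" level listed above. The function names, the input layout, and the counts are assumptions for illustration and are not taken from analyze_st_per_taxon.py.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# GTDB rank prefixes and the rank each one denotes.
RANK_PREFIXES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_gtdb_lineage(lineage: str) -> Dict[str, str]:
    """Split 'd__Bacteria;p__...;s__...' into a {rank: taxon name} mapping."""
    ranks = {}
    for field in lineage.split(";"):
        prefix, _sep, name = field.strip().partition("__")
        if prefix in RANK_PREFIXES and name:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


def mean_per_taxon(genomes: List[Tuple[str, int]], rank: str) -> Dict[str, float]:
    """Average a per-genome count over all genomes assigned to each taxon."""
    totals: Dict[str, int] = defaultdict(int)
    n_genomes: Dict[str, int] = defaultdict(int)
    for lineage, count in genomes:
        taxon = parse_gtdb_lineage(lineage).get(rank)
        if taxon is None:  # lineage not resolved at this rank
            continue
        totals[taxon] += count
        n_genomes[taxon] += 1
    return {taxon: totals[taxon] / n_genomes[taxon] for taxon in totals}


# Illustrative lineages in GTDB format paired with made-up HK counts.
genomes = [
    ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;"
     "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli", 30),
    ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;"
     "f__Enterobacteriaceae;g__Salmonella;s__Salmonella enterica", 32),
]

print(mean_per_taxon(genomes, "family"))  # {'Enterobacteriaceae': 31.0}
print(mean_per_taxon(genomes, "genus"))
```

Dividing by the number of genomes keeps heavily sampled taxa from dominating the statistics simply because they contribute more genomes.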

Future Work

  • Visualization modules for domain architecture patterns
