SIGNAL: A high-throughput pipeline for large-scale analysis of microbial signal transduction systems

🚧 This project is under active development. Expect frequent changes.

SIGNAL (Systematic Investigation of Genomic Networks for Analysis of Logic-based signaling) is a pipeline for high-throughput analysis of bacterial and archaeal genomes. It is designed to uncover patterns in signal transduction systems across genomes, taxonomic groups, and functional architectures. The current dataset includes 26,221 representative genomes.

System Requirements

Hardware

A standard computer with sufficient RAM for in-memory processing should be adequate.

Software

Python 3.6 or higher

Tested Operating Systems

Ubuntu 20.04
Linux Mint 20.2

Installation

Clone the repository using Git:

git clone https://github.com/ToshkaDev/signal-transduction.git

This downloads the repository, including the pipeline scripts and input files.

Usage

To launch the pipeline, use the master script:

cd signal-transduction
./analyze.sh

The script first checks whether the initial long-running step has already completed by looking for files in results/obtain_and_process_st/. If files are present, it skips the completed steps; otherwise it runs the entire pipeline.
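The check described above can be sketched in Python. This is only an illustration of the idea, not the logic in analyze.sh itself; the function name `first_step_completed` is hypothetical, and the real script may apply stricter criteria than "at least one file exists".

```python
from pathlib import Path


def first_step_completed(results_dir):
    """Return True if results_dir exists and contains at least one file.

    Hypothetical helper: mirrors the README's description of the check,
    not the actual implementation in analyze.sh.
    """
    path = Path(results_dir)
    if not path.is_dir():
        return False
    return any(entry.is_file() for entry in path.iterdir())


# Decide whether the long-running extraction step can be skipped.
if first_step_completed("results/obtain_and_process_st"):
    print("Extraction results found: skipping completed steps")
else:
    print("No extraction results found: running the entire pipeline")
```

If you want to force a full rerun under this kind of check, emptying or moving the results directory would be the natural approach, though you should confirm that against the script before relying on it.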

Pipeline Steps

1. Preparation

Unpacks archived input files
Extracts genome lists (bacterial and archaeal)
Assigns genome sources (MiST genomes or MiST MAGs databases)
Creates all necessary input and output directories

Input files include:

A dataset of 26,221 bacterial and archaeal genomes. You can also supply your own genome list, provided it follows the same format.
Signal transduction domain definitions from the MiST (Microbial Signal Transduction) database
A metadata file from the Genome Taxonomy Database (GTDB, release r214)

2. ST Extraction and Processing (`obtain_and_process_st.py`)

Fetches signal transduction systems (two-component and one-component) from the MiST database using its API
Analyzes protein domain compositions and architectures
Outputs tabulated results listing:
- Genomes
- Histidine kinases (HKs), response regulators (RRs), and one-component systems (OCPs)
- Their protein domain compositions and architectures
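To make the distinction between domain composition and domain architecture concrete, here is a minimal sketch. It assumes each protein comes with a list of domain hits carrying a name and a start coordinate; the dictionary keys, function names, and example coordinates are hypothetical and do not reflect the MiST API response format or the code in `obtain_and_process_st.py`.

```python
from collections import Counter


def domain_architecture(domain_hits):
    """Join domain names in N- to C-terminal order (sorted by start position)."""
    ordered = sorted(domain_hits, key=lambda hit: hit["start"])
    return "-".join(hit["name"] for hit in ordered)


def domain_composition(domain_hits):
    """Count how often each domain occurs, ignoring order."""
    return Counter(hit["name"] for hit in domain_hits)


# Illustrative histidine kinase; the coordinates are made up and the hits
# are deliberately listed out of order.
hits = [
    {"name": "HATPase_c", "start": 310},
    {"name": "PAS", "start": 40},
    {"name": "HisKA", "start": 200},
    {"name": "PAS", "start": 120},
]

print(domain_architecture(hits))  # PAS-PAS-HisKA-HATPase_c
print(domain_composition(hits)["PAS"])  # 2
```

Two proteins can share a composition but differ in architecture, which is why the two are reported separately.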

3. Genome-Level Analysis (`analyze_st_per_genome.py`)

Analyzes and reports domain composition statistics for HKs, RRs, and OCPs per genome
Reports:
- Number and type of input domains in HKs and OCPs
- Additional domains in RRs
Normalizes statistics by genome size and total number of encoded proteins
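The normalization step can be illustrated as follows. The README states only that statistics are normalized by genome size and by the total number of encoded proteins; the specific units used here (per megabase and per 1,000 proteins) and the function name are assumptions made for the example.

```python
def normalize_count(count, genome_size_bp, protein_count):
    """Return (count per Mbp of genome, count per 1,000 encoded proteins).

    The units are illustrative assumptions; the pipeline may scale differently.
    """
    if genome_size_bp <= 0 or protein_count <= 0:
        raise ValueError("genome size and protein count must be positive")
    per_mbp = count * 1_000_000 / genome_size_bp
    per_1000_proteins = count * 1000 / protein_count
    return per_mbp, per_1000_proteins


# Hypothetical genome: 30 HKs, 5 Mbp, 4,000 encoded proteins.
per_mbp, per_1000 = normalize_count(30, 5_000_000, 4_000)
print(per_mbp, per_1000)  # 6.0 7.5
```

Normalizing this way lets genomes of different sizes be compared directly, since larger genomes tend to encode more signaling proteins in absolute terms.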

4. Taxonomy-Level Analysis (`analyze_st_per_taxon.py`)

Analyzes domain composition statistics for HKs, RRs, and OCPs at each taxonomic level:
- Species
- Genus
- Family
- Order
- Class
- Phylum
- Kingdom
Normalizes results by the number of genomes per taxonomic level

The GTDB taxonomy is used.
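As a rough sketch of per-taxon aggregation, the example below parses GTDB-style lineage strings (semicolon-separated fields with rank prefixes such as `p__` and `g__`) and averages a per-genome count over the genomes in each taxon. The function names, genome identifiers, and counts are hypothetical, and `analyze_st_per_taxon.py` may organize this differently.

```python
from collections import defaultdict

# GTDB rank prefixes; note that GTDB calls its top rank "domain" (d__).
RANK_PREFIXES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_gtdb_lineage(lineage):
    """Split a lineage such as 'd__Bacteria;p__...' into a rank -> name dict."""
    ranks = {}
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        if prefix in RANK_PREFIXES and name:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


def mean_per_taxon(genome_counts, genome_lineages, rank):
    """Average a per-genome count over the genomes assigned to each taxon."""
    totals = defaultdict(int)
    n_genomes = defaultdict(int)
    for genome, count in genome_counts.items():
        taxon = parse_gtdb_lineage(genome_lineages[genome]).get(rank)
        if taxon is None:
            continue  # genome has no assignment at this rank
        totals[taxon] += count
        n_genomes[taxon] += 1
    return {taxon: totals[taxon] / n_genomes[taxon] for taxon in totals}


# Hypothetical genomes and HK counts; lineages are illustrative strings
# in GTDB format.
esch = ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
        "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
        "s__Escherichia coli")
baci = ("d__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;"
        "f__Bacillaceae;g__Bacillus;s__Bacillus subtilis")
lineages = {"genome_A": esch, "genome_B": esch, "genome_C": baci}
hk_counts = {"genome_A": 30, "genome_B": 20, "genome_C": 40}

print(mean_per_taxon(hk_counts, lineages, "genus"))
# {'Escherichia': 25.0, 'Bacillus': 40.0}
```

Dividing by the number of genomes per taxon keeps heavily sampled taxa from dominating the statistics.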

Future Work

Visualization modules for domain architecture patterns
