vincenzo-palmacci/SSP-Stat

SSP-Stat

Secondary Structure Prediction using GOR and SVM

SSP-Stat is a Python toolkit for protein secondary structure prediction. It implements two approaches, the GOR (Garnier–Osguthorpe–Robson) statistical method and Support Vector Machine (SVM) classifiers, to predict secondary structure from amino acid sequences.


Features

  • GOR Implementation: Uses the information-theoretic GOR statistical method for secondary structure prediction.
  • SVM Classifiers: Trains Support Vector Machine classifiers, with the aim of improving prediction accuracy.
  • Dataset Preparation: Tools to process and prepare datasets for training and evaluation.
  • Performance Evaluation: Includes metrics such as Q3 accuracy and Segment Overlap (SOV) scores.
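
For orientation, the quantities named in the list above can be written out as follows. This is the common textbook form of the single-residue GOR information score and of Q3; which GOR variant this repository implements is not stated here, so treat the 17-residue window (m from -8 to 8) as an assumption.

```latex
% Information that residue R carries about conformational state S
I(S; R) = \log \frac{P(S \mid R)}{P(S)}

% Score for state S at position j, summed over a window of neighbours
\mathrm{score}_j(S) = \sum_{m=-8}^{8} I\left(S_j; R_{j+m}\right),
\qquad
\hat{S}_j = \arg\max_{S \in \{H, E, C\}} \mathrm{score}_j(S)

% Three-state per-residue accuracy
Q_3 = 100 \times \frac{N_{\text{correct}}}{N_{\text{total}}}
```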

Installation

  1. Clone the Repository:

    git clone https://github.com/vincenzo-palmacci/SSP-Stat.git
    cd SSP-Stat
  2. Set Up a Virtual Environment (Optional but Recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt

Usage

1. Prepare the Dataset

Use the make-dataset.py script to process raw protein sequence data into a format suitable for training and evaluation.

    python make-dataset.py --input raw_data.fasta --output processed_dataset.pkl
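
The layout of the pickled dataset is not documented in this README. As a rough illustration of the kind of preprocessing involved, the sketch below parses FASTA text into a dictionary and round-trips it through pickle. The helper name `parse_fasta` and the `{id: sequence}` layout are assumptions for illustration, not the actual behavior of make-dataset.py.

```python
import io
import pickle


def parse_fasta(handle):
    """Parse FASTA-formatted text into {record_id: sequence}.

    Hypothetical helper for illustration; not taken from make-dataset.py.
    """
    records = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            # Use the first header token as the ID; fall back to a counter.
            current_id = tokens[0] if tokens else "record_%d" % (len(records) + 1)
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = io.StringIO(">seq1 demo protein\nMKTA\nYIAK\n>seq2\nGSHM\n")
dataset = parse_fasta(example)

# Round-trip through pickle, mirroring the .pkl extension used above.
restored = pickle.loads(pickle.dumps(dataset))
print(restored)
```

Multi-line sequences are joined into one string per record, which is what a per-residue predictor needs as input.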

2. Train the GOR Model

Navigate to the gor-tools directory and execute the training script:

    cd gor-tools
    python train_gor.py --dataset ../processed_dataset.pkl --model gor_model.pkl
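
To make the idea behind GOR training concrete, here is a minimal sketch of a single-residue, window-based GOR-style predictor. It is not the code in train_gor.py: the function names, the add-one smoothing, the default half-window of 8, and the toy training pairs are all assumptions chosen for illustration.

```python
import math
from collections import Counter

STATES = "HEC"  # helix, strand, coil


def train_gor(pairs, half_window=8):
    """Count how often each residue appears at each window offset per state."""
    joint = Counter()    # (state, offset, residue) -> count
    residue = Counter()  # (offset, residue) -> count
    state = Counter()    # state -> count
    for seq, ss in pairs:
        for j, s in enumerate(ss):
            state[s] += 1
            for m in range(-half_window, half_window + 1):
                k = j + m
                if 0 <= k < len(seq):
                    joint[(s, m, seq[k])] += 1
                    residue[(m, seq[k])] += 1
    return joint, residue, state


def predict_gor(seq, model, half_window=8, pseudo=1.0):
    """Pick, per residue, the state with the largest summed log(P(S|R)/P(S))."""
    joint, residue, state = model
    total = sum(state.values())
    n = len(STATES)
    out = []
    for j in range(len(seq)):
        scores = {}
        for s in STATES:
            prior = (state[s] + pseudo) / (total + pseudo * n)
            score = 0.0
            for m in range(-half_window, half_window + 1):
                k = j + m
                if 0 <= k < len(seq):
                    # Smoothed estimate of P(state | residue at offset m).
                    num = joint[(s, m, seq[k])] + pseudo
                    den = residue[(m, seq[k])] + pseudo * n
                    score += math.log((num / den) / prior)
            scores[s] = score
        out.append(max(STATES, key=scores.get))
    return "".join(out)


# Toy training data (made up): each sequence maps to a single state.
toy = [("AAAA", "HHHH"), ("VVVV", "EEEE"), ("GGGG", "CCCC")]
model = train_gor(toy)
print(predict_gor("AAAA", model))
```

With smoothing, a state that never occurs in training gets a very small prior and can score deceptively high, so a real training set should contain all three states.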

3. Train the SVM Model

Navigate to the svm-tools directory and execute the training script:

    cd ../svm-tools
    python train_svm.py --dataset ../processed_dataset.pkl --model svm_model.pkl
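
An SVM needs a fixed-length numeric vector per residue. A common way to get one for this task is a one-hot encoding of a sliding window around each position, sketched below. The README does not say how train_svm.py encodes its input, so the function name `window_features`, the 20-letter alphabet, and the window size are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(seq, half_window=8):
    """Return one flat one-hot vector per residue.

    Each vector has (2 * half_window + 1) * 20 entries. Window positions
    that fall off either end of the sequence, or that hold a non-standard
    residue, are left as zeros.
    """
    width = 2 * half_window + 1
    n_aa = len(AMINO_ACIDS)
    features = []
    for j in range(len(seq)):
        vec = [0] * (width * n_aa)
        for w in range(width):
            k = j - half_window + w
            if 0 <= k < len(seq) and seq[k] in AA_INDEX:
                vec[w * n_aa + AA_INDEX[seq[k]]] = 1
        features.append(vec)
    return features


# Small window so the vectors stay readable: 3 positions x 20 letters.
X = window_features("MKTAYIAK", half_window=1)
print(len(X), len(X[0]))
```

Vectors like these, paired with one label per residue, could be passed to an SVM library such as scikit-learn's `sklearn.svm.SVC`. Whether svm-tools uses that library is not stated in this README.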

4. Evaluate Model Performance

Return to the repository root, where performance.py lives, and use it to assess the accuracy of your models:

    cd ..
    python performance.py --model gor-tools/gor_model.pkl --test_data test_dataset.pkl

This will output metrics such as Q3 accuracy and SOV scores.
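
As a reference point for the first of these metrics, Q3 is the percentage of residues whose predicted three-state label matches the observed one. The sketch below computes it from two label strings. The function name `q3_accuracy` is hypothetical and is not taken from performance.py.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Six of the eight positions agree.
score = q3_accuracy("HHHECCCC", "HHHHCCCE")
print(score)  # 75.0
```

Q3 treats every residue independently. SOV instead scores how well predicted segments overlap observed ones, so the two metrics can disagree on the same prediction.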


Project Structure

SSP-Stat/
├── gor-tools/           # GOR model implementation
├── svm-tools/           # SVM model implementation
├── stats/               # Statistical analysis tools
├── Dataset.py           # Dataset handling class
├── make-dataset.py      # Script for dataset preparation
├── performance.py       # Performance evaluation script
├── sov.py               # SOV score calculation
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation

Requirements

  • Python 3.6 or higher
  • Required Python packages listed in requirements.txt
