Secondary Structure Prediction using GOR and SVM
SSP-Stat is a Python-based toolkit designed for protein secondary structure prediction. It implements both the GOR (Garnier–Osguthorpe–Robson) statistical method and Support Vector Machine (SVM) classifiers to predict secondary structures from amino acid sequences.
- GOR Implementation: Utilizes information theory-based statistical methods for secondary structure prediction.
- SVM Classifiers: Employs machine learning techniques for enhanced prediction accuracy.
- Dataset Preparation: Tools to process and prepare datasets for training and evaluation.
- Performance Evaluation: Includes metrics such as Q3 accuracy and Segment Overlap (SOV) scores.
- Clone the Repository:
  git clone https://github.com/vincenzo-palmacci/SSP-Stat.git
  cd SSP-Stat
- Set Up a Virtual Environment (Optional but Recommended):
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies:
  pip install -r requirements.txt
Use the make-dataset.py script to process raw protein sequence data into a format suitable for training and evaluation:
python make-dataset.py --input raw_data.fasta --output processed_dataset.pkl
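The on-disk layout of processed_dataset.pkl is not described here. As a rough illustration of the kind of processing involved, the sketch below parses FASTA records into a dictionary and round-trips it through pickle. The function name `read_fasta` and the `{identifier: sequence}` layout are assumptions made for this example, not the format make-dataset.py actually writes.

```python
import pickle


def read_fasta(lines):
    """Parse FASTA-formatted lines into a {identifier: sequence} dict.

    Hypothetical helper: the real make-dataset.py may store more fields
    (for example the observed secondary structure) in a different layout.
    """
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence data may span several lines; collect the pieces.
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


example = [">d1abc_ example", "MKTAYIAK", "QRQISFVK", ">d2xyz_", "GSHMLE"]
dataset = read_fasta(example)

# Serialize and restore in memory; writing to a file works the same way
# with pickle.dump / pickle.load on a file opened in binary mode.
blob = pickle.dumps(dataset)
restored = pickle.loads(blob)
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.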
To train the GOR model, navigate to the gor-tools directory and run the training script:
cd gor-tools
python train_gor.py --dataset ../processed_dataset.pkl --model gor_model.pkl
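For orientation, here is a minimal sketch of the idea behind GOR-style training: count how often each residue appears at each window offset around a position with a given secondary structure, then score new positions by summing log-odds over the window. This is a simplified single-residue variant with crude pseudocount smoothing, written for illustration. The names `train_gor` and `predict_gor` are hypothetical and do not describe the internals of train_gor.py.

```python
import math
from collections import Counter

STATES = "HEC"  # helix, strand, coil


def train_gor(pairs, half_window=8):
    """Count (state, offset, residue) co-occurrences over (sequence, structure) pairs."""
    joint, residue, state = Counter(), Counter(), Counter()
    for seq, ss in pairs:
        for i, s in enumerate(ss):
            state[s] += 1
            for off in range(-half_window, half_window + 1):
                j = i + off
                if 0 <= j < len(seq):
                    joint[(s, off, seq[j])] += 1
                    residue[(off, seq[j])] += 1
    return joint, residue, state


def predict_gor(seq, model, half_window=8, pseudo=1.0):
    """Assign each position the state with the highest summed log-odds score."""
    joint, residue, state = model
    total = max(sum(state.values()), 1)
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for s in STATES:
            score = 0.0
            for off in range(-half_window, half_window + 1):
                j = i + off
                if 0 <= j < len(seq):
                    r = seq[j]
                    # Observed joint count versus the count expected if state
                    # and residue were independent; pseudocounts avoid log(0).
                    observed = joint[(s, off, r)] + pseudo
                    expected = state[s] * residue[(off, r)] / total + pseudo
                    score += math.log(observed / expected)
            scores[s] = score
        prediction.append(max(STATES, key=scores.get))
    return "".join(prediction)


# Toy data only; real training uses many annotated sequences.
toy_pairs = [("AAAAVVVVGGGG", "HHHHEEEECCCC")]
toy_model = train_gor(toy_pairs, half_window=0)
print(predict_gor("AVG", toy_model, half_window=0))
```

The default window of 17 residues (8 on each side) follows the usual GOR convention; the toy call uses a window of 1 so the result is easy to check by hand.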
To train the SVM model, navigate to the svm-tools directory and run the training script:
cd ../svm-tools
python train_svm.py --dataset ../processed_dataset.pkl --model svm_model.pkl
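An SVM needs fixed-length numeric input, so sequence-based predictors commonly turn each residue into a one-hot encoded sliding window. The sketch below shows that encoding in plain Python. Whether train_svm.py uses this exact scheme is an assumption; `encode_windows` and its `half_window` parameter are hypothetical names chosen for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(seq, half_window=8):
    """Return one feature row per residue.

    Each row concatenates a 20-dimensional one-hot vector for every
    position in the window. Positions that fall outside the sequence,
    and residues outside the 20 standard amino acids, stay all-zero.
    """
    width = 2 * half_window + 1
    n_aa = len(AMINO_ACIDS)
    rows = []
    for i in range(len(seq)):
        row = [0] * (width * n_aa)
        for k in range(width):
            j = i - half_window + k
            if 0 <= j < len(seq) and seq[j] in AA_INDEX:
                row[k * n_aa + AA_INDEX[seq[j]]] = 1
        rows.append(row)
    return rows


# A window of 3 residues gives 3 * 20 = 60 features per position.
features = encode_windows("MKTAYIAK", half_window=1)
```

Rows produced this way, paired with one structure label per residue, can be passed to a classifier such as scikit-learn's `SVC`.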
Use the performance.py script to assess the accuracy of your models:
python performance.py --model gor_model.pkl --test_data test_dataset.pkl
This will output metrics such as Q3 accuracy and SOV scores.
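Q3 is the percentage of residues whose predicted state (H, E, or C) matches the observed one. A minimal sketch of that calculation follows; `q3_accuracy` is a hypothetical helper for illustration, not the function defined in performance.py. SOV, which scores overlap between predicted and observed segments, is more involved and is not shown.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# 7 of 8 positions agree.
score = q3_accuracy("HHHEECCC", "HHHHECCC")
print(score)  # 87.5
```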
SSP-Stat/
├── gor-tools/ # GOR model implementation
├── svm-tools/ # SVM model implementation
├── stats/ # Statistical analysis tools
├── Dataset.py # Dataset handling class
├── make-dataset.py # Script for dataset preparation
├── performance.py # Performance evaluation script
├── sov.py # SOV score calculation
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Python 3.6 or higher
- Required Python packages listed in requirements.txt