Example Box Plot Output for Bootstrap Predictions of Opsin λmax by OPTICS
- OPTICS is an open-source tool that predicts the Opsin Phenotype (λmax) from unaligned opsin amino-acid sequences.
- OPTICS leverages machine learning models trained on the Visual Physiology Opsin Database (VPOD).
- OPTICS can be downloaded and used as a command-line or GUI tool.
- OPTICS is also available as an online tool here, hosted on our Galaxy Project server.
- λmax Prediction: Predicts the peak light absorption wavelength (λmax) for opsin proteins.
- Model Selection: Choose from different pre-trained models for prediction.
- Encoding Methods: Select between one-hot encoding or amino-acid property encoding for model training and prediction.
- BLAST Analysis: Optionally perform BLASTp analysis to compare query sequences against reference datasets.
- Bootstrap Predictions: Optionally enable bootstrap predictions for enhanced accuracy assessment (we suggest a limit of 10 sequences for bootstrap visualizations).
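To illustrate what the encoding options above mean, here is a minimal sketch of one-hot encoding for amino-acid sequences. This is not OPTICS's actual implementation, just the standard idea: each residue becomes a 20-dimensional indicator vector (property encoding would instead map each residue to numeric physicochemical features).

```python
# Illustrative sketch of one-hot encoding (not OPTICS's internal code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_encode(seq: str) -> list[list[int]]:
    """Encode a protein sequence as a list of 20-d indicator vectors."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vectors = []
    for residue in seq.upper():
        vec = [0] * len(AMINO_ACIDS)
        if residue in index:  # skip gaps or ambiguous residues
            vec[index[residue]] = 1
        vectors.append(vec)
    return vectors

encoded = one_hot_encode("MNGT")  # first four residues of bovine rhodopsin
```

Each position contributes one active bit, so a sequence of length N becomes an N x 20 binary matrix before it is fed to a model.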
Clone the repository:
git clone https://github.com/VisualPhysiologyDB/optics.git
Install dependencies: [Make sure you are working in the repository directory from here on]
A. Create a Conda environment for OPTICS (make sure you have Conda installed)
conda create --name optics_env python=3.11
conda activate optics_env
B. Use the 'requirements.txt' file to download base package dependencies for OPTICS
pip install -r requirements.txt
C. Download MAFFT and BLAST
IF working on a macOS or Linux device:
- Install BLAST and MAFFT directly from the bioconda channel
conda install bioconda::blast bioconda::mafft
IF working on a Windows device:
- Manually install the Windows-compatible BLAST executable and add it to your system PATH; the download list is here
- We suggest downloading 'ncbi-blast-2.16.0+-win64.exe'
- You DO NOT need to download MAFFT; OPTICS should be able to run MAFFT from the files provided with this repository.
MAKE SURE ALL DEPENDENCIES ARE DOWNLOADED AND PRESENT IN THE OPTICS FOLDER DIRECTORY BEFORE RUNNING ANY SCRIPTS!
Required Args:
-i, --input: Either a single sequence or a path to a FASTA file.
General Optional Args:
-o, --output_dir: Desired directory to save output folder/files (optional). Default: './prediction_outputs'
-p, --prediction_prefix: Base filename for prediction outputs (optional). Default: 'unnamed'
-v, --model_version: Version of models to use (optional). Based on the version of VPOD used to train models. Options/Default: vpod_1.3 (more versions coming later)
-m, --model: Prediction model to use (optional). Options: whole-dataset, wildtype, vertebrate, invertebrate, wildtype-vert, type-one, whole-dataset-mnm, wildtype-mnm, vertebrate-mnm, invertebrate-mnm, wildtype-vert-mnm. **Default: whole-dataset**
-e, --encoding: Encoding method to use (optional). Options: one_hot, aa_prop. Default: aa_prop
BLASTp Analysis Args (optional):
--blastp: Enable BLASTp analysis.
--blastp_report: Filename for BLASTp report. Default: blastp_report.txt
--refseq: Reference sequence used for blastp analysis. Options: bovine, squid, microbe, custom. Default: bovine
--custom_ref_file: Path to a custom reference sequence file for BLASTp. Required if --refseq custom is selected.
Bootstrap Analysis Args (optional):
--bootstrap: Enable bootstrap predictions.
--visualize_bootstrap: Enable visualization of bootstrap predictions.
--bootstrap_viz_file: Filename prefix for bootstrap visualization. Default: bootstrap_viz
--save_viz_as: File type for bootstrap visualizations. Options: SVG, PNG, PDF. Default: SVG
--full_spectrum_xaxis: Enables visualization of predictions on a full-spectrum x-axis (300-650 nm). Otherwise, the x-axis is scaled to the predictions.
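As a rough picture of what the bootstrap options report, here is a hedged sketch of how replicate λmax predictions for one sequence could be summarized into a mean and a 95% percentile interval. The resampling scheme and replicate counts here are illustrative assumptions, not OPTICS's actual internals.

```python
import random
import statistics

def bootstrap_summary(predictions, n_resamples=1000, seed=0):
    """Summarize replicate lambda-max predictions (illustrative only):
    resample the replicates with replacement and take a 95% percentile
    interval of the resampled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(predictions) for _ in predictions]
        means.append(statistics.mean(sample))
    means.sort()
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples)]
    return statistics.mean(predictions), (lo, hi)

# Hypothetical replicate predictions (nm) for a single opsin sequence.
mean, (lo, hi) = bootstrap_summary([498.2, 501.5, 499.8, 500.4, 497.9])
```

A wide interval flags sequences whose predictions are unstable; the box plots OPTICS produces visualize the same spread per sequence.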
python optics_predictions.py -i ./examples/optics_ex_short.txt -o ex_test_of_optics -p ex_predictions -m wildtype -e aa_prop --blastp --blastp_report blastp_report.txt --refseq squid --bootstrap --visualize_bootstrap --bootstrap_viz_file bootstrap_viz --save_viz_as SVG
python run_optics_gui.py
- Unaligned FASTA file containing opsin amino-acid sequences.
- Example FASTA Entry:
>NP_001014890.1_rhodopsin_Bos_taurus
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRT
PLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVC
KPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVV
HFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQG
SDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
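If you are assembling input files programmatically, a minimal FASTA reader is easy to sketch. This is a generic parser, not code from OPTICS; it simply joins multi-line sequences under each header, which matches the unaligned input format described above.

```python
import os
import tempfile

def read_fasta(path):
    """Minimal FASTA reader: returns {header: sequence}, joining
    sequence lines that wrap across multiple lines."""
    records = {}
    header = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

# Demo on a throwaway file with a wrapped sequence.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as f:
    f.write(">seq1 example\nMNGTEG\nPNFYVP\n")
    path = f.name
records = read_fasta(path)
os.unlink(path)
```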
- Predictions (TSV): λmax values, model used, and encoding method.
- BLAST Results (TXT, optional): Comparison of query sequences to reference datasets.
- Bootstrap Graphs (PDF, optional): Visualization of bootstrap prediction results.
- Job Log (TXT): Log file containing the input command to OPTICS, including the encoding method and model used.
Note - All outputs are written into sub-folders within the 'prediction_outputs' folder and are marked with the date and time.
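For downstream analysis, the predictions TSV can be read with the standard library. The column names below are assumptions for illustration; check the header of your actual output file before relying on them.

```python
import csv
import io

# Hypothetical TSV in the shape described above (column names assumed;
# inspect your real output header and adjust accordingly).
tsv = (
    "name\tpredicted_lambda_max\tmodel\tencoding\n"
    "seq1\t500.2\twildtype\taa_prop\n"
)

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
lmax = float(rows[0]["predicted_lambda_max"])
```

With a real file, replace `io.StringIO(tsv)` with `open("path/to/predictions.tsv")`.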
All data and code are covered under a GNU General Public License (GPL) (Version 3), in accordance with Open Source Initiative (OSI) policies.
IF citing this GitHub and its contents use the following DOI provided by Zenodo...
10.5281/zenodo.10667840
IF you use OPTICS in your research, please cite the following paper: NOTE - We are currently working on a manuscript specific to OPTICS, so this citation will change in the near future.
Seth A. Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A. Crandall, & Todd H Oakley. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). GigaScience, 2024.09.01. https://doi.org/10.1093/gigascience/giae073
Contact information for author questions or feedback.
Todd H. Oakley - ORCID ID
oakley@ucsb.edu
Seth A. Frazer - ORCID ID
sethfrazer@ucsb.edu
Want to use OPTICS without the hassle of the setup? -> CLICK HERE to visit our Galaxy Project server and use our tool!
OPTICS v1.3 uses VPOD_v1.3 for training.
Here is a link to a bibliography of the publications used to build VPOD_v1.2 (the VPOD_v1.3 bibliography is not yet released).
If you know of publications for training opsin ML models not included in the VPOD_v1.2 database, please send them to us through this form
Check out the VPOD GitHub repository to learn more about our database and ML models!