Example Box Plot Output for Bootstrap Predictions of Opsin λmax by OPTICS
- OPTICS is an open-source tool that predicts the Opsin Phenotype (λmax) from unaligned opsin amino-acid sequences.
- OPTICS leverages machine learning models trained on the Visual Physiology Opsin Database (VPOD).
- OPTICS can be downloaded and used as a command-line or GUI tool.
- OPTICS is also available as an online tool here, hosted on our Galaxy Project server.
- λmax Prediction: Predicts the peak light absorption wavelength (λmax) for opsin proteins.
- Model Selection: Choose from different pre-trained models for prediction.
- Encoding Methods: Select between one-hot encoding or amino-acid property encoding for model training and prediction.
- BLAST Analysis: Optionally perform BLASTp analysis to compare query sequences against reference datasets.
- Bootstrap Predictions: Optionally enable bootstrap predictions for enhanced accuracy assessment (we suggest a limit of 10 sequences for bootstrap visualizations).
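To illustrate what the encoding options above mean, here is a minimal sketch of one-hot encoding for amino-acid sequences. This is not OPTICS's actual implementation, just the standard idea: each residue becomes a 20-dimensional indicator vector (property encoding would instead map each residue to numeric physicochemical features).

```python
# Illustrative sketch of one-hot encoding (not OPTICS's internal code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_encode(seq: str) -> list[list[int]]:
    """Encode a protein sequence as a list of 20-d indicator vectors."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vectors = []
    for residue in seq.upper():
        vec = [0] * len(AMINO_ACIDS)
        if residue in index:  # skip gaps or ambiguous residues
            vec[index[residue]] = 1
        vectors.append(vec)
    return vectors

encoded = one_hot_encode("MNGT")  # first four residues of bovine rhodopsin
```

Each position contributes one active bit, so a sequence of length N becomes an N x 20 binary matrix before it is fed to a model.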
Clone the repository:
git clone https://github.com/VisualPhysiologyDB/optics.git
Install dependencies: [Make sure you are working in the repository directory from here on]
A. Create a Conda environment for OPTICS (make sure you have Conda installed)
conda create --name optics_env python=3.11
conda activate optics_env
B. Use the 'requirements.txt' file to download base package dependencies for OPTICS
pip install -r requirements.txt
C. Download MAFFT and BLAST
IF working on a macOS or Linux device:
- Install BLAST and MAFFT directly from the bioconda channel
conda install bioconda::blast bioconda::mafft
IF working on a Windows device:
- Manually install the Windows-compatible BLAST executable and add it to your system PATH; the download list is here
- We suggest downloading 'ncbi-blast-2.16.0+-win64.exe'
- You DO NOT need to download MAFFT; OPTICS should be able to run MAFFT from the files provided with this repository.
MAKE SURE ALL DEPENDENCIES ARE DOWNLOADED AND PRESENT IN THE OPTICS FOLDER DIRECTORY BEFORE RUNNING ANY SCRIPTS!
Required Args:
-i, --input: Either a single sequence or a path to a FASTA file.
General Optional Args:
-o, --output_dir: Desired directory to save output folder/files (optional). Default: './prediction_outputs'
-p, --prediction_prefix: Base filename for prediction outputs (optional). Default: 'unnamed'
-v, --model_version: Version of models to use (optional). Based on the version of VPOD used to train models. Options/Default: vpod_1.3 (more versions coming later)
-m, --model: Prediction model to use (optional). Options: whole-dataset, wildtype, vertebrate, invertebrate, wildtype-vert, type-one, whole-dataset-mnm, wildtype-mnm, vertebrate-mnm, invertebrate-mnm, wildtype-vert-mnm. **Default: whole-dataset**
-e, --encoding: Encoding method to use (optional). Options: one_hot, aa_prop. Default: aa_prop
BLASTp Analysis Args (optional):
--blastp: Enable BLASTp analysis.
--blastp_report: Filename for BLASTp report. Default: blastp_report.txt
--refseq: Reference sequence used for blastp analysis. Options: bovine, squid, microbe, custom. Default: bovine
--custom_ref_file: Path to a custom reference sequence file for BLASTp. Required if --refseq custom is selected.
Bootstrap Analysis Args (optional):
--bootstrap: Enable bootstrap predictions.
--visualize_bootstrap: Enable visualization of bootstrap predictions.
--bootstrap_viz_file: Filename prefix for bootstrap visualization. Default: bootstrap_viz
--save_viz_as: File type for bootstrap visualizations. Options: SVG, PNG, PDF. Default: SVG
--full_spectrum_xaxis: Enables visualization of predictions on a full-spectrum x-axis (300-650 nm). Otherwise, the x-axis is scaled to the predictions.
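As a rough picture of what the bootstrap options report, here is a hedged sketch of how replicate λmax predictions for one sequence could be summarized into a mean and a 95% percentile interval. The resampling scheme and replicate counts here are illustrative assumptions, not OPTICS's actual internals.

```python
import random
import statistics

def bootstrap_summary(predictions, n_resamples=1000, seed=0):
    """Summarize replicate lambda-max predictions (illustrative only):
    resample the replicates with replacement and take a 95% percentile
    interval of the resampled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(predictions) for _ in predictions]
        means.append(statistics.mean(sample))
    means.sort()
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples)]
    return statistics.mean(predictions), (lo, hi)

# Hypothetical replicate predictions (nm) for a single opsin sequence.
mean, (lo, hi) = bootstrap_summary([498.2, 501.5, 499.8, 500.4, 497.9])
```

A wide interval flags sequences whose predictions are unstable; the box plots OPTICS produces visualize the same spread per sequence.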
python optics_predictions.py -i ./examples/optics_ex_short.txt -o ex_test_of_optics -p ex_predictions -m wildtype -e aa_prop --blastp --blastp_report blastp_report.txt --refseq squid --bootstrap --visualize_bootstrap --bootstrap_viz_file bootstrap_viz --save_viz_as SVG
python run_optics_gui.py
- Unaligned FASTA file containing opsin amino-acid sequences.
- Example FASTA Entry:
>NP_001014890.1_rhodopsin_Bos_taurus
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRT
PLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVC
KPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVV
HFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQG
SDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
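If you are assembling input files programmatically, a minimal FASTA reader is easy to sketch. This is a generic parser, not code from OPTICS; it simply joins multi-line sequences under each header, which matches the unaligned input format described above.

```python
import os
import tempfile

def read_fasta(path):
    """Minimal FASTA reader: returns {header: sequence}, joining
    sequence lines that wrap across multiple lines."""
    records = {}
    header = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

# Demo on a throwaway file with a wrapped sequence.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as f:
    f.write(">seq1 example\nMNGTEG\nPNFYVP\n")
    path = f.name
records = read_fasta(path)
os.unlink(path)
```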
- Predictions (TSV): λmax values, model used, and encoding method.
- BLAST Results (TXT, optional): Comparison of query sequences to reference datasets.
- Bootstrap Graphs (PDF, optional): Visualization of bootstrap prediction results.
- Job Log (TXT): Log file containing the input command to OPTICS, including the encoding method and model used.
Note - All outputs are written into sub-folders within the 'prediction_outputs' folder and are marked with the date and time.
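For downstream analysis, the predictions TSV can be read with the standard library. The column names below are assumptions for illustration; check the header of your actual output file before relying on them.

```python
import csv
import io

# Hypothetical TSV in the shape described above (column names assumed;
# inspect your real output header and adjust accordingly).
tsv = (
    "name\tpredicted_lambda_max\tmodel\tencoding\n"
    "seq1\t500.2\twildtype\taa_prop\n"
)

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
lmax = float(rows[0]["predicted_lambda_max"])
```

With a real file, replace `io.StringIO(tsv)` with `open("path/to/predictions.tsv")`.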
All data and code are covered under a GNU General Public License (GPL) (Version 3), in accordance with Open Source Initiative (OSI) policies.
IF citing this GitHub and its contents use the following DOI provided by Zenodo...
10.5281/zenodo.10667840
IF you use OPTICS in your research, please cite the following paper: NOTE - We are currently working on a manuscript specific to OPTICS, so this citation will change in the near future.
Seth A. Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A. Crandall, & Todd H Oakley. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). GigaScience, 2024.09.01. https://doi.org/10.1093/gigascience/giae073
Contact information for author questions or feedback.
Todd H. Oakley - ORCID ID
oakley@ucsb.edu
Seth A. Frazer - ORCID ID
sethfrazer@ucsb.edu
Want to use OPTICS without the hassle of the setup? -> CLICK HERE to visit our Galaxy Project server and use our tool!
OPTICS v1.3 uses VPOD_v1.3 for training.
Here is a link to a bibliography of the publications used to build VPOD_v1.2 (the VPOD_v1.3 bibliography is not yet released).
If you know of publications for training opsin ML models not included in the VPOD_v1.2 database, please send them to us through this form
Check out the VPOD GitHub repository to learn more about our database and ML models!