SynteractTurbo

This is a placeholder repository for the SynteractTurbo project. Currently, it contains some basic scripts to index and query SynteractTurbo outputs.

Usage example

Get the repository

git clone https://github.com/Gleghorn-Lab/SynteractTurbo.git
cd SynteractTurbo
pip install -r requirements.txt

Convert the npy to an indexable SQL database

python npy_to_sql.py --npy_path example_pred_file.npy --db_path example_sql_db.db
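After the conversion, a quick sanity check with Python's standard sqlite3 module can confirm that the database looks as expected. The sketch below is not part of the repository. It assumes the table is named protein_pairs with columns protein1, protein2 and score, as the query examples in this README suggest. The function name summarize_pairs and the in-memory demo rows are hypothetical.

```python
import sqlite3


def summarize_pairs(conn):
    """Compute basic statistics over an assumed protein_pairs table."""
    total, lo, hi, avg = conn.execute(
        "SELECT COUNT(*), MIN(score), MAX(score), AVG(score) FROM protein_pairs"
    ).fetchone()
    # UNION removes duplicates, so this counts proteins seen in either column.
    (unique,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT protein1 AS p FROM protein_pairs "
        "UNION SELECT protein2 FROM protein_pairs)"
    ).fetchone()
    return {
        "total_pairs": total,
        "unique_proteins": unique,
        "min_score": lo,
        "max_score": hi,
        "avg_score": round(avg, 2) if avg is not None else None,
    }


# Made-up rows in an in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein_pairs (protein1 TEXT, protein2 TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO protein_pairs VALUES (?, ?, ?)",
    [("AAA", "CCC", 52), ("DDD", "AAA", 23), ("EEE", "AAA", -63)],
)
stats = summarize_pairs(conn)
print(stats)
conn.close()
```

To run the same check on the converted file, replace ":memory:" with example_sql_db.db and drop the CREATE TABLE and INSERT statements.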

Examples querying the database

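import sqlite3

import pandas as pd

# Assumption: both helpers are defined in npy_to_sql.py; adjust the import if they live elsewhere.
from npy_to_sql import get_database_stats, query_protein_pairs

# Database created by the conversion step above
db_path = 'example_sql_db.db'

print("\nDatabase Statistics:")
stats = get_database_stats(db_path)
for key, value in stats.items():
    print(f"  {key}: {value}")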

# Example queries
print("\n" + "="*50)
print("EXAMPLE USAGE:")
print("="*50)

# Get a sample protein for demonstration
conn = sqlite3.connect(db_path)
sample_protein = pd.read_sql_query('SELECT protein1 FROM protein_pairs LIMIT 1', conn)['protein1'].iloc[0]
conn.close()

print(f"\nExample 1: Query all pairs for protein '{sample_protein}':")
result = query_protein_pairs(db_path, sample_protein)
print(f"Found {len(result)} pairs involving this protein")
print(result.head())

min_score = 50
print(f"\nExample 2: Query pairs for '{sample_protein}' with score >= {min_score}:")
result_filtered = query_protein_pairs(db_path, sample_protein, min_score=min_score)
print(f"Found {len(result_filtered)} with score >= {min_score}")
if len(result_filtered) > 0:
    print(result_filtered.head())

Database Statistics:
  total_pairs: 100000
  unique_proteins: 18758
  min_score: -95
  max_score: 99
  avg_score: 38.43

==================================================
EXAMPLE USAGE:
==================================================

Example 1: Query all pairs for protein 'AVEAYGEFLCMFEENYPETLKRLFVVKAPKLFPVAYNLIKPFLSEDTRKKIMVLGANWKEVLLKHISPDQVPVEYGGTMTDPDGNPKCKSKINYGGDIPRKYYVRDQVKQQYEHSVQISRGSSHQVEYEILFPGCVLRWQFMSDGADVGFGIFLKTKMGERQRAGEMTEVLPNQRYNSHLVPEDGTLTCSDPGICYANEVGEAFRSLVPAAVVWLSYGVASSYVLADAIDKGKKAGEVPSPEAGRSARVTVAVVDTFVWQALASVAIPGFTINRVCAASLYVLGTATRWPLAVRKWTTTALGLLTIPIIIHPIDRSVDFLLDSSLRKLYPTVGKPSSS':
Found 3 pairs involving this protein
                                            protein1                                           protein2  score
0  AVEAYGEFLCMFEENYPETLKRLFVVKAPKLFPVAYNLIKPFLSED...  MNGHLEAEEQQDQRPDQELTGSWGHGPRSTLVRAKAMAPPPPPLAA...     52
1  MSLKNEPRVNTSALQKIAADMSNIIENLDTRELHFEGEEVDYDVSP...  AVEAYGEFLCMFEENYPETLKRLFVVKAPKLFPVAYNLIKPFLSED...     23
2  MGVLLTQRTLLSLVLALLFPSMASMAAIGSCSKEYRVLLGQLQKQT...  AVEAYGEFLCMFEENYPETLKRLFVVKAPKLFPVAYNLIKPFLSED...    -63

Example 2: Query pairs for 'AVEAYGEFLCMFEENYPETLKRLFVVKAPKLFPVAYNLIKPFLSEDTRKKIMVLGANWKEVLLKHISPDQVPVEYGGTMTDPDGNPKCKSKINYGGDIPRKYYVRDQVKQQYEHSVQISRGSSHQVEYEILFPGCVLRWQFMSDGADVGFGIFLKTKMGERQRAGEMTEVLPNQRYNSHLVPEDGTLTCSDPGICYANEVGEAFRSLVPAAVVWLSYGVASSYVLADAIDKGKKAGEVPSPEAGRSARVTVAVVDTFVWQALASVAIPGFTINRVCAASLYVLGTATRWPLAVRKWTTTALGLLTIPIIIHPIDRSVDFLLDSSLRKLYPTVGKPSSS' with score >= 50:
Found 1 with score >= 50
                                            protein1                                           protein2  score
0  AVEAYGEFLCMFEENYPETLKRLFVVKAPKLFPVAYNLIKPFLSED...  MNGHLEAEEQQDQRPDQELTGSWGHGPRSTLVRAKAMAPPPPPLAA...     52
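
The output above shows that a query matches the protein whether it appears as protein1 or protein2. A minimal sketch of how such a lookup could be written directly against SQLite follows. It is not the repository's query_protein_pairs implementation. The function name query_pairs_sketch, the descending sort order and the demo rows are assumptions made for illustration.

```python
import sqlite3


def query_pairs_sketch(conn, protein, min_score=None):
    """Return (protein1, protein2, score) rows with `protein` on either side."""
    sql = (
        "SELECT protein1, protein2, score FROM protein_pairs "
        "WHERE (protein1 = ? OR protein2 = ?)"
    )
    params = [protein, protein]
    if min_score is not None:
        # Optional lower bound on the score, bound as a parameter.
        sql += " AND score >= ?"
        params.append(min_score)
    sql += " ORDER BY score DESC"  # assumed ordering, highest score first
    return conn.execute(sql, params).fetchall()


# Made-up rows in an in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein_pairs (protein1 TEXT, protein2 TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO protein_pairs VALUES (?, ?, ?)",
    [("AAA", "CCC", 52), ("DDD", "AAA", 23), ("EEE", "AAA", -63), ("DDD", "EEE", 90)],
)
all_rows = query_pairs_sketch(conn, "AAA")
strong_rows = query_pairs_sketch(conn, "AAA", min_score=50)
print(len(all_rows), len(strong_rows))
conn.close()
```

Binding the sequence and the score as query parameters, rather than formatting them into the SQL string, keeps long sequence strings from needing any quoting.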

Save query results to CSV

result.to_csv('csv_path.csv', index=False)

Interpreting scores

Scores currently range from -100 to 100, where -100 is a confident non-interacting protein pair and 100 is a confident interacting protein pair. The optimal threshold for turning scores into binary predictions depends on the model variant: a threshold of around 50 is typically optimal, but for some models the optimum is closer to 0. Contact Logan (lhallee@udel.edu) for the optimal threshold for the checkpoint in question.
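
As a rough illustration of applying a threshold, the sketch below converts scores into binary interaction calls. The function name to_binary_predictions is hypothetical, and treating a score equal to the threshold as positive is an assumption. The default of 50 only reflects the typical value mentioned above and is not a recommendation for any particular checkpoint.

```python
def to_binary_predictions(scores, threshold=50):
    """Map each score in [-100, 100] to True (interacting) or False (not)."""
    # Assumption: a score exactly at the threshold counts as a positive call.
    return [score >= threshold for score in scores]


scores = [52, 23, -63]
at_50 = to_binary_predictions(scores)              # [True, False, False]
at_0 = to_binary_predictions(scores, threshold=0)  # [True, True, False]
print(at_50, at_0)
```

With a pandas DataFrame such as the query results above, the equivalent comparison would be result["score"] >= threshold.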
