A Python-based tool for analyzing protein sequences and calculating various physicochemical properties. This tool uses BioPython's ProtParam module to analyze FASTA sequences and exports the results to an Excel file.
The tool calculates the following protein properties:
- Molecular Weight
- Amino Acid Composition
- Molar Extinction Coefficient
- Isoelectric Point
- Instability Index
- Aromaticity
- GRAVY (Grand Average of Hydropathy)
- Flexibility
- Python 3.x
- BioPython
- Pandas
- OpenPyXL
- Clone the repository:
git clone https://github.com/yourusername/protein-sequence-analysis.git
cd protein-sequence-analysis
- Install required packages:
pip install biopython pandas openpyxl
- Input file should be in FASTA format
- File should be named "Sequence.fasta"
- Protein sequence should contain valid amino acid letters
Example FASTA format:
>Protein_Name
MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFLRILPDGTVDGTRDRSDQHIQLQ
-
Place your FASTA file (named "Sequence.fasta") in the same directory as the script
-
Run the script:
python protein_analysis.py
- Check the output file "protein_feature_data.xlsx" for results
The script generates an Excel file ("protein_feature_data.xlsx") with the following columns:
- Molecular_Weight
- Amino_Acid_Count
- molar_extinction_coefficient
- isoelectric_point
- instability_index
- aromaticity
- Gravy
- Flexibility
# Read FASTA file
input_file = open("Sequence.fasta", "r")
for record in SeqIO.parse(input_file, "fasta"):
# Process sequence and calculate properties
my_sec = str(record.seq).rstrip('\\')
analyse = ProteinAnalysis(my_sec)
# Calculate various properties
mol_weight = analyse.molecular_weight()
count_amino = analyse.count_amino_acids()
# ... other calculations
You can enhance your analysis by visualizing the protein properties. Here are some example visualization codes using matplotlib and seaborn:
import matplotlib.pyplot as plt
import seaborn as sns
def plot_amino_acid_composition(count_amino):
plt.figure(figsize=(12, 6))
sns.barplot(x=list(count_amino.keys()), y=list(count_amino.values()))
plt.title('Amino Acid Composition')
plt.xlabel('Amino Acids')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('amino_acid_composition.png')
plt.close()
# Usage
plot_amino_acid_composition(count_amino)
def plot_flexibility_profile(flex):
plt.figure(figsize=(12, 6))
plt.plot(range(len(flex)), flex, '-b')
plt.title('Protein Flexibility Profile')
plt.xlabel('Position')
plt.ylabel('Flexibility Score')
plt.grid(True)
plt.tight_layout()
plt.savefig('flexibility_profile.png')
plt.close()
# Usage
plot_flexibility_profile(flex)
import numpy as np
def plot_property_radar(properties):
# Normalize the properties
normalized_props = {
'MW': properties['Molecular_Weight'] / 100000,
'pI': properties['isoelectric_point'] / 14,
'Instability': properties['instability_index'] / 100,
'Aromaticity': properties['aromaticity'],
'GRAVY': (properties['Gravy'] + 4) / 8 # Normalize between -4 and 4
}
categories = list(normalized_props.keys())
values = list(normalized_props.values())
angles = np.linspace(0, 2*np.pi, len(categories), endpoint=False)
values = np.concatenate((values, [values[0]]))
angles = np.concatenate((angles, [angles[0]]))
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(projection='polar'))
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_thetagrids(angles[:-1] * 180/np.pi, categories)
plt.title('Protein Properties Radar Chart')
plt.tight_layout()
plt.savefig('property_radar.png')
plt.close()
# Usage
properties = {
'Molecular_Weight': mol_weight,
'isoelectric_point': iso_point,
'instability_index': ist_index,
'aromaticity': aromati,
'Gravy': gra_vy
}
plot_property_radar(properties)
def create_analysis_dashboard(count_amino, flex, properties):
fig = plt.figure(figsize=(15, 10))
# Amino Acid Composition
ax1 = plt.subplot(221)
sns.barplot(x=list(count_amino.keys()), y=list(count_amino.values()), ax=ax1)
ax1.set_title('Amino Acid Composition')
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45)
# Flexibility Profile
ax2 = plt.subplot(222)
ax2.plot(range(len(flex)), flex, '-b')
ax2.set_title('Flexibility Profile')
# Property Radar
ax3 = plt.subplot(223, projection='polar')
categories = list(properties.keys())
values = list(properties.values())
angles = np.linspace(0, 2*np.pi, len(categories), endpoint=False)
ax3.plot(angles, values)
ax3.fill(angles, values, alpha=0.25)
ax3.set_title('Property Radar')
# Save dashboard
plt.tight_layout()
plt.savefig('protein_analysis_dashboard.png')
plt.close()
# Usage
create_analysis_dashboard(count_amino, flex, properties)
- Data Export for External Tools:
# Export data for external visualization
def export_visualization_data(count_amino, flex, properties):
# Create separate CSV files for each analysis
pd.DataFrame(count_amino.items(),
columns=['Amino_Acid', 'Count']).to_csv('amino_acid_data.csv')
pd.DataFrame({'Position': range(len(flex)),
'Flexibility': flex}).to_csv('flexibility_data.csv')
pd.DataFrame(properties, index=[0]).to_csv('properties_data.csv')
- Interactive Visualization with Plotly:
import plotly.express as px
import plotly.graph_objects as go
def create_interactive_plot(count_amino):
fig = px.bar(x=list(count_amino.keys()),
y=list(count_amino.values()),
title='Interactive Amino Acid Composition')
fig.write_html('interactive_plot.html')
Add these to your requirements.txt:
matplotlib>=3.5.0
seaborn>=0.11.0
plotly>=5.3.0 # For interactive plots
- Ensure your FASTA sequence contains only valid amino acid letters
- The tool processes one sequence at a time
- Large sequences may take longer to process
- Output will overwrite existing Excel file with the same name
Common issues and solutions:
-
Invalid Sequence Error:
- Ensure sequence contains only valid amino acid letters
- Remove any special characters or spaces
-
File Not Found Error:
- Check if "Sequence.fasta" exists in the correct directory
- Verify file name and extension
-
Excel File Access Error:
- Close the output Excel file before running the script
- Check write permissions in the directory
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or support, please open an issue in the GitHub repository.
Made with β€οΈ for the bioinformatics community