A portfolio of bioinformatics projects demonstrating my skills in Python programming, data analysis, and biological data processing. Code is kept private for confidentiality reasons.
Welcome to my Python Public Portfolio! This repository is a collection of my bioinformatics projects where I apply Python to solve biological data challenges. These projects demonstrate my skills in Python programming, data analysis, biological data processing, and statistical modeling. Each project focuses on different aspects of bioinformatics, including protein sequence analysis, gene data retrieval, and more.
Although the source code for these projects is kept private for confidentiality reasons, this repository includes detailed descriptions, project highlights, the tools and programs used, and example outputs. If you're interested in viewing the code or discussing my work, please feel free to contact me.
- Python programming for bioinformatics
- Biological sequence analysis and data manipulation
- Data retrieval from gene datasets and bioinformatics databases
- Advanced statistical analysis and error handling
- Automated testing for ensuring data accuracy
- Robust data validation techniques for bioinformatics workflows
- Python: Core language used for scripting and automation in all projects.
- Libraries/Modules:
  - `argparse`: For command-line interface handling and input parsing.
  - `math`: For statistical calculations.
  - `os` and `sys`: For file handling and operating system interactions.
  - `PyTest`: For automated unit testing of bioinformatics scripts.
  - `re`: For regular expression processing in parsing biological data.
  - `csv` and `pandas`: For working with structured data (in some projects).
- Bioinformatics Libraries: Libraries such as `Biopython` (used in some cases for advanced sequence analysis).
- Git: Version control and repository management.
- FASTA Format: Used for sequence data input and processing.
- Bioinformatics Data Sources: Data retrieved from public gene and protein databases such as UniProt and NCBI.
- Tools: Command-line tools and automated testing frameworks for validating results.
- Description: This project focuses on analyzing protein sequences to calculate their length and average molecular weight. Additionally, it includes dynamic protocol handling for laboratory settings, allowing adjustments based on user inputs.
- Tools/Programs: Python, `argparse`, `math`, command-line interface for user interaction.
- Skills Applied: Sequence analysis, dynamic protocol management, error handling.
- Example Use Cases:
  - Calculating molecular weight for a specific protein sequence.
  - Dynamically generating protocols for lab experiments based on user inputs for concentration and volume.
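
Since the project code is private, the snippet below is only an illustrative sketch of the kind of calculation described above, not the actual implementation. It assumes the common rule of thumb of roughly 110 Da per amino acid residue; the function name `summarize_protein` and both constants are hypothetical.

```python
# Rule-of-thumb average mass per amino acid residue (an assumption for
# illustration; an exact calculation would use per-residue masses).
AVERAGE_RESIDUE_MASS_DA = 110.0
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def summarize_protein(sequence: str) -> tuple[int, float]:
    """Return (length, approximate molecular weight in Da) for a protein sequence."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    invalid = sorted(set(cleaned) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"unexpected residue codes: {', '.join(invalid)}")
    length = len(cleaned)
    return length, length * AVERAGE_RESIDUE_MASS_DA


if __name__ == "__main__":
    length, weight = summarize_protein("MKTAYIAKQR")
    print(f"length={length} approx_weight={weight:.1f} Da")
```

Rejecting unknown residue codes up front keeps a typo in the input from silently skewing the estimate.
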
- Description: This project provides descriptive statistics (e.g., mean, median, variance, and standard deviation) for numerical data in tab-delimited files. The script handles missing and invalid values with error-checking mechanisms.
- Tools/Programs: Python, `math`, `argparse`, `csv`.
- Skills Applied: Statistical analysis, data validation, error handling.
- Example Use Cases:
  - Automatically generating statistical reports for large datasets with built-in validation and error handling.
  - Ensuring missing data does not compromise the integrity of the statistical results.
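
As a rough illustration of the approach (not the private script itself), the sketch below computes the statistics named above for one column of a tab-delimited file using only `csv` and `math`. The function name `column_stats`, the zero-based `column` argument, and the choice of sample variance (n - 1 denominator) are assumptions made for this example.

```python
import csv
import io
import math


def column_stats(handle, column: int) -> dict:
    """Summarize one column of a tab-delimited file.

    Cells that are missing, non-numeric, NaN, or infinite are counted
    in "skipped" and left out of the statistics.
    """
    values, skipped = [], 0
    for row in csv.reader(handle, delimiter="\t"):
        try:
            value = float(row[column])
        except (IndexError, ValueError):
            skipped += 1
            continue
        if math.isnan(value) or math.isinf(value):
            skipped += 1
            continue
        values.append(value)

    if not values:
        raise ValueError("no valid numeric values found")

    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    # Sample variance; reported as NaN when only one value is available.
    variance = (
        sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else float("nan")
    )
    return {
        "count": n,
        "skipped": skipped,
        "mean": mean,
        "median": median,
        "variance": variance,
        "stdev": math.sqrt(variance),
    }


if __name__ == "__main__":
    demo = io.StringIO("a\t1\nb\t2\nc\tNA\nd\t3\ne\t\nf\t4\n")
    print(column_stats(demo, column=1))
```

Note that a header row would also be counted as skipped in this sketch, since its cell is not numeric.
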
- Description: This project processes FASTA files by splitting sequences and calculating nucleotide statistics, including nucleotide frequency and composition. Automated testing using PyTest ensures the accuracy of the scripts.
- Tools/Programs: Python, `argparse`, `os`, `sys`, `PyTest`, Bioinformatics tools.
- Skills Applied: Bioinformatics file processing (FASTA format), sequence analysis, automated testing.
- Example Use Cases:
  - Extracting and analyzing specific sequences from large biological datasets.
  - Verifying the reliability of sequence data processing pipelines using automated testing.
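
To give a sense of what this kind of processing involves, here is a minimal sketch of FASTA parsing and nucleotide frequency calculation in plain Python. It is an illustration under my own assumptions, not the project's code; the names `read_fasta` and `nucleotide_frequencies` are hypothetical.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def nucleotide_frequencies(sequence):
    """Return the fraction of the sequence made up by each symbol."""
    counts = Counter(sequence)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: count / total for base, count in sorted(counts.items())}


if __name__ == "__main__":
    demo = [">seq1 demo", "ACGT", "aacc", "", ">seq2", "GGGG"]
    for name, seq in read_fasta(demo):
        print(name, len(seq), nucleotide_frequencies(seq))
```

Functions shaped like these are straightforward to cover with PyTest, for example by asserting the parsed records for a small in-memory FASTA input.
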
- Description: This project retrieves gene descriptions and processes gene data from various sources. The focus is on automating the retrieval and processing of gene-level information for large datasets, ensuring accurate file handling and data validation.
- Tools/Programs: Python, `os`, `sys`, `argparse`, `re`, and potentially `pandas` for data manipulation.
- Skills Applied: Gene data retrieval, file handling, and data processing.
- Example Use Cases:
  - Automating the processing and querying of gene data for large-scale bioinformatics analysis.
  - Handling complex gene data files with validation to ensure the accuracy of results.
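
The following sketch shows one way such a lookup could work. It is purely illustrative: the input layout (one `gene_id<TAB>description` pair per line, with `#` comment lines), the identifier pattern, and the function names are all assumptions for this example rather than details of the private project.

```python
import re

# Hypothetical identifier rule: letters, digits, dots, hyphens, underscores.
GENE_ID_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$")


def load_gene_descriptions(lines):
    """Build {GENE_ID: description} from tab-delimited lines.

    Returns the mapping plus the 1-based numbers of rejected lines.
    """
    descriptions, rejected = {}, []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        gene_id, sep, description = line.partition("\t")
        gene_id = gene_id.strip()
        if not sep or not GENE_ID_PATTERN.match(gene_id) or not description.strip():
            rejected.append(number)
            continue
        descriptions[gene_id.upper()] = description.strip()
    return descriptions, rejected


def describe(descriptions, gene_id):
    """Case-insensitive lookup with a fallback message."""
    return descriptions.get(gene_id.strip().upper(), "no description found")


if __name__ == "__main__":
    demo = [
        "# id\tdescription",
        "TP53\ttumor protein p53",
        "BRCA1\tBRCA1 DNA repair associated",
        "bad row without tab",
        "\tmissing id",
    ]
    table, bad_lines = load_gene_descriptions(demo)
    print(describe(table, "tp53"), "| rejected lines:", bad_lines)
```

Recording the rejected line numbers, rather than silently dropping bad rows, makes it easier to trace validation problems back to the input file.
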
- Description: This project analyzes gene data by counting gene categories and finding intersections between different datasets. It identifies common genes across datasets and performs automated validation to ensure accuracy.
- Tools/Programs: Python, `argparse`, `os`, `csv`, `re`, and testing with `PyTest`.
- Skills Applied: Gene data analysis, set operations for finding intersections, automated testing for validating results.
- Example Use Cases:
  - Identifying overlapping genes between datasets to uncover relationships or patterns.
  - Analyzing gene categories and their occurrences across multiple datasets.
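
The category counting and intersection steps described above can be sketched with `collections.Counter` and Python sets. This is an illustrative example only: it assumes each dataset is a list of `(gene, category)` pairs, and the gene names, categories, and function names are made up for the demonstration.

```python
from collections import Counter


def category_counts(records):
    """Count how many records fall into each category."""
    return Counter(category for _gene, category in records)


def shared_genes(*datasets):
    """Return the sorted gene names that appear in every dataset."""
    gene_sets = [{gene for gene, _category in records} for records in datasets]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


if __name__ == "__main__":
    # Placeholder data, not real results.
    dataset_a = [("GENE1", "kinase"), ("GENE2", "transporter"), ("GENE3", "kinase")]
    dataset_b = [("GENE2", "transporter"), ("GENE3", "kinase"), ("GENE4", "receptor")]
    print(category_counts(dataset_a))
    print(shared_genes(dataset_a, dataset_b))
```
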
For confidentiality reasons, the source code for these projects is kept private. However, I am happy to provide access to the code upon request for review or collaboration. If you're interested in viewing the code, learning more about the projects, or discussing potential opportunities, please feel free to contact me.
- LinkedIn: LinkedIn Profile
Thank you for taking the time to explore my portfolio! I look forward to connecting and discussing potential opportunities or collaborations.
This portfolio represents a collection of bioinformatics projects where I applied my knowledge of Python programming, data analysis, and biological data processing to solve complex challenges in the field of bioinformatics. As I continue to work on new projects, I will update this portfolio with additional examples and insights into my work. For access to specific projects or any further questions, please don’t hesitate to reach out!