👩‍🔬 About Me

I'm Virginia, a data scientist with a background in chemical engineering. Turning data into insights, models, and beautiful dashboards is my thing.

🐍 Python enthusiast focused on solving real-world problems through data.
🤖 Currently leveling up in machine learning and deep learning. Always exploring new techniques, always building something cool.
🎨 Big on data visualization. Passionate about creating visually engaging content.
📚 I document everything because good science and code should always be clear and reusable.

🧰 Tech Stack

Here are the tools and libraries I use regularly to explore data, build models, and tell compelling stories:

Languages & Frameworks
Python · Scikit-learn · Keras · TensorFlow · PyTorch · NumPy · Pandas
Data Visualization
Matplotlib · Seaborn · Power BI · Dashboards
Databases & Querying
SQL · MySQL
Tools & Environment
Jupyter Notebooks · VS Code · Git · GitHub · TensorBoard

🚀 Portfolio Highlights

Welcome to my digital lab! Below, you’ll find a collection of my data science projects where I dive into data, build models, and apply algorithms to solve real-world problems.

🧐 Exploratory Data Analysis (EDA)

Diving into data to uncover patterns, trends, and insights that lay the groundwork for further analysis or model building.

Analyzing the Largest Chemical Producers Worldwide

Problem: Limited access to global chemical producers' data for market research.
Solution: Built a web scraper to gather key data from multiple websites.
Impact: Provides valuable information for market research, helping companies stay informed about competitors.
Tools Used: BeautifulSoup, requests, NumPy, Pandas, Matplotlib, Seaborn
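
As a rough illustration of the scraping step, the sketch below parses a producer table out of an HTML snippet with BeautifulSoup. The markup, the column names, and the `parse_producers` helper are hypothetical stand-ins, since the real sites and their structure are not described here. In practice the HTML string would come from something like `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real page; the actual sites differ.
SAMPLE_HTML = """
<table id="producers">
  <tr><th>Company</th><th>Country</th><th>Sales (USD bn)</th></tr>
  <tr><td>Example Chem A</td><td>Germany</td><td>12.5</td></tr>
  <tr><td>Example Chem B</td><td>USA</td><td>8.0</td></tr>
</table>
"""


def parse_producers(html: str) -> list[dict]:
    """Return one dict per table row, keyed by the header cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="producers")
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, cells)))
    return records


producers = parse_producers(SAMPLE_HTML)
```

The resulting list of dicts can be passed straight to `pandas.DataFrame` for cleaning and plotting.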

Exploratory Data Analysis of Gym Exercises

Problem: Lack of clear insights on the effectiveness of different gym exercises for specific fitness goals.
Solution: Conducted an exploratory data analysis (EDA) on a gym exercise dataset, using data visualization techniques to uncover patterns.
Impact: Identified the most effective exercises for various fitness goals, providing actionable insights for gym-goers and trainers.
Tools Used: NumPy, Pandas, Matplotlib, Seaborn

Movie Revenue Analysis

Problem: Difficulty in understanding the key factors that influence box-office revenue.
Solution: Retrieved and analyzed movie data using API calls, followed by exploratory data analysis (EDA) with visualization and statistical techniques to uncover patterns and trends.
Impact: Provides actionable insights for film production studios to optimize revenue forecasting and make data-driven decisions.
Tools Used: urllib.request, gzip, json, NumPy, Pandas, Matplotlib, Seaborn
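
The retrieval step can be sketched with the three standard-library modules listed above. This is a minimal example that assumes the API serves a gzip-compressed file with one JSON object per line. The function names, the URL parameter, and the sample records are hypothetical and not taken from the project.

```python
import gzip
import json
import urllib.request


def decode_gzip_json_lines(raw: bytes) -> list[dict]:
    """Decompress gzip bytes and parse one JSON object per non-empty line."""
    text = gzip.decompress(raw).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_movies(url: str) -> list[dict]:
    """Download a gzip-compressed export and decode it (url is a placeholder)."""
    with urllib.request.urlopen(url) as response:
        return decode_gzip_json_lines(response.read())


# Offline check with made-up records, so no network access is needed.
sample = b"\n".join(
    json.dumps(movie).encode("utf-8")
    for movie in [
        {"id": 1, "title": "Example Movie A"},
        {"id": 2, "title": "Example Movie B"},
    ]
)
movies = decode_gzip_json_lines(gzip.compress(sample))
```

Keeping the decoding separate from the download makes the parsing logic testable without calling the API.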

CO₂ Emissions Over the Past 50 Years

Problem: Understanding global CO₂ emissions trends to inform environmental policymaking.
Solution: Conducted exploratory data analysis (EDA) on CO₂ emissions data to uncover key trends and patterns.
Impact: Provides insights that help policymakers identify trends and set data-driven targets for emissions reduction.
Tools Used: NumPy, Pandas, Matplotlib, Seaborn

🧠 Machine Learning

Building and fine-tuning models to solve problems like prediction, classification, and forecasting using real-world data.

Steel Industry Energy Consumption Forecasting

Problem: High energy costs in the steel industry driven by inefficient energy usage.
Solution: Developed a regression model to predict future energy consumption based on historical data.
Impact: Provides insights that help reduce energy costs and optimize usage, leading to improved efficiency.
Tools Used: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, pickle
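
A minimal sketch of the forecasting idea, assuming lagged consumption values as features and a plain linear regression. The actual features and model type used in the project are not stated here. The series is synthetic, and `make_lagged` is a hypothetical helper.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic hourly consumption series (made up for illustration only).
rng = np.random.default_rng(0)
hours = np.arange(500)
usage = 50 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)


def make_lagged(series: np.ndarray, n_lags: int):
    """Build rows of n_lags consecutive values and the value that follows them."""
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


X, y = make_lagged(usage, n_lags=24)

# Chronological split: fit on the past, evaluate on the most recent 20%.
split = int(len(X) * 0.8)
model = LinearRegression().fit(X[:split], y[:split])
r2 = model.score(X[split:], y[split:])

# Round-trip through pickle, as one might when saving a trained model.
restored = pickle.loads(pickle.dumps(model))
```

The split is chronological rather than shuffled so that the model is never evaluated on data older than what it was trained on.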

Predictive Maintenance for Industrial Machinery

Problem: Downtime from unexpected machinery failures leads to significant financial losses.
Solution: Developed a Random Forest classifier to predict machinery breakdowns based on historical data.
Impact: Helps reduce downtime and maintenance costs by enabling proactive repairs.
Tools Used: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, imblearn, pickle
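
The classifier setup might look roughly like the sketch below. It runs on synthetic data with a rare "failure" class. To keep the example dependent on Scikit-learn only, it uses `class_weight="balanced"` as a stand-in for resampling with imblearn. The feature counts and hyperparameters are illustrative assumptions, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: roughly 10% "failure" labels (illustrative only).
X, y = make_classification(
    n_samples=2000,
    n_features=8,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=42,
)

# Stratify so both splits keep a similar failure rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# class_weight="balanced" is a stand-in for resampling with imblearn.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
clf.fit(X_train, y_train)

# Recall on the failure class: the share of real failures the model catches.
failure_recall = recall_score(y_test, clf.predict(X_test))
```

Recall on the failure class is often a more informative metric than accuracy for this kind of problem, because a model that never predicts a failure can still score high accuracy on imbalanced data.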

Industrial Steel Plates Defect Detection Using XGBoost

Problem: Manually detecting defects in steel products is time-consuming and prone to errors.
Solution: Built a classification model using XGBoost to automate defect detection, improving both speed and accuracy.
Impact: Increases accuracy in defect identification, reduces wastage, and boosts production efficiency.
Tools Used: NumPy, Pandas, XGBoost, Scikit-learn, Matplotlib, Seaborn, SMOTE, pickle

🧠 Deep Learning

Applying neural networks to tackle complex tasks like image classification, time-series forecasting, and other advanced challenges.

CNN for CIFAR-10 Image Classification

Problem: Need for accurate classification of images from the CIFAR-10 dataset, which includes 10 diverse categories.
Solution: Built a Convolutional Neural Network (CNN) using TensorFlow and Keras to classify images, optimizing for accuracy and model performance.
Impact: Delivers a trained image classifier for CIFAR-10, with potential applications in real-world image recognition tasks such as object detection and autonomous systems.
Tools Used: TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn

Air Quality Forecasting

Problem: Need for accurate prediction of air quality to safeguard public health.
Solution: Developed an LSTM model for time-series forecasting to predict air quality based on historical data.
Impact: Empowers cities to anticipate air quality trends and take preventive measures, improving public health outcomes.
Tools Used: NumPy, Pandas, TensorFlow, Keras, Matplotlib, Seaborn, Statsmodels, Scikit-learn
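
Keras LSTM layers expect input shaped `(samples, timesteps, features)`, so a time series has to be cut into overlapping windows before training. The sketch below shows one way to do that with NumPy. The `make_windows` helper, the window length, and the toy series are assumptions for illustration, not the project's preprocessing.

```python
import numpy as np


def make_windows(series: np.ndarray, timesteps: int):
    """Slice a 1-D series into overlapping windows and next-step targets.

    Returns X with shape (samples, timesteps, 1) and y with shape (samples,).
    """
    n_samples = len(series) - timesteps
    X = np.stack([series[i : i + timesteps] for i in range(n_samples)])
    y = series[timesteps:]
    return X[..., np.newaxis], y


# Toy series standing in for hourly pollutant readings (values are made up).
readings = np.arange(10, dtype=float)
X, y = make_windows(readings, timesteps=3)
```

Each row of `X` holds three consecutive readings, and the matching entry in `y` is the reading that comes right after them.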

📝 Natural Language Processing (NLP)

Applying text analysis techniques to uncover insights, identify patterns, and understand language data.

Uncovering Hidden Topics in News Snippets Using LDA

Problem: Difficulty in uncovering underlying topics within large news datasets for content analysis.
Solution: Applied LDA (Latent Dirichlet Allocation) topic modeling to extract and categorize key topics from news articles.
Impact: Provides insights for content analysis, media strategy, and topic prediction, improving content personalization and decision-making.
Tools Used: Gensim, NLTK, Pandas, Matplotlib, Seaborn, pyLDAvis

Exploring Themes and Emotions in F. Scott Fitzgerald’s Works with NLP

Problem: Extracting sentiment and topics from large literary datasets, particularly from F. Scott Fitzgerald’s works.
Solution: Applied Cardiff RoBERTa for sentiment analysis and BERTopic for topic modeling to extract key themes from Fitzgerald’s texts.
Impact: Provides deeper insights into literary themes and sentiments, enhancing the understanding of F. Scott Fitzgerald’s work and its emotional depth.
Tools Used: Cardiff RoBERTa, BERTopic, NLTK, Transformers, SentenceTransformers, UMAP, HDBSCAN, Gensim, Matplotlib, Seaborn, WordCloud

⚙️ SQL Workflows

Designing and optimizing SQL queries to extract, clean, and prepare data for analysis or modeling.

📊 BI & Dashboards

Creating interactive visualizations and dashboards to transform complex data into clear, actionable insights.

🐍 Python Automation

Developing Python scripts to automate repetitive tasks and workflows, saving time and boosting efficiency.

🛠️ What's Next?

I’m always exploring new ideas, building, experimenting, and sharing projects that mix code, design, and storytelling.

🎯 Let’s connect! If you're into data, ML, or just cool visualizations, let’s chat!
