herrerovir/herrerovir


👩‍🔬 About Me

I'm Virginia, a data scientist with a background in chemical engineering. Turning data into insights, models, and beautiful dashboards is my thing.

  • 🐍 Python enthusiast focused on solving real-world problems through data.
  • 🤖 Currently leveling up in machine learning and deep learning. Always exploring new techniques, always building something cool.
  • 🎨 Big on data visualization. Passionate about creating visually engaging content.
  • 📚 I document everything because good science and code should always be clear and reusable.

🧰 Tech Stack

Here are the tools and libraries I use regularly to explore data, build models, and tell compelling stories:

  • Languages & Frameworks
    Python · Scikit-learn · Keras · TensorFlow · PyTorch · NumPy · Pandas

  • Data Visualization
    Matplotlib · Seaborn · Power BI · Dashboards

  • Databases & Querying
    SQL · MySQL

  • Tools & Environment
    Jupyter Notebooks · VS Code · Git · GitHub · TensorBoard

🚀 Portfolio Highlights

Welcome to my digital lab! Below, you’ll find a collection of my data science projects where I dive into data, build models, and apply algorithms to solve real-world problems.

🧐 Exploratory Data Analysis (EDA)

Diving into data to uncover patterns, trends, and insights that lay the groundwork for further analysis or model building.

  • Problem: Limited access to global chemical producers' data for market research.
  • Solution: Built a web scraper to gather key data from multiple websites.
  • Impact: Provides valuable information for market research, helping companies stay informed about competitors.
  • Tools Used: BeautifulSoup, requests, NumPy, Pandas, Matplotlib, Seaborn
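The parsing step of a scraper like this can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it uses `html.parser` as a dependency-free stand-in for BeautifulSoup, and the class name `TableRowParser`, the helper `parse_table`, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML


def parse_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment, used only to exercise the parser.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td>Example Chemicals</td><td>Spain</td></tr>
</table>
"""
rows = parse_table(SAMPLE_HTML)
```

In a real scraper the HTML string would come from an HTTP response, and the resulting rows could be loaded into a Pandas DataFrame for analysis.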

  • Problem: Lack of clear insight into the effectiveness of different gym exercises for specific fitness goals.
  • Solution: Conducted an exploratory data analysis (EDA) on a gym exercise dataset, using data visualization techniques to uncover patterns.
  • Impact: Identified the most effective exercises for various fitness goals, providing actionable insights for gym-goers and trainers.
  • Tools Used: NumPy, Pandas, Matplotlib, Seaborn

  • Problem: Difficulty in understanding the key factors that influence box-office revenue.
  • Solution: Retrieved and analyzed movie data using API calls, followed by exploratory data analysis (EDA) with visualization and statistical techniques to uncover patterns and trends.
  • Impact: Provides actionable insights for film production studios to optimize revenue forecasting and make data-driven decisions.
  • Tools Used: urllib.request, gzip, json, NumPy, Pandas, Matplotlib, Seaborn
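A retrieval step built on `urllib.request`, `gzip`, and `json` might look roughly like the sketch below. It assumes the API serves a gzip-compressed file with one JSON object per line; that format, the function names, and the sample records are assumptions made for illustration, not details taken from the project.

```python
import gzip
import json
import urllib.request


def decode_gzipped_json_lines(payload):
    """Decompress a gzip payload that holds one JSON object per line."""
    text = gzip.decompress(payload).decode("utf-8")
    # Skip blank lines so a trailing newline does not break parsing.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_gzipped_json_lines(url, timeout=30):
    """Download a gzip-compressed export and decode it into a list of dicts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_gzipped_json_lines(response.read())


# Offline check with a hypothetical two-record payload (no network needed).
sample_payload = gzip.compress(
    b'{"id": 1, "title": "Movie A"}\n{"id": 2, "title": "Movie B"}\n'
)
records = decode_gzipped_json_lines(sample_payload)
```

Keeping the decoding separate from the download makes the parsing logic testable without calling the API.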

  • Problem: Understanding global CO2 emissions trends to inform environmental policymaking.
  • Solution: Conducted exploratory data analysis (EDA) on CO2 emissions data to uncover key trends and patterns.
  • Impact: Provides insights that help policymakers identify trends and set data-driven targets for emissions reduction.
  • Tools Used: NumPy, Pandas, Matplotlib, Seaborn


🧠 Machine Learning

Building and fine-tuning models to solve problems like prediction, classification, and forecasting using real-world data.

  • Problem: High energy costs in the steel industry driven by inefficient energy usage.
  • Solution: Developed a regression model to predict future energy consumption based on historical data.
  • Impact: Provides insights that help reduce energy costs and optimize usage, leading to improved efficiency.
  • Tools Used: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, pickle

  • Problem: Downtime from unexpected machinery failures leads to significant financial losses.
  • Solution: Developed a Random Forest classifier to predict machinery breakdowns based on historical data.
  • Impact: Helps reduce downtime and maintenance costs by enabling proactive repairs.
  • Tools Used: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, imblearn, pickle

  • Problem: Manually detecting defects in steel products is time-consuming and prone to errors.
  • Solution: Built a classification model using XGBoost to automate defect detection, improving both speed and accuracy.
  • Impact: Increases accuracy in defect identification, reduces wastage, and boosts production efficiency.
  • Tools Used: NumPy, Pandas, XGBoost, Scikit-learn, Matplotlib, Seaborn, SMOTE, pickle


🧠 Deep Learning

Applying neural networks to tackle complex tasks like image classification, time-series forecasting, and other advanced challenges.

  • Problem: Need for accurate classification of images from the CIFAR-10 dataset, which includes 10 diverse categories.
  • Solution: Built a Convolutional Neural Network (CNN) using TensorFlow and Keras to classify images, optimizing for accuracy and model performance.
  • Impact: Achieved improved image classification accuracy, with potential applications in real-world image recognition tasks like object detection and autonomous systems.
  • Tools Used: TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn

  • Problem: Need for accurate prediction of air quality to safeguard public health.
  • Solution: Developed an LSTM model for time-series forecasting to predict air quality based on historical data.
  • Impact: Empowers cities to anticipate air quality trends and take preventive measures, improving public health outcomes.
  • Tools Used: NumPy, Pandas, TensorFlow, Keras, Matplotlib, Seaborn, Statsmodels, Scikit-learn
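Before a recurrent model such as an LSTM can be trained on a time series, the series is usually reshaped into fixed-length input windows paired with the value that follows each window. The sketch below shows that windowing step in plain Python; the function name `make_windows` and the toy readings are hypothetical, and the project's actual preprocessing may differ.

```python
def make_windows(series, window):
    """Split a sequence into (inputs, targets) for one-step-ahead forecasting.

    Each input is `window` consecutive values; its target is the next value.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    count = len(series) - window
    inputs = [list(series[i:i + window]) for i in range(count)]
    targets = [series[i + window] for i in range(count)]
    return inputs, targets


# Toy readings standing in for an hourly air-quality measurement.
readings = [10, 12, 15, 14, 13]
inputs, targets = make_windows(readings, window=3)
# inputs  -> [[10, 12, 15], [12, 15, 14]]
# targets -> [14, 13]
```

For a Keras LSTM, the inputs would then typically be converted to a NumPy array of shape (samples, window, features).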


📝 Natural Language Processing (NLP)

Applying text analysis techniques to uncover insights, identify patterns, and understand language data.

  • Problem: Difficulty in uncovering underlying topics within large news datasets for content analysis.
  • Solution: Applied LDA (Latent Dirichlet Allocation) topic modeling to extract and categorize key topics from news articles.
  • Impact: Provides insights for content analysis, media strategy, and topic prediction, improving content personalization and decision-making.
  • Tools Used: Gensim, NLTK, Pandas, Matplotlib, Seaborn, pyLDAvis

  • Problem: Difficulty in extracting sentiment and topics from large literary datasets, particularly F. Scott Fitzgerald’s works.
  • Solution: Applied Cardiff RoBERTa for sentiment analysis and BERTopic for topic modeling to analyze sentiment and extract key themes from Fitzgerald’s texts.
  • Impact: Provides deeper insights into literary themes and sentiments, enhancing the understanding of F. Scott Fitzgerald’s work and its emotional depth.
  • Tools Used: Cardiff RoBERTa, BERTopic, NLTK, Transformers, SentenceTransformers, UMAP, HDBSCAN, Gensim, Matplotlib, Seaborn, WordCloud


⚙️ SQL Workflows

Designing and optimizing SQL queries to extract, clean, and prepare data for analysis or modeling.

 

 


📊 BI & Dashboards

Creating interactive visualizations and dashboards to transform complex data into clear, actionable insights.

 


🐍 Python Automation

Developing Python scripts to automate repetitive tasks and workflows, saving time and boosting efficiency.

 

🛠️ What's Next?

I’m always exploring new ideas, building, experimenting, and sharing projects that mix code, design, and storytelling.

🎯 Let’s connect! If you're into data, ML, or just cool visualizations, let’s chat!

