Skip to content

rebeccacamejo/ml-foundations

Repository files navigation

Machine Learning Foundations & Applications

Welcome to the ML Foundations repository! This project demonstrates my progressive journey from the most basic machine‑learning concepts through increasingly sophisticated techniques. Each numbered folder contains one or more Jupyter notebooks that implement a concept, alongside a concise README.md explaining the theory and its relevance to real‑world business scenarios.

The goal is to store my learnings, as well as illustrate how data‑driven insights translate into actionable value for my favorite industries.

The project is organised as follows:

Folder Description
01_linear_regression Fundamental regression analysis using the diabetes dataset.
02_classification_basics Logistic regression and decision trees for classification.
03_neural_networks A multi‑layer perceptron for digit recognition and a hands‑on convolution demonstration.
04_unsupervised_learning K‑means clustering, PCA, and t‑SNE visualisation techniques.
05_nlp_basics Bag‑of‑words and TF‑IDF models for sentiment classification on a custom dataset.
06_interpretability_basics Feature importance and word‑weight visualisation for model interpretability.

Each notebook is self‑contained: it loads its own data, trains a model, produces evaluation metrics and visualizations, and concludes with a business interpretation. I also included current applications today of these concepts in the wild, as well as future case studies I find to be interesting and valuable.

The requirements.txt files list the minimal Python packages needed to run the notebooks. When exploring this repository, run the notebooks in order: they build on the concepts introduced in previous sections.

Communicating Impact

One of the core learnings from my early career is the importance of how to communicate impact to stakeholders.

To connect your technical work to meaningful business outcomes, frame your analysis around three questions:

  1. What decision will this model inform? State the real‑world choice you’re trying to improve (e.g., “Which customers should receive proactive support?” or “When should we schedule equipment maintenance?”).
  2. How does the model change behaviour? Describe how predictions translate into actions, whether it is routing disputes faster, pricing more accurately, reducing idle time, etc.
  3. What is the measurable impact? Quantify key metrics (KPI lift, cost savings, conversion improvements). Consider building a simple value simulation:
    • For classification models, assign costs/benefits to true positives, false positives, false negatives, and true negatives to compute expected value.
    • For regression models, convert error metrics into tangible impact (e.g., how a 10‑day reduction in forecast error translates to working‑capital savings).
    • For segmentation/clustering, run A/B tests to measure lift in conversion or retention from targeting the “right” cluster.

Tie these metrics back to your industry (these are just some that I have experience in and undertand their core issues):

  • FinTech: Reduced fraud losses, improved approval rates, lower capital requirements.
  • SaaS: Lower churn, higher upsell conversion, improved net revenue retention.
  • Logistics: Reduced fuel costs, shorter delivery windows, increased on‑time delivery rates.
  • Construction/field services: Fewer repeat calls, better margin control, faster project completion.

Quantified impact and a clear narrative will help non‑technical stakeholders appreciate the value of your machine‑learning projects.

Note from Me

I hope you find this repository both educational and inspiring. Use it freely to prepare for interviews, refresh your understanding of core ML techniques or as a starting point for more advanced research projects.

Releases

No releases published

Packages

No packages published