🐊 Crocodylus-Gauge: Crocodile Weight Prediction Pipeline

Crocodylus-Gauge is a modular Machine Learning (ML) project that predicts the weight of a crocodile (in kilograms) from its observed length (in meters). It relies on the strong biological relationship between weight and the cube of length (L³), which is used as the engineered feature for the model.
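To make the cube-law idea concrete, here is a minimal sketch of a one-parameter fit, assuming weight ≈ k · L³ with no intercept. It is not the project's actual model; the function names `fit_cube_law` and `predict_weight` are hypothetical, and the numbers are made up for illustration (generated with k = 3.0), not taken from the dataset.

```python
def fit_cube_law(lengths_m, weights_kg):
    """Least-squares estimate of k in weight = k * length**3 (no intercept)."""
    cubes = [length ** 3 for length in lengths_m]
    numerator = sum(c * w for c, w in zip(cubes, weights_kg))
    denominator = sum(c * c for c in cubes)
    if denominator == 0:
        raise ValueError("need at least one non-zero length")
    return numerator / denominator


def predict_weight(k, length_m):
    """Apply the fitted cube law to a single length in meters."""
    return k * length_m ** 3


# Illustrative, made-up observations generated with k = 3.0 (not real data).
lengths = [1.0, 2.0, 3.0, 4.0]
weights = [3.0 * length ** 3 for length in lengths]

k = fit_cube_law(lengths, weights)
print(f"k = {k:.2f} kg/m^3")
print(f"predicted weight at 5.5 m: {predict_weight(k, 5.5):.1f} kg")
```

Because the model is linear in L³, any linear regressor applied to the cubed length can capture the same relationship.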

The repository is organized as a modular, testable ML pipeline: data loading, feature engineering, and model training live in separate modules, and the transformers are covered by unit tests.


🚀 Getting Started

Prerequisites

  • Python 3.8+
  • pip
  • make (optional, but recommended)

Installation

Clone the repository and install the dependencies:

git clone https://github.com/EfekanSalman/Crocodylus-Gauge.git
cd Crocodylus-Gauge

# Create and activate a virtual environment (recommended)
python -m venv .venv
# Activate it with ONE of the following, depending on your OS:
source .venv/bin/activate   # Linux/macOS
.venv\Scripts\activate      # Windows

# Install required Python packages
pip install -r requirements.txt

Data Preparation (Crucial Step)

The project expects the raw data file to be located in a specific directory structure:

mkdir -p data/raw
mkdir -p models

Copy your raw data file (crocodile_dataset.csv) into data/raw/:

cp /path/to/your/crocodile_dataset.csv data/raw/

⚙️ Project Usage

1. Model Training & Saving

Run the pipeline script to process data, engineer features, train the model, and save it:

python src/crocodilus_gauge/model_pipeline.py

This produces:

  • Console output: model performance metrics (RMSE, R² Score)
  • Saved model: models/crocodile_weight_predictor.joblib
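For readers unfamiliar with the two reported metrics, the sketch below shows how RMSE and R² are defined. It is a plain-Python illustration, not the project's code (which may well use library implementations); the function names and the sample weights are assumptions made up for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (kg here)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up weights in kg, purely to exercise the functions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(f"RMSE: {rmse(actual, predicted):.2f} kg")
print(f"R^2:  {r2_score(actual, predicted):.4f}")
```

RMSE is expressed in kilograms, so it reads directly as a typical prediction error; R² is unitless and is undefined when all true values are identical.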

2. Making Predictions

Use the prediction script with the trained model:

python predict.py

The script:

  • Runs predictions for sample crocodiles (e.g., a 5.5 m Crocodylus)
  • Displays the predicted weights in kilograms

📁 Project Structure

Crocodylus-Gauge/
├── data/
│   └── raw/
│       └── crocodile_dataset.csv        # Raw dataset
├── models/
│   └── crocodile_weight_predictor.joblib # Saved ML pipeline
├── src/
│   └── crocodilus_gauge/
│       ├── __init__.py
│       ├── config.py                    # Constants (MODEL_PATH, column names, etc.)
│       ├── data_processing.py           # Data loading + Train/Test split
│       ├── feature_engineering.py       # Custom sklearn transformers (cleaning, L³)
│       └── model_pipeline.py            # End-to-end ML pipeline training
├── tests/
│   └── test_feature_engineering.py      # Unit tests for transformers
├── predict.py                           # Script: load model + predict
└── requirements.txt                     # Project dependencies
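As a rough illustration of how the pieces in this layout might fit together, the sketch below chains a custom scikit-learn transformer with a regressor, saves the fitted pipeline with joblib, and reloads it for prediction. Everything here is an assumption: the class name `CubeLengthTransformer`, the choice of `LinearRegression`, and the synthetic data are hypothetical and not taken from the repository's source. It requires numpy, scikit-learn, and joblib.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class CubeLengthTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: replaces the length column with length**3."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        return np.asarray(X, dtype=float) ** 3


pipeline = Pipeline([
    ("cube_length", CubeLengthTransformer()),
    ("regressor", LinearRegression()),  # assumed model, not confirmed by the README
])

# Synthetic lengths (m) and weights (kg), made up for illustration only.
lengths = np.array([[1.0], [2.0], [3.0], [4.0]])
weights = 3.0 * lengths.ravel() ** 3

pipeline.fit(lengths, weights)

# Save and reload in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "crocodile_weight_predictor.joblib"
    joblib.dump(pipeline, model_path)
    restored = joblib.load(model_path)

prediction = restored.predict(np.array([[5.5]]))[0]
print(f"predicted weight at 5.5 m: {prediction:.1f} kg")
```

Persisting the whole pipeline, rather than only the regressor, means the L³ transformation is applied automatically at prediction time.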

✅ Testing

Run unit tests to ensure reliability:

python -m unittest discover tests

All tests should return OK.
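For orientation, a unit test in this style might look like the sketch below. It does not reproduce tests/test_feature_engineering.py; the function `cube_length` is a hypothetical stand-in for the L³ feature, defined inline so the example is self-contained.

```python
import unittest


def cube_length(length_m):
    """Hypothetical stand-in for the L³ feature: returns length cubed."""
    if length_m < 0:
        raise ValueError("length must be non-negative")
    return length_m ** 3


class TestCubeLength(unittest.TestCase):
    def test_cubes_positive_length(self):
        self.assertAlmostEqual(cube_length(2.0), 8.0)

    def test_zero_length(self):
        self.assertEqual(cube_length(0.0), 0.0)

    def test_rejects_negative_length(self):
        with self.assertRaises(ValueError):
            cube_length(-1.0)
```

Saved under tests/ with a name matching test*.py, a file like this would be picked up by the discover command above.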


🐳 Deployment (WIP)

The project is designed to be containerized with Docker for consistent deployment across environments. See the Dockerfile for details.

