Crocodylus-Gauge is a modular Machine Learning (ML) project designed to predict the weight of crocodiles (in kilograms) based on their observed length (in meters). The project leverages the strong biological relationship between weight and the cube of length (L³) to achieve high-accuracy predictions.
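This repository follows MLOps best practices: configuration, data processing, feature engineering, and training live in separate modules, and the feature transformers are covered by unit tests, giving a modular, testable, production-oriented ML pipeline.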
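The cubic relationship described above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual pipeline: the function names `fit_cubic_model` and `predict_weight` are hypothetical, and the sample values are synthetic numbers invented for the demo, not rows from `crocodile_dataset.csv`.

```python
# Illustrative sketch only: names and data here are assumptions,
# not part of the Crocodylus-Gauge codebase.

def fit_cubic_model(lengths_m, weights_kg):
    """Fit weight ~ a * length**3 + b by ordinary least squares."""
    cubes = [length ** 3 for length in lengths_m]  # engineered L^3 feature
    n = len(cubes)
    mean_x = sum(cubes) / n
    mean_y = sum(weights_kg) / n
    # Closed-form simple linear regression on the cubed feature
    sxx = sum((x - mean_x) ** 2 for x in cubes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cubes, weights_kg))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_weight(length_m, a, b):
    """Predict weight in kilograms from length in meters."""
    return a * length_m ** 3 + b


# Synthetic data generated from weight = 3 * L**3 (made up for the demo)
lengths = [1.0, 2.0, 3.0, 4.0]
weights = [3.0 * length ** 3 for length in lengths]

a, b = fit_cubic_model(lengths, weights)
print(f"slope={a:.3f}, intercept={b:.3f}")
print(f"predicted weight at 2.5 m: {predict_weight(2.5, a, b):.1f} kg")
```

Regressing on L³ instead of L keeps the model linear in its parameters while still capturing the volume-like scaling of body mass.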
- Python 3.8+
- pip
- make (optional, but recommended)
Clone the repository and install the dependencies:
git clone https://github.com/EfekanSalman/Crocodylus-Gauge.git
cd Crocodylus-Gauge
# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Linux/macOS
.venv\Scripts\activate # On Windows
# Install required Python packages
pip install -r requirements.txt
The pipeline expects the raw data file under data/raw/ and saves the trained model under models/. Create both directories:
mkdir -p data/raw
mkdir models
Copy your raw data file (crocodile_dataset.csv) into data/raw/:
cp /path/to/your/crocodile_dataset.csv data/raw/
Run the pipeline script to process data, engineer features, train the model, and save it:
python src/crocodilus_gauge/model_pipeline.py
- Console output: model performance metrics (RMSE, R² Score)
- Saved model: models/crocodile_weight_predictor.joblib
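For readers unfamiliar with the two reported metrics, the sketch below shows how RMSE and R² are defined. It is a plain-Python illustration under the assumption that the pipeline reports the standard definitions; the helper names `rmse` and `r2_score` and the sample numbers are made up for this example and are not taken from the project's code or results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (kg)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented weights (kg) purely for demonstration
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]

print(f"RMSE: {rmse(actual, predicted):.3f} kg")
print(f"R2:   {r2_score(actual, predicted):.3f}")
```

A lower RMSE means smaller typical errors in kilograms, while an R² close to 1 means the model explains most of the variance in weight.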
Use the prediction script with the trained model:
python predict.py
- Runs predictions for sample crocodiles (e.g., a 5.5 m Crocodylus).
- Displays the predicted weights in kilograms.
Crocodylus-Gauge/
├── data/
│   └── raw/
│       └── crocodile_dataset.csv          # Raw dataset
├── models/
│   └── crocodile_weight_predictor.joblib  # Saved ML pipeline
├── src/
│   └── crocodilus_gauge/
│       ├── __init__.py
│       ├── config.py                      # Constants (MODEL_PATH, column names, etc.)
│       ├── data_processing.py             # Data loading + Train/Test split
│       ├── feature_engineering.py         # Custom sklearn transformers (cleaning, L³)
│       └── model_pipeline.py              # End-to-end ML pipeline training
├── tests/
│   └── test_feature_engineering.py        # Unit tests for transformers
├── predict.py                             # Script: load model + predict
└── requirements.txt                       # Project dependencies
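The layout above mentions custom transformers for the L³ feature. As a rough idea of what such a component might look like, here is a dependency-free sketch of the `fit`/`transform` convention that scikit-learn transformers follow. The class name `LengthCubedTransformer` is hypothetical and is not taken from `feature_engineering.py`; a real scikit-learn transformer would typically also subclass `BaseEstimator` and `TransformerMixin`, which is omitted here so the example runs without extra packages.

```python
class LengthCubedTransformer:
    """Hypothetical transformer: replaces length (m) with length cubed.

    Duck-typed sketch of the fit/transform interface; the project's real
    transformers may differ in name, inputs, and behavior.
    """

    def fit(self, X, y=None):
        # Stateless transformer: there is nothing to learn from the data.
        return self

    def transform(self, X):
        # X is assumed to be a list of rows with length in column 0.
        return [[row[0] ** 3] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


rows = [[2.0], [3.0]]
print(LengthCubedTransformer().fit_transform(rows))
```

Wrapping feature engineering in a transformer lets the same step run at training and prediction time, which is why the saved artifact is described as a whole pipeline, not just a bare model.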
Run unit tests to ensure reliability:
python -m unittest discover tests
All tests should pass; unittest prints OK when they do.
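A test in this style might look like the sketch below. It is only an assumption about the general shape of such tests: `cube_length` is a hypothetical stand-in defined inline, not a function from the project, and the actual contents of `tests/test_feature_engineering.py` may differ.

```python
import unittest


def cube_length(length_m):
    """Hypothetical stand-in for the project's L^3 feature computation."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return length_m ** 3


class TestCubeLength(unittest.TestCase):
    def test_cubes_positive_length(self):
        # 2 m cubed should give 8 m^3
        self.assertAlmostEqual(cube_length(2.0), 8.0)

    def test_rejects_non_positive_length(self):
        # Non-physical lengths should be rejected, not silently cubed
        with self.assertRaises(ValueError):
            cube_length(0.0)
```

Saved as a `test_*.py` file under `tests/`, a module like this would be picked up by the `discover` command shown above.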
The project is designed to be containerized with Docker for consistent deployment across environments. See the Dockerfile for details.