---
title: RideCastAI
emoji: 🚖
colorFrom: purple
colorTo: orange
sdk: streamlit
sdk_version: 1.33.0
app_file: streamlit_app/app.py
pinned: false
---
🚀 RideCastAI is a real-time fare and ETA prediction engine built on real NYC taxi data. It combines time-aware feature engineering, gradient-boosted regression, and quantile regression for uncertainty-aware fare ranges to simulate real-world ride-hailing behavior, and it ships as a deployable Streamlit app.
"What will this ride cost me and how long will it take?"
This app combines real NYC taxi data with engineered ride-context features such as:
- Hour of day
- Day of week
- Trip distance
- Rush-hour indicators

These features drive simulated dynamic pricing and trip-duration estimates, loosely modeled on the backend logic of ride-hailing platforms such as Uber and Lyft.
✅ ETA prediction using XGBoost
✅ Fare estimation with uncertainty bands (Quantile Regression)
✅ NYC demand heatmap (currently simulated)
✅ Streamlit UI with live trip simulation
✅ Per-request prediction latency shown in the UI for performance observability
✅ Built-in model performance metrics
✅ Trained locally & deployed on Hugging Face Spaces
✅ Modular codebase for real-world use
```text
┌────────────────────────────┐
│         User Input         │
│   (Time, Day, Distance)    │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│    Feature Engineering     │
│ (rush hour, sin/cos, etc.) │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────────────────┐
│        ML Models (XGBoost, GBR)        │
│ ETA Model + Fare Models (Q10/Q50/Q90)  │
└────────────┬───────────────┬───────────┘
             │               │
      ETA Prediction  Fare Prediction
             │               │
             ▼               ▼
    ┌────────────────┐ ┌────────────────────┐
    │   ETA Output   │ │ Fare Estimate (+/-)│
    └────────────────┘ └────────────────────┘
             │               │
             └───────┬───────┘
                     ▼
      ┌────────────────────────────┐
      │     Streamlit UI (App)     │
      │ Predict Button triggers ML │
      │ Heatmap visual via Plotly  │
      └──────────────┬─────────────┘
                     │
                     ▼
  ┌──────────────────────────────────────┐
  │        🔥 NYC Demand Heatmap         │
  │     Real-time Plotly Density Map     │
  └──────────────────────────────────────┘
```
- `hour_sin`, `hour_cos`: Circular encoding of the time of day
- `is_rush_hour`: Captures commute-hour congestion
- `is_weekend`: Captures weekend shifts in ride supply and demand
- Quantile regression: Predicts a fare range, not just a point estimate

✅ These choices keep the model interpretable and closer to real-world behavior
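Why sin/cos: as a raw number, hour 23 and hour 0 look 23 units apart even though they are one hour apart. Mapping the hour onto a circle removes that artificial jump. The sketch below assumes a 24-hour period; `encode_hour` and `circular_distance` are hypothetical helper names, not taken from the repo's `features.py`.

```python
import math


def encode_hour(hour: float) -> tuple[float, float]:
    """Map an hour in [0, 24) onto the unit circle (hypothetical helper)."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def circular_distance(h1: float, h2: float) -> float:
    """Euclidean distance between two hours in (sin, cos) space."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)


# 23:00 -> 00:00 is exactly as close as 00:00 -> 01:00 in encoded space,
# whereas the raw hour differences would be 23 and 1.
print(round(circular_distance(23, 0), 4), round(circular_distance(0, 1), 4))  # 0.2611 0.2611
```

Using both sine and cosine matters: sine alone maps 6:00 and 18:00's neighbors ambiguously (for example, hours 5 and 7 share the same sine), while the pair uniquely identifies each hour.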
⚡ Prediction latency (in ms) is logged and displayed in the app UI on each request.
Model | Metric | Value |
---|---|---|
ETA Model | MAE | ~3.9 mins |
Fare Model | MAE (Q50) | ~2.3 USD |
Metrics are logged automatically and displayed in the app sidebar.
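The table reports MAE. For the Q10 and Q90 fare models, MAE is only a rough proxy: with `loss="quantile"`, scikit-learn's `GradientBoostingRegressor` minimizes the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. A minimal, dependency-free sketch of both metrics; the function names and toy fares are illustrative only.

```python
def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pinball_loss(y_true: list[float], y_pred: list[float], q: float) -> float:
    """Mean quantile (pinball) loss for quantile level q in (0, 1).

    Under-prediction (y > pred) costs q per unit; over-prediction costs (1 - q).
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


fares = [10.0, 20.0]     # toy observed fares (USD)
low_quote = [8.0, 18.0]  # a deliberately low, Q10-style prediction

print(mae(fares, low_quote))                # 2.0
print(pinball_loss(fares, low_quote, 0.1))  # 0.2: under-predicting is cheap at q=0.1
print(pinball_loss(fares, low_quote, 0.9))  # 1.8: the same error is costly at q=0.9
```

The same low quote scores well as a 10th-percentile estimate and poorly as a 90th-percentile one, which is exactly the behavior a fare band needs. scikit-learn also provides `sklearn.metrics.mean_pinball_loss(y_true, y_pred, alpha=q)` for this metric.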
RideCastAI is more than a demo: it is a modular, end-to-end ML pipeline designed to mirror how real-world ride-hailing pricing engines work. Here's how it fits together:
Raw trip records from the NYC TLC Yellow Taxi dataset are processed to extract meaningful features:
- Distance (in miles)
- Pickup hour (0-23)
- Day of week (0-6)
- Is weekend / Is rush hour
- Cyclical time encodings (sine/cosine of the pickup hour)
✅ Why it matters: Features mimic how actual ride-hailing engines reason about traffic, demand, and surge pricing.
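The feature list above can be sketched in pandas as follows. This is a minimal sketch, not the repo's `src/features.py`: it assumes the raw TLC Yellow Taxi column names `tpep_pickup_datetime` and `trip_distance`, and it assumes a rush-hour rule (weekday pickups at hours 7-9 and 16-19) that this README does not spell out. `build_features` and `RUSH_HOURS` are hypothetical names.

```python
import numpy as np
import pandas as pd

RUSH_HOURS = {7, 8, 9, 16, 17, 18, 19}  # assumption: weekday commute peaks


def build_features(trips: pd.DataFrame) -> pd.DataFrame:
    """Derive model features from raw yellow-taxi trip records (hypothetical helper)."""
    pickup = pd.to_datetime(trips["tpep_pickup_datetime"])
    out = pd.DataFrame(index=trips.index)
    out["trip_distance"] = trips["trip_distance"]  # miles
    out["hour"] = pickup.dt.hour                   # 0-23
    out["day_of_week"] = pickup.dt.dayofweek       # 0 = Monday ... 6 = Sunday
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    out["is_rush_hour"] = (
        out["hour"].isin(RUSH_HOURS) & (out["is_weekend"] == 0)
    ).astype(int)
    # Cyclical encoding so that hour 23 and hour 0 end up close together.
    angle = 2 * np.pi * out["hour"] / 24
    out["hour_sin"] = np.sin(angle)
    out["hour_cos"] = np.cos(angle)
    return out


raw = pd.DataFrame({
    # 2023-01-02 is a Monday; 2023-01-07 is a Saturday.
    "tpep_pickup_datetime": ["2023-01-02 08:15:00", "2023-01-07 23:40:00"],
    "trip_distance": [2.4, 5.1],
})
print(build_features(raw))
```

The Monday 08:15 trip is flagged as rush hour; the Saturday late-night trip is flagged as a weekend trip instead.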
Two types of models are trained:
- ETA Model: XGBoost Regressor
- Fare Models: 3 separate `GradientBoostingRegressor` models for the 10th, 50th, and 90th percentile fare estimates
Together, the three fare models produce a fare range that reflects prediction uncertainty, similar in spirit to the upfront price quotes shown by ride-hailing apps such as Uber.
✅ Why it matters: Demonstrates quantile regression and uncertainty estimation for real-world predictions.
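A minimal scikit-learn sketch of the three-quantile fare setup, trained on synthetic data rather than the TLC dataset. `GradientBoostingRegressor(loss="quantile", alpha=q)` is scikit-learn's built-in quantile objective; the feature matrix, the fare formula, and the hyperparameters below are assumptions for illustration, not the values used in `src/train_fare_model.py`. The ETA side (an XGBoost point regressor in this project) is omitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for engineered features: [trip_distance, hour_sin, hour_cos]
n = 2000
distance = rng.uniform(0.5, 15.0, n)
hours = rng.integers(0, 24, n)
X = np.column_stack([
    distance,
    np.sin(2 * np.pi * hours / 24),
    np.cos(2 * np.pi * hours / 24),
])
# Assumed fare: base + per-mile rate + right-skewed noise (tolls, traffic, tips)
y = 3.0 + 2.5 * distance + rng.gamma(shape=2.0, scale=1.5, size=n)

# One model per quantile: Q10 = low quote, Q50 = median, Q90 = high quote.
fare_models = {}
for q in (0.1, 0.5, 0.9):
    model = GradientBoostingRegressor(
        loss="quantile", alpha=q, n_estimators=200, max_depth=3, random_state=0
    )
    fare_models[q] = model.fit(X, y)

# Quote a 5-mile trip at 18:00.
trip = np.array([[5.0, np.sin(2 * np.pi * 18 / 24), np.cos(2 * np.pi * 18 / 24)]])
low, mid, high = (fare_models[q].predict(trip)[0] for q in (0.1, 0.5, 0.9))
print(f"Fare estimate: ${mid:.2f} (range ${low:.2f} to ${high:.2f})")
```

Because each quantile is fit independently, the three predictions can occasionally cross on unusual inputs; sorting the three values before display is a simple safeguard.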
The frontend UI lets users input:
- Pickup hour
- Day of week
- Trip distance
And returns:
- ETA prediction
- Fare prediction (Low / Median / High)
- Prediction latency in ms
- Dynamic demand heatmap
✅ Why it matters: Showcases full-stack ML skill from model to UI deployment with latency logging.
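The request flow the UI drives (inputs in; ETA, low/median/high fare, and latency out) can be sketched independently of Streamlit. `predict_trip`, `make_features`, the feature order, the rush-hour rule, and the stub models are all hypothetical; any object with a scikit-learn-style `predict` method could slot in, provided the feature order matches what it was trained on. Latency is timed with `time.perf_counter`, one common way to produce a per-request milliseconds figure.

```python
import math
import time
from typing import Protocol, Sequence


class Regressor(Protocol):
    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[float]: ...


def make_features(hour: int, day_of_week: int, distance: float) -> list[float]:
    """Assumed feature order: distance, hour_sin, hour_cos, is_weekend, is_rush_hour."""
    angle = 2 * math.pi * hour / 24
    is_weekend = int(day_of_week >= 5)
    is_rush = int(not is_weekend and hour in {7, 8, 9, 16, 17, 18, 19})
    return [distance, math.sin(angle), math.cos(angle), is_weekend, is_rush]


def predict_trip(eta_model: Regressor, fare_models: dict[float, Regressor],
                 hour: int, day_of_week: int, distance: float) -> dict[str, float]:
    """Run all four models on one trip and time the whole request."""
    start = time.perf_counter()
    X = [make_features(hour, day_of_week, distance)]
    result = {
        "eta_min": float(eta_model.predict(X)[0]),
        "fare_low": float(fare_models[0.1].predict(X)[0]),
        "fare_mid": float(fare_models[0.5].predict(X)[0]),
        "fare_high": float(fare_models[0.9].predict(X)[0]),
    }
    result["latency_ms"] = (time.perf_counter() - start) * 1000
    return result


class LinearStub:
    """Dummy model for the demo: prediction = intercept + slope * distance."""

    def __init__(self, intercept: float, slope: float) -> None:
        self.intercept, self.slope = intercept, slope

    def predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]


out = predict_trip(
    LinearStub(2.0, 3.0),
    {0.1: LinearStub(3.0, 2.0), 0.5: LinearStub(4.0, 2.5), 0.9: LinearStub(6.0, 3.0)},
    hour=18, day_of_week=2, distance=4.0,
)
print(out)  # eta_min 14.0, fares 11.0 / 14.0 / 18.0, plus a small latency_ms
```

Keeping this logic in a plain function (rather than inside Streamlit callbacks) makes it easy to unit-test and to reuse behind an API later.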
- User selects trip time and distance via Streamlit UI
- App predicts:
  - ⏱️ ETA (in minutes)
  - 💸 Fare range (low–mid–high estimate)
- 🔥 Demand heatmap of NYC shown for visual context
- 📊 Sidebar displays live model performance (e.g., MAE)
👉 Try it on Hugging Face
```bash
git clone https://github.com/rajesh1804/RideCastAI.git
cd RideCastAI
```

Use Python 3.10 for full compatibility.

```bash
pip install -r requirements.txt
python src/download_dataset.py
python src/preprocess.py
```
```text
(venv) RideCastAI>python src\train_eta_model.py
📥 Loading processed data...
🔀 Splitting into train/test...
🚀 Starting ETA model training...
✅ Training completed in 3.75 seconds
📈 Evaluating model on test set...
📊 ETA MAE: 3.11 minutes
💾 ETA model saved to src/models/eta_model.pkl

(venv) RideCastAI>python src\train_fare_model.py
🔄 Loading and preprocessing data...
✅ Data loaded and split. Time taken: 5.58s
🚀 Training quantile model for q=0.1...
✅ Done. MAE @ quantile 0.1: 2.72 | Time taken: 441.53s
💾 Model saved to src/models/fare_model_q10.pkl
🚀 Training quantile model for q=0.5...
✅ Done. MAE @ quantile 0.5: 1.85 | Time taken: 483.59s
💾 Model saved to src/models/fare_model_q50.pkl
🚀 Training quantile model for q=0.9...
✅ Done. MAE @ quantile 0.9: 3.44 | Time taken: 442.07s
💾 Model saved to src/models/fare_model_q90.pkl
🎉 All quantile models trained and saved successfully.
```
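The training scripts write their models to `src/models/`. Below is a minimal sketch of loading them at app start-up, assuming they were saved with the standard `pickle` module (if the scripts use `joblib.dump`, swap in `joblib.load`). `load_models` and `_load_pickle` are hypothetical helpers; the file names come from the training output. The demo writes placeholder objects to a temporary directory so the sketch runs without trained models.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path

ETA_FILE = "eta_model.pkl"
FARE_FILES = {0.1: "fare_model_q10.pkl", 0.5: "fare_model_q50.pkl", 0.9: "fare_model_q90.pkl"}


def _load_pickle(path: Path) -> object:
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run the training scripts first")
    with path.open("rb") as fh:
        return pickle.load(fh)  # unpickling can run code: only load trusted files


def load_models(model_dir: str | Path = "src/models") -> tuple[object, dict[float, object]]:
    """Load the ETA model and the Q10/Q50/Q90 fare models (hypothetical helper)."""
    model_dir = Path(model_dir)
    eta_model = _load_pickle(model_dir / ETA_FILE)
    fare_models = {q: _load_pickle(model_dir / name) for q, name in FARE_FILES.items()}
    return eta_model, fare_models


# Demo: write placeholder pickles to a temp dir, then load them back.
with tempfile.TemporaryDirectory() as tmp:
    for name in [ETA_FILE, *FARE_FILES.values()]:
        with open(Path(tmp) / name, "wb") as fh:
            pickle.dump({"placeholder": name}, fh)
    eta_model, fare_models = load_models(tmp)

print(eta_model, sorted(fare_models))  # {'placeholder': 'eta_model.pkl'} [0.1, 0.5, 0.9]
```

Inside the Streamlit app, decorating `load_models` with `@st.cache_resource` keeps the models in memory across reruns instead of unpickling them on every interaction.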
```bash
streamlit run streamlit_app/app.py
```
```text
RideCastAI/
├── assets/
│   └── ridecastai-architecture.png
├── data/
│   └── yellow_tripdata_2023-01.parquet
├── src/
│   ├── download_dataset.py
│   ├── preprocess.py
│   ├── features.py
│   ├── train_eta_model.py
│   ├── train_fare_model.py
│   └── models/
│       └── *.pkl
├── streamlit_app/
│   ├── app.py
│   └── utils.py
├── requirements.txt
└── README.md
```
- Frontend: Streamlit for interactive UI
- ML Models: XGBoost, GradientBoostingRegressor
- Metrics: Scikit-learn (MAE, metrics logging)
- Feature Engineering: NumPy, Pandas
- Visualization: Plotly (demand heatmap)
- Deployment: Hugging Face Spaces
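The NYC demand heatmap is currently simulated using pseudo-random, normalized ride volumes per zone. It could be replaced with real pickup-zone distributions, for example by aggregating trip counts per `PULocationID` and mapping them onto the TLC taxi zone boundaries (GeoJSON).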
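A minimal sketch of what "pseudo-random normalized ride volumes per zone" can look like, using only the standard library. The zone names and coordinates are approximate illustrative points, not TLC zone centroids, and the count range, seeding, and normalization (dividing by the maximum so the busiest zone scores 1.0) are assumptions; the app's `utils.py` may do this differently. The resulting lat/lon/weight records are the shape of data a Plotly density map (for example, `plotly.express.density_mapbox`) consumes.

```python
from __future__ import annotations

import random

# Illustrative pickup hot spots (approximate lat/lon); not real TLC zone centroids.
ZONES = {
    "Midtown": (40.7549, -73.9840),
    "Lower Manhattan": (40.7075, -74.0113),
    "Upper East Side": (40.7736, -73.9566),
    "JFK Airport": (40.6413, -73.7781),
    "Williamsburg": (40.7081, -73.9571),
}


def simulate_demand(seed: int | None = None) -> list[dict]:
    """Return one record per zone with a pseudo-random volume normalized to (0, 1]."""
    rng = random.Random(seed)
    raw = {zone: rng.randint(50, 500) for zone in ZONES}  # fake ride counts
    peak = max(raw.values())
    return [
        {"zone": zone, "lat": lat, "lon": lon, "demand": raw[zone] / peak}
        for zone, (lat, lon) in ZONES.items()
    ]


for row in simulate_demand(seed=42):
    print(f'{row["zone"]:<16} {row["demand"]:.2f}')
```

Swapping in real data would only change how `raw` is computed (for example, pickup counts per `PULocationID` joined to zone coordinates); the normalization and output shape can stay the same.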
MIT — see LICENSE for details.
Built by Rajesh Marudhachalam
🏆 Uber, Lyft, Bolt, Grab
📧 Say hi at Gmail
🔗 LinkedIn