A comprehensive data science and analytics platform for the Creole Creamery Hall of Fame challenge. This monorepo contains all components of the project: web application, ETL pipeline, machine learning models, and development tools.
```
the-data-tchoup/
├── app/   # Next.js web application
├── ETL/   # AWS Lambda scraper & infrastructure
├── ml/    # Machine learning models
└── dev/   # Development tools & database
```
- Node.js 22+
- Python 3.11+
- Docker
- AWS CLI (for ETL deployment)
- pnpm (for app development)
- Clone the repository with submodules:

  ```bash
  git clone --recursive <repository-url>
  cd the-data-tchoup
  ```

- Set up the web application:

  ```bash
  cd app
  pnpm install
  cp env.example .env.local
  # Configure your database URL
  pnpm dev
  ```

- Set up the ETL pipeline:

  ```bash
  cd ETL
  poetry install
  cp terraform/terraform.tfvars.example terraform/terraform.tfvars
  # Configure your AWS and database credentials
  ```

- Set up the ML models:

  ```bash
  cd ml
  poetry install
  # Configure your database connection
  ```
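The "configure your database connection" steps above boil down to pointing the code at the shared PostgreSQL database. Below is a minimal sketch of that wiring, assuming the `NEON_DATABASE_URL` variable listed with the other environment variables further down; `psycopg2` and the `hall_of_fame_entries` table and column names are illustrative assumptions, not the repository's actual code.

```python
# Minimal sketch: read the connection string from the environment and query
# the shared PostgreSQL database. psycopg2 and the table/column names are
# illustrative assumptions, not the repository's actual code.
import os

import psycopg2


def fetch_recent_entries(limit: int = 10):
    """Return the most recent Hall of Fame entries (table name is assumed)."""
    conn = psycopg2.connect(os.environ["NEON_DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT name, completion_date FROM hall_of_fame_entries "
                "ORDER BY completion_date DESC LIMIT %s",
                (limit,),
            )
            return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, completed_on in fetch_recent_entries():
        print(name, completed_on)
```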
- Next.js 15 web application with TypeScript
- Drizzle ORM for database operations
- TailwindCSS + shadcn/ui for styling
- Recharts for data visualizations
- TanStack Query for state management
Features:
- Interactive data dashboard
- Real-time analytics
- Beautiful visualizations
- Mobile-responsive design
- AWS Lambda function for automated data collection
- OpenAI GPT-4 powered web scraping
- Terraform for infrastructure as code
- Docker containerization
- AWS EventBridge for scheduling
Features:
- Daily automated scraping
- LLM-powered data extraction
- Infrastructure automation
- Monitoring and logging
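To make the scraper's shape concrete, here is a rough sketch of what the Lambda entry point could look like. The page URL, prompt, model choice, and response handling are illustrative assumptions rather than the project's actual implementation; it only shows the general pattern of fetching the page and asking GPT-4 to extract structured entries.

```python
# Illustrative sketch of an LLM-powered scraping Lambda handler.
# The URL, prompt, model, and JSON handling are assumptions for illustration only.
import json
import urllib.request

from openai import OpenAI

HALL_OF_FAME_URL = "https://example.com/hall-of-fame"  # placeholder, not the real page
client = OpenAI()  # reads OPENAI_API_KEY from the Lambda environment


def handler(event, context):
    """Invoked on the daily schedule: fetch the page and extract entries with GPT-4."""
    with urllib.request.urlopen(HALL_OF_FAME_URL) as resp:
        html = resp.read().decode("utf-8", errors="ignore")

    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Extract every Hall of Fame entry (name and date) from the HTML below "
                "and respond with only a JSON array of objects with name and date keys.\n\n"
                + html
            ),
        }],
    )
    entries = json.loads(completion.choices[0].message.content)
    # The real pipeline would upsert these rows into PostgreSQL here.
    return {"statusCode": 200, "body": json.dumps({"entries_found": len(entries)})}
```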
- scikit-learn for predictive modeling
- pandas/numpy for data processing
- PostgreSQL integration
- Feature engineering from sparse data
Features:
- Competition timing predictions
- Pattern analysis
- Model evaluation and validation
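As a concrete (and hedged) illustration of that stack, the sketch below engineers simple timing features from the sparse completion history and fits a scikit-learn regressor to predict the gap until the next entry. The column names, feature set, and model choice are assumptions, not the repository's actual pipeline.

```python
# Sketch of a competition-timing model: engineer simple features from the sparse
# completion history and fit a scikit-learn regressor to predict days until the
# next Hall of Fame entry. Column names and features are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def build_features(entries: pd.DataFrame) -> pd.DataFrame:
    """entries: one row per completion with a datetime 'completion_date' column (assumed)."""
    df = entries.sort_values("completion_date").reset_index(drop=True)
    df["days_since_prev"] = df["completion_date"].diff().dt.days
    df["month"] = df["completion_date"].dt.month
    df["day_of_week"] = df["completion_date"].dt.dayofweek
    df["rolling_gap_mean"] = df["days_since_prev"].rolling(5, min_periods=1).mean()
    df["days_to_next"] = df["days_since_prev"].shift(-1)  # prediction target
    return df.dropna()


def train_timing_model(df: pd.DataFrame) -> RandomForestRegressor:
    features = ["days_since_prev", "month", "day_of_week", "rolling_gap_mean"]
    X, y = df[features], df["days_to_next"]
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"Cross-validated MAE: {-scores.mean():.1f} days")
    return model.fit(X, y)
```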
- Development tools and utilities
- Database migrations and setup
- Local development environment
```bash
# Run the web application
cd app
pnpm dev
# Visit http://localhost:3000
```

```bash
# Test the ETL scraper
cd ETL
poetry shell
python tests/test_scraper.py
```

```bash
# Run the ML timing model
cd ml
poetry shell
python src/timing_model.py
```

```bash
# Deploy the web application
cd app
vercel --prod
```

```bash
# Deploy the ETL pipeline
cd ETL
./CICD/deploy.sh
```

- ETL Pipeline scrapes data daily from the Creole Creamery website
- ML Models analyze patterns and make predictions
- Web App visualizes data and provides insights
- Database stores all historical competition data
- Create a feature branch in the main repo
- Make changes in the relevant submodule
- Commit and push the submodule changes
- Update the submodule reference in the main repo
- Create a pull request
```bash
# Update all submodules to latest
git submodule update --remote

# Update a specific submodule
git submodule update --remote app
```

```bash
# app (.env.local)
DATABASE_URL=postgresql://username:password@hostname:port/database
NEXT_PUBLIC_APP_URL=http://localhost:3000
```

```bash
# ETL
NEON_DATABASE_URL=postgresql://username:password@hostname:port/database
OPENAI_API_KEY=sk-your-openai-api-key-here
AWS_REGION=us-east-1
```

```bash
# ml
NEON_DATABASE_URL=postgresql://username:password@hostname:port/database
```

- Fork the repository
- Create a feature branch
- Make your changes
- Test all components
- Submit a pull request
This project is licensed under the MIT License.
- Creole Creamery for the inspiration
- Next.js team for the amazing framework
- OpenAI for LLM capabilities
- AWS for cloud infrastructure