Analytics Hub is a data analytics platform that automatically preprocesses datasets and identifies the best-performing machine learning model for making predictions. It combines a Next.js web interface with a Python backend built on EvalML, and is packaged with Docker for deployment on AWS EC2.
Key features:
- Automated Data Preprocessing: Uses EvalML's preprocessing capabilities to clean and prepare your data
- Intelligent Model Selection: Automatically evaluates multiple machine learning models and recommends the best performer
- Interactive Web Interface: Modern, responsive frontend built with Next.js for intuitive data exploration
- Scalable Cloud Infrastructure: Deployed on AWS EC2 with Docker containerization for reliable performance
- Notebook Integration: Uses Papermill for parameterized notebook execution and reporting
Technology stack:
- Next.js: React-based framework with server-side rendering
- JavaScript (ES6+): Frontend logic and user interactions
- Python: Core machine learning and data processing logic
- EvalML: AutoML library for automated model selection and evaluation
- Papermill: Notebook parameterization and execution engine
- AWS EC2: Cloud computing platform for scalable deployment
- Docker: Containerization for consistent environments across development and production
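Papermill executes a notebook after injecting parameter values into a new cell placed right after the cell tagged "parameters". A minimal sketch of how a report notebook could be run this way; the notebook paths, the parameter names, and both helper functions are hypothetical and not taken from this repository:

```python
from pathlib import Path


def build_report_parameters(dataset_path, target, max_iterations=10):
    """Collect the values papermill will inject into the report notebook.

    Hypothetical parameter names: they must match variables defined in the
    notebook's cell tagged "parameters".
    """
    if not target:
        raise ValueError("target column name is required")
    return {
        "dataset_path": str(Path(dataset_path)),
        "target": target,
        "max_iterations": int(max_iterations),
    }


def run_report(template, output, parameters):
    """Execute the template notebook with `parameters` and save it to `output`."""
    import papermill as pm  # imported lazily so the helper above works without papermill

    return pm.execute_notebook(str(template), str(output), parameters=parameters)


if __name__ == "__main__":
    params = build_report_parameters("data/train.csv", "churned")
    print(params)
    # Hypothetical paths; uncomment once such a notebook exists:
    # run_report("notebooks/report_template.ipynb", "reports/report.ipynb", params)
```

Keeping parameter construction separate from execution lets the API validate inputs before starting a potentially long notebook run.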
Prerequisites:
- Node.js (v16 or higher)
- Python 3.8+
- Docker
- AWS CLI (for deployment)
To set up a local development environment:
- Clone the repository:
    git clone <repository-url>
    cd analytics-hub
- Install frontend dependencies:
    npm install
- Set up the Python environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
- Start the development servers, each in its own terminal:
    # Frontend (Next.js)
    npm run dev
    # Backend (Python API), with the virtual environment activated
    python app.py
- Access the application:
    - Frontend: http://localhost:3000
    - API: http://localhost:8000
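Once both servers are running, a small standard-library script can confirm that they respond. A minimal sketch; it assumes only the two URLs listed above and treats any non-5xx answer as "up", since the API's actual routes are not documented here:

```python
import urllib.error
import urllib.request

# URLs from the setup steps above
SERVICES = {
    "frontend": "http://localhost:3000",
    "api": "http://localhost:8000",
}


def is_up(url, timeout=2.0):
    """Return True if a server answers at `url` without a 5xx error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just not with 2xx/3xx (e.g. 404 on "/")
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing is listening
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:<8} {url:<24} {'up' if is_up(url) else 'DOWN'}")
```

HTTPError is caught before URLError because it is a subclass of it; otherwise a 404 from a running server would be reported as "down".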
To run with Docker instead:
- Build the Docker image:
    docker build -t analytics-hub .
- Run the container:
    docker run -p 3000:3000 -p 8000:8000 analytics-hub
To deploy on AWS EC2:
- Launch an EC2 instance:
    - Use an Amazon Linux 2 or Ubuntu AMI
    - Configure the security group to allow inbound traffic on ports 3000 (frontend), 8000 (API), and 22 (SSH)
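- Deploy using Docker (commands shown for Amazon Linux; on Ubuntu, log in as ubuntu instead of ec2-user and install Docker with sudo apt-get install -y docker.io):
    # On your workstation: tag and push the image built above to your registry
    docker tag analytics-hub your-registry/analytics-hub
    docker push your-registry/analytics-hub
    # SSH into your EC2 instance
    ssh -i your-key.pem ec2-user@your-instance-ip
    # Install and start Docker
    sudo yum update -y
    sudo yum install docker -y
    sudo service docker start
    # Pull and run the image from your registry (run sudo docker login first if it is private)
    sudo docker pull your-registry/analytics-hub
    sudo docker run -d -p 3000:3000 -p 8000:8000 your-registry/analytics-hub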
Usage:
- Upload Dataset: Use the web interface to upload a CSV or other structured data file
- Data Preprocessing: The system automatically preprocesses your data using EvalML's built-in capabilities
- Model Training: Multiple ML models are trained and evaluated automatically
- Results: View model performance metrics and select the best model for your use case
- Predictions: Make predictions on new data using the selected model
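The same upload, training, and prediction flow can be reproduced outside the web interface with EvalML directly. A minimal sketch, assuming pandas and EvalML are installed and that you know the target column and problem type; csv_has_target and run_workflow are hypothetical helpers, not part of Analytics Hub:

```python
import csv


def csv_has_target(path, target):
    """Check an upload: the header must name `target` and at least one data row must follow."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header or target not in header:
            return False
        return next(reader, None) is not None


def run_workflow(train_csv, new_csv, target, problem_type="binary"):
    """Search for the best EvalML pipeline, then predict on new data.

    pandas and EvalML are imported lazily so csv_has_target works without them.
    """
    import pandas as pd
    from evalml.automl import AutoMLSearch

    if not csv_has_target(train_csv, target):
        raise ValueError(f"{train_csv} has no data rows or no column named {target!r}")

    X = pd.read_csv(train_csv)
    y = X.pop(target)

    # Candidate pipelines include preprocessing steps (e.g. imputation, encoding)
    # that EvalML chooses based on the data
    automl = AutoMLSearch(X_train=X, y_train=y, problem_type=problem_type)
    automl.search()

    best = automl.best_pipeline  # trained on the training data by default
    predictions = best.predict(pd.read_csv(new_csv))
    return automl.rankings, predictions
```

automl.rankings holds the per-pipeline scores that correspond to the "Results" step; the new CSV must contain the same feature columns as the training file, minus the target.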
Create a .env.local file in the root directory:
NEXT_PUBLIC_API_URL=http://localhost:8000
AWS_REGION=us-east-1
DOCKER_REGISTRY=your-registry-url
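Next.js loads .env.local automatically and exposes only NEXT_PUBLIC_-prefixed variables to browser code; a Python backend does not read the file on its own. If the backend needs the same values, python-dotenv is a common choice. A dependency-free sketch, assuming simple KEY=VALUE lines (no multi-line values or variable expansion), with hypothetical helper names:

```python
import os
from pathlib import Path


def load_env_file(path=".env.local"):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_setting(name, file_values, default=None):
    """Real environment variables take precedence over values from the file."""
    return os.environ.get(name, file_values.get(name, default))


if __name__ == "__main__":
    settings = load_env_file()
    print(get_setting("AWS_REGION", settings, default="us-east-1"))
```

Letting real environment variables win means the same code works unchanged in Docker or on EC2, where values are usually passed with docker run -e rather than a file.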
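The EvalML AutoML search can be customized in config/evalml_config.py:
EVALML_CONFIG = {
    "problem_type": "auto",  # not an EvalML problem type; presumably resolved by the app
    "max_iterations": 10,    # maximum number of pipelines evaluated during the search
    "patience": 5,           # stop early after this many iterations without improvement
    "tolerance": 0.01        # minimum relative score improvement (1%) that counts as progress
}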
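EvalML's AutoMLSearch accepts max_iterations, patience, and tolerance as keyword arguments, but "auto" is not one of EvalML's problem types, so it has to be resolved before the search starts. How Analytics Hub does this is not documented here. A minimal sketch using a hypothetical heuristic on the target column: two distinct values means binary, numeric targets with fractional values or more than a hypothetical threshold of 20 distinct values mean regression, and anything else is multiclass:

```python
import numbers

# Mirrors the values shown in config/evalml_config.py
EVALML_CONFIG = {
    "problem_type": "auto",
    "max_iterations": 10,
    "patience": 5,
    "tolerance": 0.01,
}


def _is_number(value):
    # numbers.Real also matches NumPy numeric scalars; bools are excluded
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


def resolve_problem_type(target_values, max_classes=20):
    """Hypothetical heuristic mapping a target column to an EvalML problem type."""
    unique = set(target_values)
    if len(unique) < 2:
        raise ValueError("target needs at least two distinct values")
    if len(unique) == 2:
        return "binary"
    if all(_is_number(v) for v in unique):
        has_fractions = any(not float(v).is_integer() for v in unique)
        if has_fractions or len(unique) > max_classes:
            return "regression"
    return "multiclass"


def automl_kwargs(config, target_values):
    """Copy the config, replacing problem_type "auto" with a concrete type."""
    kwargs = dict(config)
    if kwargs.get("problem_type") == "auto":
        kwargs["problem_type"] = resolve_problem_type(target_values)
    return kwargs


def run_search(X, y, config=EVALML_CONFIG):
    """Run AutoMLSearch with the resolved settings (EvalML imported lazily)."""
    from evalml.automl import AutoMLSearch

    automl = AutoMLSearch(X_train=X, y_train=y, **automl_kwargs(config, list(y)))
    automl.search()
    return automl.best_pipeline
```

The heuristic assumes missing target values have already been dropped; a user-facing override for the problem type is safer than relying on the guess alone.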
- Fork the repository
- Create a feature branch (git checkout -b feature/new-feature)
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/new-feature)
- Create a Pull Request
- Caching: Redis integration for model caching
- Load Balancing: AWS Application Load Balancer support
- Auto Scaling: EC2 Auto Scaling Group configuration
- Database: PostgreSQL for persistent storage
This project is licensed under the MIT License - see the LICENSE file for details.