- Data Ingestion # Download, extract, and prepare raw data
- Create the base model # Download VGG16 with its convolutional layers only, add customized dense layers on top, and save both models (see the sketch after this list)
- Train the base model # Train the model on processed data
- Evaluate the trained model with MLflow # Log metrics, params, and the model (sketch below)
- Create the prediction pipeline # Build the serving logic for inference, e.g., API/UI integration (sketch below)
- Develop the App
- Set up AWS EC2, ECR, IAM, and Jenkins
- Set up the GitHub Actions secrets
- Trigger the Pipeline
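
For the base-model step, here is a minimal sketch of what the Keras code might look like; the input shape, class count, learning rate, and save paths are assumptions, not the project's actual values:

```python
# Hypothetical sketch of the base-model step (shapes, paths, and hyperparameters are assumptions)
from pathlib import Path

import tensorflow as tf

IMAGE_SIZE = (224, 224, 3)   # assumed input shape
NUM_CLASSES = 2              # assumed number of classes
SAVE_DIR = Path("artifacts/prepare_base_model")  # assumed artifact location
SAVE_DIR.mkdir(parents=True, exist_ok=True)

# Download VGG16 with the convolutional layers only (include_top=False)
base_model = tf.keras.applications.VGG16(
    input_shape=IMAGE_SIZE,
    weights="imagenet",
    include_top=False,
)
base_model.save(SAVE_DIR / "base_model.h5")

# Freeze the convolutional base and add customized dense layers on top
base_model.trainable = False
x = tf.keras.layers.Flatten()(base_model.output)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
full_model = tf.keras.Model(inputs=base_model.input, outputs=outputs)

full_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
full_model.save(SAVE_DIR / "base_model_updated.h5")
```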
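
For the MLflow evaluation step, a hedged sketch of logging metrics, params, and the model; the tracking URI, data directory, parameter values, and registered model name are assumptions:

```python
# Hypothetical sketch of the MLflow evaluation step (URIs, paths, and param values are assumptions)
import mlflow
import tensorflow as tf

mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")  # assumed remote tracking server

# Rebuild a validation generator (data path, image size, and batch size are assumptions)
valid_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "artifacts/data_ingestion/data",
    target_size=(224, 224),
    batch_size=16,
    class_mode="categorical",
)

model = tf.keras.models.load_model("artifacts/training/model.h5")
loss, accuracy = model.evaluate(valid_generator)

with mlflow.start_run():
    mlflow.log_params({"epochs": 10, "batch_size": 16})       # assumed values read from params.yaml
    mlflow.log_metrics({"loss": loss, "accuracy": accuracy})
    # Depending on the MLflow version, the flavor may be mlflow.keras or mlflow.tensorflow
    mlflow.keras.log_model(model, "model", registered_model_name="VGG16Model")
```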
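
And for the prediction pipeline, a rough sketch of the serving-side inference logic; the model path, target size, and class-label mapping are assumptions:

```python
# Hypothetical sketch of the prediction pipeline used for serving (paths and labels are assumptions)
import numpy as np
import tensorflow as tf


class PredictionPipeline:
    def __init__(self, model_path: str = "model/model.h5"):
        self.model = tf.keras.models.load_model(model_path)

    def predict(self, image_path: str) -> str:
        # Load and preprocess a single image to match the training input shape
        img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
        arr = tf.keras.preprocessing.image.img_to_array(img) / 255.0
        arr = np.expand_dims(arr, axis=0)

        # Index of the highest softmax score maps to a class label (label order is an assumption)
        result = int(np.argmax(self.model.predict(arr), axis=1)[0])
        return "Normal" if result == 1 else "Adenocarcinoma Cancer"
```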
- Update the config.yaml. # Changeable variables and URLs
- Update the params.yaml & read it. # Specify hyperparameters and tunable settings
- Update the entity & read it. # Create dataclasses that structure the config objects returned by functions (the entity through main.py steps are sketched together after this list)
- Update the configuration manager in src config. # Parse YAMLs, instantiate entity configs
- Update the components. # Write the logic for this step (e.g., download, train, evaluate)
- Update the pipeline. # Sequence component calls using the pipeline class
- Update the main.py. # Create the entry point that triggers the pipeline stages, with logging
- Update the dvc.yaml. # Track stage dependencies and outputs, and automate the pipeline with DVC
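
As a concrete illustration of the entity, configuration manager, component, pipeline, and main.py steps, here is one stage wired end to end. This is a minimal sketch: the class names, dataclass fields, YAML keys, file paths, and stage name are assumptions, not the project's actual code.

```python
# Hypothetical sketch of one pipeline stage, from entity to entry point (all names/paths are assumptions)
import logging
import urllib.request
import zipfile
from dataclasses import dataclass
from pathlib import Path

import yaml

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


# --- Entity: a typed, immutable view of one config section ---
@dataclass(frozen=True)
class DataIngestionConfig:
    root_dir: Path
    source_URL: str
    local_data_file: Path
    unzip_dir: Path


# --- Configuration manager: parse config.yaml and instantiate entity configs ---
class ConfigurationManager:
    def __init__(self, config_filepath: Path = Path("config/config.yaml")):
        with open(config_filepath) as f:
            self.config = yaml.safe_load(f)

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        cfg = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(cfg["root_dir"]),
            source_URL=cfg["source_URL"],
            local_data_file=Path(cfg["local_data_file"]),
            unzip_dir=Path(cfg["unzip_dir"]),
        )


# --- Component: the actual logic for this step (download + extract) ---
class DataIngestion:
    def __init__(self, config: DataIngestionConfig):
        self.config = config

    def download_file(self) -> None:
        self.config.root_dir.mkdir(parents=True, exist_ok=True)
        if not self.config.local_data_file.exists():
            urllib.request.urlretrieve(self.config.source_URL, self.config.local_data_file)

    def extract_zip_file(self) -> None:
        self.config.unzip_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(self.config.local_data_file, "r") as zf:
            zf.extractall(self.config.unzip_dir)


# --- Pipeline: sequence the component calls for this stage ---
class DataIngestionTrainingPipeline:
    def main(self):
        config = ConfigurationManager().get_data_ingestion_config()
        ingestion = DataIngestion(config=config)
        ingestion.download_file()
        ingestion.extract_zip_file()


# --- main.py-style entry point that triggers the stage with logging ---
if __name__ == "__main__":
    STAGE_NAME = "Data Ingestion stage"
    try:
        logger.info(f">>> {STAGE_NAME} started <<<")
        DataIngestionTrainingPipeline().main()
        logger.info(f">>> {STAGE_NAME} completed <<<")
    except Exception as e:
        logger.exception(e)
        raise
```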
conda create -n chest python=3.8 -y
conda activate chest
pip install -r requirements.txt
python app.py
git add .
git commit -m "Updated"
git push origin main
In this setup (an image classifier trained on EC2 via Jenkins, packaged with Docker, and pushed to ECR), DVC helps by:
🛠️ Structuring your ML workflow into stages (e.g., data prep → training → evaluation)
📦 Storing large files (datasets, models) outside Git (in S3, GDrive, etc.) while still tracking versions
📈 Making experiments reproducible — anyone can re-run your full pipeline with dvc repro
🔁 Helping Jenkins or other automation tools track whether files or stages changed
🔍 Tracking hyperparameters and model performance via params.yaml and metrics.yaml, for transparent experimentation and tuning (see the sketch after the DVC commands below)
dvc init # Initialize DVC in your repo
dvc repro # Re-run pipeline stages as needed
dvc dag # Visualize pipeline dependencies graphically
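
The params/metrics bullet above is the hook into DVC's experiment tracking. Here is a rough sketch of the Python side; the key names, file names, and placeholder values are assumptions:

```python
# Hypothetical sketch of reading params.yaml and writing a metrics file for DVC (keys/values are assumptions)
import yaml

# Hyperparameters live in params.yaml so DVC can detect changes and decide which stages to rerun
with open("params.yaml") as f:
    params = yaml.safe_load(f)

epochs = params["EPOCHS"]          # assumed key names
batch_size = params["BATCH_SIZE"]
print(f"training with epochs={epochs}, batch_size={batch_size}")

# After evaluation, write metrics to a file that dvc.yaml declares under `metrics:`,
# so `dvc metrics show` can compare runs; real values would come from model.evaluate()
scores = {"loss": 0.0, "accuracy": 0.0}  # placeholder values, not real results
with open("metrics.yaml", "w") as f:
    yaml.safe_dump(scores, f)
```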
- Create the EC2-1 machine for Jenkins (Ubuntu 22, RAM >= 4 GB, disk >= 32 GB) + set an Elastic IP + update/upgrade packages + configure AWS access keys
- Create IAM user (Add AdministratorAccess permission)
- Create ECR Repository for the App
- Install Jenkins and Docker on EC2-1
- Install SSH Agent plugin on Jenkins
- Set up the credentials (here, the 5 credentials referenced in the Jenkinsfile)
- Create the Pipeline in Jenkins and link it to your GitHub repo (plus the Jenkinsfile path, e.g., .jenkins/Jenkinsfile)
- Create the EC2-2 machine for the App (Ubuntu 22, t2.large, RAM >= 8 GB, disk >= 32 GB) + update/upgrade packages + configure AWS access keys
- Install and set up Docker
- Add the required secrets in GitHub
- Trigger the Pipeline