NLP-Text-Classification-Pipeline

This is a modular, production-ready machine learning project for detecting hate speech in text data using deep learning. It follows a clean, stage-wise architecture with components for data ingestion, validation, transformation, model training, evaluation, and deployment. The pipeline integrates with GCP for cloud storage and the model registry, and uses CircleCI for CI/CD with automated deployment to a GCP VM. The aim is scalable, reproducible ML workflows suitable for real-world NLP applications.

🎥 Demo on LinkedIn

A brief walkthrough of this project is available in a LinkedIn post.

Pipeline Stages

Data Ingestion
Data Validation
Data Transformation
Model Training
Model Evaluation
Model Pusher
Model Prediction
Model Deployment
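
The stages above run in order, and each stage consumes the artifact produced by the one before it. A minimal sketch of that hand-off follows, covering only the first three stages. The stage functions, artifact keys, and file paths are hypothetical placeholders and are not taken from the repository.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical artifact type: a dict that each stage reads from and extends.
Artifact = Dict[str, str]


def data_ingestion(artifact: Artifact) -> Artifact:
    # Placeholder: a real stage would download and unpack the dataset.
    return {**artifact, "raw_data": "artifacts/raw.csv"}


def data_validation(artifact: Artifact) -> Artifact:
    # Placeholder: a real stage would check the schema of the raw data.
    return {**artifact, "validated": "true"}


def data_transformation(artifact: Artifact) -> Artifact:
    # Placeholder: a real stage would clean and tokenize the text.
    return {**artifact, "clean_data": "artifacts/clean.csv"}


# Stage order mirrors the list above; later stages would be appended here.
STAGES: List[Tuple[str, Callable[[Artifact], Artifact]]] = [
    ("Data Ingestion", data_ingestion),
    ("Data Validation", data_validation),
    ("Data Transformation", data_transformation),
]


def run_pipeline(stages: List[Tuple[str, Callable[[Artifact], Artifact]]]) -> Artifact:
    """Run each stage in order, passing the accumulated artifact along."""
    artifact: Artifact = {}
    for name, stage in stages:
        print(f"Running stage: {name}")
        artifact = stage(artifact)
    return artifact
```

Calling `run_pipeline(STAGES)` returns the accumulated artifact after the last stage. Keeping the stage list as data makes it easy to rerun a single stage or insert a new one without touching the others.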

Note: You need to install and configure the gcloud SDK on your system to fetch the data from the Google Cloud Storage bucket.
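
As an illustration of what that fetch might look like from Python, here is a minimal sketch that shells out to `gcloud storage cp`. The function names are hypothetical, the bucket and object names are whatever the caller passes in, and the repository's own ingestion code may use a different command or client library.

```python
import shutil
import subprocess
from typing import List


def build_download_command(bucket: str, object_name: str, dest_dir: str) -> List[str]:
    """Build the gcloud command that copies one object to a local directory."""
    return ["gcloud", "storage", "cp", f"gs://{bucket}/{object_name}", dest_dir]


def download_from_bucket(bucket: str, object_name: str, dest_dir: str) -> None:
    """Copy an object from a bucket, failing early if gcloud is not installed."""
    if shutil.which("gcloud") is None:
        raise RuntimeError(
            "gcloud CLI not found; install and configure the Google Cloud SDK first"
        )
    # check=True raises CalledProcessError if the copy fails (e.g. no access).
    subprocess.run(build_download_command(bucket, object_name, dest_dir), check=True)
```

Checking for the CLI up front gives a clearer error than a failed subprocess call when the SDK is missing.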

Workflows (For each stage in the pipeline)

constants
config_entity
artifact_entity
components
pipeline
main.py
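
To show how these layers could fit together for a single stage, here is a minimal sketch for data ingestion. All class names, field names, and paths are hypothetical and only illustrate the pattern; the repository's actual modules may be organised differently.

```python
import os
from dataclasses import dataclass

# constants: fixed names shared by every stage (values here are hypothetical).
ARTIFACTS_DIR = "artifacts"
ZIP_FILE_NAME = "dataset.zip"


# config_entity: the inputs a stage needs, derived from the constants.
@dataclass(frozen=True)
class DataIngestionConfig:
    artifacts_dir: str = os.path.join(ARTIFACTS_DIR, "DataIngestion")
    zip_file_name: str = ZIP_FILE_NAME

    @property
    def zip_file_path(self) -> str:
        return os.path.join(self.artifacts_dir, self.zip_file_name)


# artifact_entity: the outputs a stage hands to the next one.
@dataclass(frozen=True)
class DataIngestionArtifact:
    zip_file_path: str


# components: the class that does the work for one stage.
class DataIngestion:
    def __init__(self, config: DataIngestionConfig) -> None:
        self.config = config

    def initiate(self) -> DataIngestionArtifact:
        # Placeholder: a real component would fetch the file here.
        return DataIngestionArtifact(zip_file_path=self.config.zip_file_path)


# pipeline: wires components together; main.py would call this function.
def run_train_pipeline() -> DataIngestionArtifact:
    return DataIngestion(DataIngestionConfig()).initiate()
```

Separating configuration (inputs) from artifacts (outputs) keeps each component testable on its own, since it depends only on the dataclass it receives.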

Model Deployment setup

Set up CircleCI
Activate the 'Self-Hosted Runners' by confirming the terms
Create a new project in CircleCI
Link the project to your GitHub repository
Configure the VM instance
Configure GCR in GCP
Write the 'config.yml' file
Set up the environment variables

a. Allow CircleCI to Clone Private GitHub Repository

Generate a dedicated SSH key pair for your project (on your local machine):
```
ssh-keygen -t ed25519 -f ~/.ssh/project_key -C "your_email@example.com"
```
- This generates two files:
  - ~/.ssh/project_key (private key)
  - ~/.ssh/project_key.pub (public key)
Add the public key to GitHub as a Deploy Key:
- Go to your GitHub repo → Settings → Deploy Keys
- Click "Add deploy key"
  - Title: e.g., CircleCI Access
  - Key: paste the contents of project_key.pub
  - Enable "Allow write access" if needed (e.g., if CircleCI pushes code/tags)
Add the private key to CircleCI:
- Go to CircleCI Project Settings → SSH Keys → "Add SSH Key"
- Choose: "Other"
  - Paste the private key (project_key)
  - Hostname: github.com

b. GCP VM Setup for End-to-End Automation

Create a VM instance in GCP with a Linux OS (e.g., Ubuntu 22.04) and allow HTTP/HTTPS traffic.
Add a firewall rule that allows inbound traffic on port 8080.
Enable Artifact Registry API in your GCP project to allow Docker image pulling.
Install Docker on the VM instance.
Authenticate Docker with Artifact Registry:

gcloud auth configure-docker us-central1-docker.pkg.dev

Give the VM access to pull from Artifact Registry (via service account permissions or using gcloud auth login).
Install Google Cloud SDK on the VM (if not using a service account)

Add your user to the docker group so that Docker can be run without sudo:

sudo usermod -aG docker $USER
newgrp docker

Restart the VM to apply the changes.
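
Once the VM can authenticate, a deployment comes down to pulling the image and running it on port 8080. The sketch below composes the image reference and the two Docker commands. The helper names, the repository name, and the assumption that the container listens on 8080 are illustrative; the actual commands live in the project's config.yml.

```python
from typing import List


def image_reference(
    project_id: str,
    repository: str,
    image: str,
    tag: str,
    location: str = "us-central1",
) -> str:
    """Compose an Artifact Registry image reference.

    Artifact Registry Docker images follow the pattern
    LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG.
    """
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


def deploy_commands(image_ref: str, port: int = 8080) -> List[str]:
    """Return the shell commands a deploy step might run on the VM."""
    return [
        f"docker pull {image_ref}",
        # -d runs detached; -p maps the host port to the same container port.
        f"docker run -d -p {port}:{port} {image_ref}",
    ]
```

For example, `deploy_commands(image_reference("my-project", "my-repo", "hatespeech-app", "v1"))` yields a pull and a run command for that image, where `my-project` and `my-repo` are placeholders.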

c. GCP VM SSH Setup & CircleCI Deployment

Create an SSH key pair on your local machine (if you don’t already have one):

$ ssh-keygen -t rsa -f ~/.ssh/gcp-key -C youremail@gmail.com

Add the public key to your GCP VM:
- Go to the VM instance details in GCP.
- Click "Edit" → scroll to SSH Keys.
- Click "Add Key" → paste the public key from ~/.ssh/gcp-key.pub.
Add the private key to CircleCI:
- Go to your project in CircleCI → Project Settings → SSH Keys → Add SSH Key.
- Paste your private key (~/.ssh/gcp-key) there.
- Set the hostname as 35.xxx.xxx.xxx (your VM's external IP) or just *.
Add the required GCP environment variables in CircleCI:
- Go to your project in CircleCI → Environment Variables → Add Environment Variable. For example:
- Name: GCP_VM_IP, Value: 35.xxx.xxx.xxx (your VM's external IP)
- Name: SSH_PRIVATE_KEY, Value: your private key (~/.ssh/gcp-key), base64-encoded. This step is important: check the config.yml file to see how the variable is decoded and used.
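
Base64-encoding the key turns a multi-line file into a single line that is safe to paste into an environment variable field. A minimal sketch of the encoding and its inverse is shown here; the function names are hypothetical, and config.yml remains the reference for how the pipeline actually decodes the value.

```python
import base64


def encode_key(key_bytes: bytes) -> str:
    """Encode raw key bytes as a single-line base64 string."""
    return base64.b64encode(key_bytes).decode("ascii")


def decode_key(encoded: str) -> bytes:
    """Recover the original key bytes from the base64 string."""
    return base64.b64decode(encoded)
```

To produce the value, read `~/.ssh/gcp-key` in binary mode and pass its contents to `encode_key`. Decoding the result should give back the file byte for byte, including the trailing newline that SSH expects.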

Running the programme locally

conda create -n nlp python=3.10 -y

conda activate nlp

pip install -r requirements.txt

python main.py

Bonus point!

Kubernetes Engine and Docker Configuration Commands/Steps

sudo apt-get update
git clone https://github.com/razyousuf/NLP-Text-Classification-Pipeline
cd NLP-Text-Classification-Pipeline/
cat Dockerfile
export PROJECT_ID=crypto-snow-432611-i2  # replace with your own project ID
docker build -t gcr.io/${PROJECT_ID}/hatespeech-app:v1 .
docker images
gcloud auth configure-docker gcr.io
docker push gcr.io/${PROJECT_ID}/hatespeech-app:v1
gcloud config set compute/region us-central1
gcloud container clusters get-credentials hatespeech-cluster --region us-central1  # the cluster must already exist
kubectl create deployment hatespeech-app --image=gcr.io/${PROJECT_ID}/hatespeech-app:v1
kubectl get pods
kubectl expose deployment hatespeech-app --type=LoadBalancer --port=80 --target-port=8080
kubectl get services
kubectl cluster-info
kubectl get nodes

