Arshad Kazi Arshad221b

Hi, I’m Arshad Kazi

I’m a Data Scientist with some industry experience. These are the projects and experiments I do in my free time, purely out of interest. For my professional and industry background, please refer to my LinkedIn profile.

I’m particularly interested in computer vision and multimodal systems that understand the world by looking at it. I love understanding these systems from the inside out. Instead of just using models, I rebuild them to learn how they actually work.

Most of my recent work explores self-supervision, pose estimation, image segmentation, and retrieval-augmented generation. I enjoy turning research into fully working systems: modular, testable, and often local-first (or running on a small cloud GPU).

You can also read some of my technical blogs here, and a personal, non-technical intro here.

About Me

Title: Data Scientist
Philosophy: I’m all for ML models that can help us unravel the mysteries of the universe (not just generate adorable cat images)
Interests: Self-supervised learning, multimodal systems, human-centric AI, or anything that helps us understand how machines see the world (or what they actually perceive)

Stuff I'm Kind of Good At

Rebuilding ML models from scratch (ViTs, DINO, UNet, etc.)
Designing data pipelines that scale and debug well
Self-supervised learning and training loops
Fine-tuning large models with small data
Keeping code readable, minimal, and modular
Research work (finding the papers that solve the problem at hand)

Stuff I'm Trying to Get Better At

Multi-modal learning (vision + text + audio)
Efficient training on low-resource hardware
Generative modeling (audio, diffusion, NeRFs)
Inference optimization (TensorRT, quantization)
Better unit testing and CI/CD practices for ML pipelines
Currently obsessed with geometric computer vision stuff and want to get better at it
Being less of a bozo and writing better code than words ;)

Recent Computer Vision Projects

DINO from Scratch

A full reimplementation of DINO (Self-Distillation with No Labels) using PyTorch and Vision Transformers. The project focuses on contrast-free self-supervised learning using momentum encoders, centering, sharpening, and multi-view alignment.
Repo
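
For readers new to DINO, here is a minimal sketch of the three ideas named above: centering, sharpening, and the momentum (EMA) update. It is written in plain Python rather than PyTorch so it runs with no dependencies, and it is not the code from the repo. The function names (`log_softmax`, `dino_loss`, `ema_update`) are made up for illustration, and the temperatures are the commonly cited defaults from the DINO paper, assumed here rather than taken from this project.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the teacher distribution to the student distribution.

    Temperatures are assumed defaults, not values read from the repo.
    """
    # Centering: subtract a running mean of teacher outputs so that
    # no single output dimension can dominate.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    # Sharpening: a low teacher temperature makes the target peaky.
    p_teacher = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))

def ema_update(old, new, momentum):
    """Exponential moving average, element by element.

    The same rule can serve for the teacher weights (old=teacher,
    new=student) and for the center (old=center, new=batch mean).
    """
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]

if __name__ == "__main__":
    teacher = [2.0, 0.0, 0.0]
    center = [0.0, 0.0, 0.0]
    print(dino_loss([2.0, 0.0, 0.0], teacher, center))  # student agrees: small loss
    print(dino_loss([0.0, 2.0, 0.0], teacher, center))  # student disagrees: large loss
    print(ema_update(center, teacher, momentum=0.9))    # center drifts toward teacher output
```

In a real training loop the loss would be averaged over pairs of augmented views, and no gradient would flow through the teacher; both are left out here to keep the sketch short.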

2D to 3D Human Pose Uplift

A transformer-based system that lifts 2D pose keypoints to 3D, inspired by MotionBERT. Includes a full training pipeline, temporal modeling, and 3D visualization of predictions.
Repo

YOLO-NAS for Custom Dataset

Fine-tuned YOLO-NAS (via SuperGradients) on an Indian street sign dataset. Includes support for custom annotations, training scripts, and inference setup.
Repo

3D U-Net for Image Segmentation

Custom-built 3D U-Net for volumetric CT/MRI segmentation. Includes patch-wise training, organ-level labeling, and volumetric visualization.
Repo

Natural Language Processing Projects

RAG on PDF

Retrieval-Augmented Generation (RAG) pipeline that reads PDFs and answers natural language questions. Built using LangChain, FAISS, and LLMs with a focus on local-first privacy and fast contextual retrieval.
Repo
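
As a rough illustration of the retrieval step in a pipeline like this, the sketch below ranks text chunks against a question and builds a prompt from the best matches. It deliberately swaps in a toy bag-of-words similarity for the real embedding model and FAISS index, and it does not use LangChain. All function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical and not taken from the repo.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks, k=2):
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    chunks = [
        "the invoice total is 420 dollars",
        "the cat sat on the mat",
        "payment is due in thirty days",
    ]
    print(build_prompt("what is the invoice total", chunks, k=1))
```

A vector index such as FAISS does the same ranking job, only over dense embeddings and at a much larger scale.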

Named Entity Recognition (NER)

NER system built using Hugging Face’s RoBERTa model. Includes token classification, a training loop, data preprocessing, and a clean inference pipeline.
Repo

Open to

Research collaborations in CV, SSL, pose estimation, segmentation
Contract or freelance roles building full-stack ML systems
Full-time roles in data science or ML engineering with depth and autonomy

Open to opportunities and to learning.
