👁️
Do machines 'understand' the world just by looking at it?


Hi, I’m Arshad Kazi

I’m a Data Scientist with some industry experience. These are the projects and experiments I work on in my free time, purely out of interest. For professional or industry background, please refer to my LinkedIn profile.

I’m particularly interested in computer vision and multimodal systems that understand the world by looking at it. I love understanding these systems from the inside out. Instead of just using models, I rebuild them to learn how they actually work.

Most of my recent work explores self-supervision, pose estimation, image segmentation, and retrieval-augmented generation. I enjoy turning research into fully working systems that are modular, testable, and often local-first (or running on a small cloud GPU).


About Me

  • Title: Data Scientist
  • Philosophy: I’m all for ML models that can help us unravel the mysteries of the universe (not just generate adorable cat images)
  • Interests: Self-supervised learning, multimodal systems, human-centric AI, or anything that helps us understand how machines see the world (or what they actually perceive)

Stuff I'm Kind of Good At

  • Rebuilding ML models from scratch (ViTs, DINO, UNet, etc.)
  • Designing data pipelines that scale and debug well
  • Self-supervised learning and training loops
  • Fine-tuning large models with small data
  • Keeping code readable, minimal, and modular
  • Research work (finding the papers that point to a solution)

Stuff I'm Trying to Get Better At

  • Multi-modal learning (vision + text + audio)
  • Efficient training on low-resource hardware
  • Generative modeling (audio, diffusion, NeRFs)
  • Inference optimization (TensorRT, quantization)
  • Better unit testing and CI/CD practices for ML pipelines
  • Currently obsessed with geometric computer vision stuff and want to get better at it
  • Being less of a bozo and writing better code than words ;)

GitHub Stats

Arshad221b's GitHub stats

Recent Computer Vision Projects

DINO from Scratch

A full reimplementation of DINO (Self-Distillation with No Labels) using PyTorch and Vision Transformers. The project focuses on self-supervised learning without contrastive negative pairs, using a momentum encoder, centering, sharpening, and multi-view alignment.
[Image: DINO architecture]
Repo
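
To make the centering, sharpening, and momentum terms above concrete, here is a minimal framework-free sketch of the DINO objective for a single pair of views. It is not taken from the repo: the function names are hypothetical, plain Python lists stand in for PyTorch tensors, and the temperatures and momentum values are the commonly cited defaults from the DINO paper, assumed here rather than confirmed for this project.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax; a lower temperature gives a sharper distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the centered, sharpened teacher to the student."""
    # Centering: subtract a running mean of teacher outputs.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    # Sharpening: the teacher uses a lower temperature than the student.
    teacher_probs = softmax(centered, teacher_temp)
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(old, new, momentum):
    """Exponential moving average, used for the teacher weights and the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


def update_center(center, teacher_batch, momentum=0.9):
    """Move the center toward the mean teacher output of the current batch."""
    batch_size = len(teacher_batch)
    batch_mean = [sum(column) / batch_size for column in zip(*teacher_batch)]
    return ema_update(center, batch_mean, momentum)
```

In a full training loop the loss would be summed over pairs of different augmented views, gradients would flow only through the student, and the teacher weights would follow the student through `ema_update` with a momentum close to 1 (0.996 is a common starting value).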

2D to 3D Human Pose Uplift

A transformer-based system that lifts 2D pose keypoints to 3D, inspired by MotionBERT. Includes a full training pipeline, temporal modeling, and 3D visualization of predictions.
[Image: 2D to 3D pose]
Repo

YOLO-NAS for Custom Dataset

Fine-tuned YOLO-NAS (via SuperGradients) on an Indian street sign dataset. Includes support for custom annotations, training scripts, and inference setup.
[Image: YOLO-NAS]
Repo

3D U-Net for Image Segmentation

Custom-built 3D U-Net for volumetric CT/MRI segmentation. Includes patch-wise training, organ-level labeling, and volumetric visualization.
Repo
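
Patch-wise training usually means cutting each scan into fixed-size sub-volumes so that a batch fits in GPU memory. The sketch below shows one way to enumerate patch origins for a volume. The function names, patch sizes, and strides are illustrative assumptions, not the repo's actual settings.

```python
from itertools import product


def axis_starts(size, patch, stride):
    """Start offsets along one axis, with a final patch flush against the end."""
    if patch > size:
        raise ValueError("patch is larger than the volume along this axis")
    starts = list(range(0, size - patch + 1, stride))
    # Add one more patch if the regular grid leaves the tail uncovered.
    if starts[-1] != size - patch:
        starts.append(size - patch)
    return starts


def patch_origins(volume_shape, patch_shape, stride):
    """All (z, y, x) corner coordinates of patches that together cover the volume."""
    per_axis = [
        axis_starts(size, patch, step)
        for size, patch, step in zip(volume_shape, patch_shape, stride)
    ]
    return list(product(*per_axis))


# Hypothetical example: an 8 x 8 x 10 volume cut into 4 x 4 x 4 patches.
origins = patch_origins((8, 8, 10), (4, 4, 4), (4, 4, 4))
```

Each origin `(z, y, x)` selects the sub-volume `volume[z:z+4, y:y+4, x:x+4]`. A stride smaller than the patch size gives overlapping patches, which is a common choice at inference time when predictions are averaged across overlaps.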


Natural Language Processing Projects

RAG on PDF

Retrieval-Augmented Generation (RAG) pipeline that reads PDFs and answers natural language questions. Built using LangChain, FAISS, and LLMs with a focus on local-first privacy and fast contextual retrieval.
Repo
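
The retrieval step of such a pipeline can be sketched without any external library. In the toy version below, a bag-of-words vector stands in for a learned embedding and a sorted list stands in for the FAISS index, so it illustrates the flow (chunk, embed, rank, build a prompt) rather than the repo's implementation. All function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: lowercase token counts (a stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

In a real setup the chunks would come from the extracted PDF text, `embed` would call an embedding model, and the ranking would be a nearest-neighbor query against a vector index.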

Named Entity Recognition (NER)

NER system built using Hugging Face’s RoBERTa model. Includes token classification, a training loop, data preprocessing, and a clean inference pipeline.
Repo
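
Token classification produces one tag per token, so an inference pipeline needs a step that groups tags back into entities. The sketch below assumes a BIO tagging scheme and word-level tokens (that is, subword predictions already merged). Both are assumptions about the setup, and `decode_bio` is a hypothetical helper, not a function from the repo.

```python
def decode_bio(tokens, tags):
    """Group BIO tags into (entity_text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Close the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        if tag.startswith("B-") or (tag.startswith("I-") and label != current_label):
            # A B- tag, or an I- tag that does not continue the current entity,
            # starts a new entity.
            flush()
            current_tokens, current_label = [token], label
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O" (outside) ends any open entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


# Hypothetical example sentence and tags.
example = decode_bio(
    ["Ada", "Lovelace", "visited", "London"],
    ["B-PER", "I-PER", "O", "B-LOC"],
)
```

Treating a stray `I-` tag as the start of a new entity is a lenient design choice. A stricter decoder could discard such tags instead.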


Open to

  • Research collaborations in CV, SSL, pose estimation, segmentation
  • Contract or freelance roles building full-stack ML systems
  • Full-time roles in data science or ML engineering with depth and autonomy

Open to opportunities and always eager to learn.

Pinned

  1. DINO_from_scratch (Public)

    PyTorch implementation of DINO (Self-Distillation with No Labels) from scratch.

    Python · 11 stars

  2. NeRF-from-scratch (Public)

    A PyTorch implementation of Neural Radiance Fields (NeRF) built from scratch.

    Python · 2 stars

  3. 3D-UNet-Image-Segmentation (Public)

    3D image segmentation using UNet: for MRI/CT scan image data

    Jupyter Notebook · 6 stars · 1 fork

  4. RAG-on-PDF (Public)

    Retrieval-Augmented Generation (RAG) over a Large Language Model (LLM) For PDF data extraction

    Jupyter Notebook · 24 stars · 2 forks

  5. 2d_to_3d_human_pose_uplift (Public)

    Transformer based 2D to 3D Human Pose Estimation

    Python · 3 stars

  6. Sign-Language-Recognition (Public)

    Indian Sign language Recognition using OpenCV

    Jupyter Notebook · 207 stars · 91 forks