Building a simple VLM. Implements LlaMA-SmolLM2 from scratch plus the SigLip2 vision model. KV caching is supported and is also implemented from scratch.
Updated May 12, 2025 - Jupyter Notebook
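For readers unfamiliar with the KV caching mentioned in the description above, the idea can be sketched in a few lines. This is a minimal illustration, not the repository's code: it uses a single attention head, pure-Python lists, and identity projections (query = key = value = input), all of which are assumptions made for brevity. The names `KVCache`, `attend`, `decode_step`, and `full_recompute` are hypothetical.

```python
import math


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Stores the keys and values of every token seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(x, cache):
    """Process one new token, reusing cached keys/values of earlier tokens."""
    # Assumption: identity projections, so the token vector serves as q, k and v.
    cache.append(x, x)
    return attend(x, cache.keys, cache.values)


def full_recompute(tokens):
    """Reference path without a cache: rebuild all keys/values every step."""
    return attend(tokens[-1], tokens, tokens)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
outputs = [decode_step(t, cache) for t in tokens]
```

Each call to `decode_step` only adds one key/value pair to the cache, whereas `full_recompute` would have to rebuild them for the whole prefix; both paths produce the same attention output for the newest token.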
A Python script to analyze images generated using a LoRA (Low-Rank Adaptation) model applied at various strength levels. This tool helps determine an optimal strength for a given LoRA by evaluating image quality and similarity to control images.
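One way such a strength selection could work is sketched below. This is an assumption-laden illustration rather than the script's actual method: it supposes that each strength already has a quality score and a similarity-to-control score in [0, 1], and that the two are combined by a simple weighted sum. The function `pick_strength`, the `quality_weight` parameter, and the example numbers are all hypothetical placeholders.

```python
def pick_strength(scores, quality_weight=0.5):
    """Return the LoRA strength with the highest combined score.

    scores: dict mapping strength -> (quality, similarity), both in [0, 1].
    quality_weight: share of the combined score given to quality (assumed).
    """
    def combined(item):
        _, (quality, similarity) = item
        return quality_weight * quality + (1.0 - quality_weight) * similarity

    best_strength, _ = max(scores.items(), key=combined)
    return best_strength


# Made-up placeholder scores, not measurements from any real model.
example_scores = {
    0.4: (0.9, 0.3),
    0.8: (0.8, 0.7),
    1.2: (0.4, 0.9),
}
best = pick_strength(example_scores)
```

Changing `quality_weight` shifts the choice toward cleaner images or toward closer agreement with the control images, so the weighting itself is a design decision that depends on the use case.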
Fine-tuning the DINO object detection model on a COCO-annotated pedestrian dataset from IIT Delhi. Includes data preparation, training, evaluation, and visualization scripts.