cs2420

This repository contains code and notebooks related to a project involving fine-tuning and testing CLIP and Stable Diffusion models for forensic sketch image generation.

Installation

Prerequisites

Before you begin, ensure you have the following installed:

- Python 3.7+
- CUDA-enabled GPU (recommended for faster training)
- pip package installer

Steps

Clone the repository:

```
git clone https://github.com/fiidalgo/cs2420.git
cd cs2420
```

Create a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Linux/macOS
# venv\Scripts\activate  # On Windows
```

Install the required dependencies:
```
pip install -r requirements.txt
```
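
After installation, a quick check can confirm that the interpreter meets the stated minimum and whether a GPU is visible. The sketch below assumes that requirements.txt installs PyTorch, which this README does not state explicitly. The helper name check_environment is hypothetical and not part of the repository.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7)):
    """Report whether Python is recent enough and whether CUDA is usable."""
    report = {"python_ok": sys.version_info >= min_version}
    # Assumption: PyTorch is the GPU framework pulled in by requirements.txt.
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        report["cuda_available"] = False
    else:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_installed"] = True
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(check_environment())
```

If cuda_available is reported as False on a machine with a GPU, the CUDA notes under Troubleshooting are the place to start.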

Project Overview

This project explores the application of Stable Diffusion and CLIP models to the task of generating images from forensic sketches. The goal is to improve the quality and accuracy of generated images by fine-tuning these models.

Models Used

- Stable Diffusion (runwayml/stable-diffusion-v1-5): A latent text-to-image diffusion model, loaded from Hugging Face, that generates photorealistic images from text prompts. It serves as the base model for fine-tuning and testing.
- CLIP (Contrastive Language-Image Pre-training): A neural network trained on a variety of (image, text) pairs to relate images and text. In this project, CLIP encodes the text description that guides Stable Diffusion's image generation. No separate CLIP checkpoint is named; the project uses the CLIP text encoder bundled with the Stable Diffusion pipeline (for v1.5, CLIP ViT-L/14).
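
As a rough illustration of how the base model and its bundled CLIP text encoder can be loaded, the sketch below uses the diffusers library. It assumes diffusers and torch are installed and that the model weights can be downloaded from the Hugging Face Hub. The function names load_base_pipeline and generate are hypothetical and do not come from the repository's scripts.

```python
MODEL_ID = "runwayml/stable-diffusion-v1-5"  # model ID named in this README


def load_base_pipeline(model_id=MODEL_ID, device="cuda"):
    """Load the pre-trained pipeline (downloads weights on first use)."""
    # Heavy imports are deferred so this module can be imported without a GPU stack.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision reduces GPU memory use; fall back to float32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)


def generate(pipe, prompt, seed=0):
    """Generate one image for a text prompt with a fixed seed for repeatability."""
    import torch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]
```

With a pipeline loaded, pipe.tokenizer and pipe.text_encoder expose the CLIP components that turn a description into the embedding used for conditioning, and generate(pipe, "a text description") returns a PIL image.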

Experiment Setup

The project investigates the performance of the following setups:

- Base Stable Diffusion Model: The pre-trained runwayml/stable-diffusion-v1-5 model is evaluated directly, without fine-tuning, on generating images from forensic sketch descriptions.
- CLIP -> Stable Diffusion Pipeline: CLIP encodes the text description of the forensic sketch, and this encoding guides the Stable Diffusion model in generating the image.
- Fine-Tuned CLIP -> Stable Diffusion Pipeline (with LoRA): This is the core of the project. Both the CLIP and Stable Diffusion models are fine-tuned using LoRA (Low-Rank Adaptation), which keeps the pre-trained weights frozen and trains only small low-rank update matrices, so far fewer parameters are updated than in full fine-tuning.
  - Ablation Study: Before fine-tuning, an ablation study was conducted to determine which layers within CLIP and Stable Diffusion contribute most to performance when LoRA is applied to them. The directories ablation_study_both, ablation_study_cross_only, and ablation_study_self_only contain the code and results of this study.
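
To make the low-rank idea concrete, the dependency-free sketch below applies the general LoRA formulation, h = W x + (alpha / r) B A x, to a single linear layer and compares parameter counts. It illustrates the technique in general and is not the repository's training code. The rank, alpha, and layer sizes are arbitrary example values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Frozen base output W x plus the scaled low-rank update B (A x)."""
    base = matvec(W, x)               # W stays frozen during fine-tuning
    update = matvec(B, matvec(A, x))  # only A (rank x d_in) and B (d_out x rank) train
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_in, d_out, rank):
    """Return (parameters in the full weight, trainable LoRA parameters)."""
    return d_in * d_out, rank * (d_in + d_out)


if __name__ == "__main__":
    # B starts at zero in standard LoRA, so the adapted layer initially
    # reproduces the pre-trained layer exactly.
    W = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    A = [[0.5, -1.0, 2.0]]
    B = [[0.0], [0.0]]
    print(lora_forward([1.0, 1.0, 1.0], W, A, B, alpha=8, rank=1))
    print(lora_param_counts(768, 768, 4))  # (589824, 6144)
```

For a 768 x 768 projection at rank 4, the adapter trains 6,144 parameters against 589,824 in the full weight, which is roughly 1%.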

Key Features

- CLIP Fine-tuning: clip_fine_tune.py fine-tunes the CLIP model.
- Stable Diffusion Fine-tuning: sd_fine_tune.py fine-tunes the Stable Diffusion model.
- Testing Scripts: clip_sd_test.py and sd_test.py test the performance of the fine-tuned models.
- Ablation Studies: The directories ablation_study_both, ablation_study_cross_only, and ablation_study_self_only hold the ablation studies that analyze the impact of different components.
- Metrics Analysis: final_metrics.ipynb is a Jupyter Notebook for analyzing and visualizing the final metrics of the experiments.
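
The three ablation directories named above suggest variants that differ in which layers receive LoRA adapters. The sketch below assumes that self_only, cross_only, and both refer to self-attention and cross-attention layers, which is an inference from the directory names and is not confirmed by this README. It relies on the diffusers UNet naming in which attn1 is self-attention and attn2 is cross-attention. The helper select_lora_targets is hypothetical.

```python
# Assumed mapping from ablation variant to attention-block name fragments.
ABLATION_PATTERNS = {
    "self_only": ("attn1",),         # self-attention blocks
    "cross_only": ("attn2",),        # cross-attention blocks (text conditioning)
    "both": ("attn1", "attn2"),
}

# Linear projections inside a diffusers attention block.
PROJECTIONS = ("to_q", "to_k", "to_v", "to_out.0")


def select_lora_targets(module_names, mode):
    """Keep attention projection modules that belong to the chosen variant."""
    fragments = ABLATION_PATTERNS[mode]
    selected = []
    for name in module_names:
        parts = name.split(".")
        in_block = any(fragment in parts for fragment in fragments)
        is_projection = any(name.endswith("." + p) for p in PROJECTIONS)
        if in_block and is_projection:
            selected.append(name)
    return selected


if __name__ == "__main__":
    example_names = [
        "mid_block.attentions.0.transformer_blocks.0.attn1.to_q",
        "mid_block.attentions.0.transformer_blocks.0.attn2.to_k",
        "mid_block.attentions.0.transformer_blocks.0.ff.net.0.proj",
    ]
    for mode in ABLATION_PATTERNS:
        print(mode, select_lora_targets(example_names, mode))
```

In practice the names would come from iterating over the model's named_modules(), and the selected names could be passed to whichever LoRA library the training script uses.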

Troubleshooting

- CUDA issues: Ensure you have the correct CUDA drivers installed and that your environment is configured to use the GPU.
- Dependency conflicts: Confirm that the virtual environment is active and that all dependencies from requirements.txt installed without version conflicts. If installation fails, try upgrading pip and installing again.
- Memory errors: Fine-tuning large models can be memory-intensive. Try reducing the batch size, or use gradient accumulation, which processes several small batches and sums their gradients before each optimizer step.
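
Gradient accumulation, mentioned above as a remedy for memory errors, can be sketched as follows. The loop assumes a PyTorch-style interface (loss.backward(), optimizer.step(), optimizer.zero_grad()). The function name and arguments are hypothetical and are not taken from the repository's training scripts.

```python
def train_with_accumulation(model, optimizer, loss_fn, batches, accum_steps=4):
    """Run one pass over `batches`, stepping the optimizer every `accum_steps` batches.

    Returns the number of optimizer steps taken.
    """
    optimizer.zero_grad()
    steps_taken = 0
    for index, (inputs, targets) in enumerate(batches, start=1):
        # Scale each loss so the summed gradients equal the average over the group.
        loss = loss_fn(model(inputs), targets) / accum_steps
        loss.backward()  # gradients add up across micro-batches
        if index % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            steps_taken += 1
    # In this sketch, trailing micro-batches that do not fill a group are not applied.
    return steps_taken
```

With a micro-batch size of 2 and accum_steps=4, each optimizer step reflects 8 samples while only 2 are held in memory at a time.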
