salamnocap/ml-figs-ldm

ML-FIGS-LDM

EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

Diffusion

ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. The AutoencoderKL is trained using a Text Perceptual Loss to reconstruct more readable text within the figures.

Dataset ML-Figs

We present the ML-Figs dataset, a collection of 4,302 figures and captions extracted from 43 machine learning books. The dataset is designed to advance research on understanding and interpreting educational materials. It is split into 4,000 samples for training and 302 for testing.

Expanded Dataset ML-Figs-SciCap

To improve coverage and diversity, we expanded the ML-Figs dataset with additional figures and captions from the SciCap dataset, specifically those from ACL papers. The combined ML-Figs + SciCap dataset contains 19,514 samples.

Text Perceptual Loss

The Text Perceptual Loss (TPL) measures the perceptual similarity between the text regions of two images. Text bounding boxes are extracted first, then the mean squared error (MSE) is computed for each pair of corresponding text regions. The final loss is the average of these per-region losses.

Text Perceptual Loss (TPL)
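The per-region averaging described above can be sketched as follows. This is a minimal illustration rather than the repository's implementation: it works on raw pixel values with NumPy, and it assumes the text bounding boxes are already available from some external text detector. The function name `text_perceptual_loss` and the `(x0, y0, x1, y1)` box format are hypothetical.

```python
import numpy as np


def text_perceptual_loss(target, recon, boxes):
    """Average of per-region MSE over text bounding boxes.

    target, recon: float arrays of identical shape (H, W, C).
    boxes: iterable of (x0, y0, x1, y1) pixel boxes. They are assumed to
           come from an external text detector (not shown here).
    """
    if target.shape != recon.shape:
        raise ValueError("target and recon must have the same shape")

    region_losses = []
    for x0, y0, x1, y1 in boxes:
        t = target[y0:y1, x0:x1]
        r = recon[y0:y1, x0:x1]
        if t.size == 0:
            # Skip degenerate boxes that cover no pixels.
            continue
        # MSE restricted to this text region.
        region_losses.append(np.mean((t - r) ** 2))

    if not region_losses:
        # No text regions: this sketch returns zero loss.
        return 0.0
    # Final loss is the mean of the individual region losses.
    return float(np.mean(region_losses))
```

Averaging per region, instead of over all text pixels at once, gives a small label the same weight as a large title, which may be one reason for the design described above.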

Install Dependencies:

pip install -r requirements.txt

or create a conda environment:

conda env create -f environment.yaml
conda activate ml-figs-ldm
pip install -e .

Update albumentations package:

python scripts/update_albm_package.py

Train LDM (ML-Figs + SciCap ACL):

python main.py --base configs/ml-figs-scicap-ldm.yaml --train=True --scale_lr=False
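The `--scale_lr=False` flag in the command above is not explained in this README. Assuming `main.py` follows the convention of the CompVis latent-diffusion training script (an assumption, not confirmed here), the flag controls whether the configured base learning rate is multiplied by the effective batch size. The sketch below illustrates that convention; the function and parameter names are hypothetical.

```python
def effective_learning_rate(base_lr, batch_size, n_gpus=1,
                            accumulate_grad_batches=1, scale_lr=True):
    """Learning rate used for training under the assumed convention.

    With scale_lr=True the base rate is scaled by the effective batch size
    (gradient accumulation steps * number of GPUs * per-GPU batch size).
    With scale_lr=False the base rate from the config is used unchanged.
    """
    if scale_lr:
        return accumulate_grad_batches * n_gpus * batch_size * base_lr
    return base_lr
```

Under this assumption, passing `--scale_lr=False` means the learning rate written in the YAML config is the one actually used, regardless of batch size or GPU count.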

Train VAE (ML-Figs + SciCap ACL):

python main.py --base configs/ml-figs-scicap-vae.yaml --train=True

Evaluate LDM:

python scripts/eval_ldm.py
python scripts/eval_FID_ldm.py

Evaluate VAE:

python scripts/eval_vae.py

Qualitative Results:

Qualitative Comparison of Autoencoder Models:

Model A trained on ML-Figs, Model B trained on ML-Figs + SciCap. TPL: Text Perceptual Loss. SD refers to Stable Diffusion v1-4 trained on LAION.

Qualitative Comparison

Generated samples across varying classifier-free guidance (CFG) scales:

Generated Samples
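For readers unfamiliar with the CFG scale: in classifier-free guidance, each sampling step combines an unconditional noise prediction with a caption-conditioned one, and the scale sets how far the result is pushed toward the conditioned prediction. The sketch below shows the standard combination formula. The function and argument names are illustrative and are not taken from this repository's sampling code.

```python
import numpy as np


def apply_cfg(eps_uncond, eps_cond, scale):
    """Combine noise predictions with classifier-free guidance.

    eps_uncond: prediction with an empty (null) caption.
    eps_cond:   prediction conditioned on the caption.
    scale:      guidance scale. 0 gives the unconditional prediction,
                1 gives the conditional one, and values above 1
                extrapolate further in the conditional direction.
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Higher scales generally follow the caption more closely at the cost of sample diversity, which is the trade-off that a comparison across CFG scales makes visible.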

Download Models:

The Autoencoder and LDM models are available for download at huggingface.co/salamnocap/ml-figs-ldm. Both were trained on the ML-Figs + SciCap dataset.
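One way to fetch the checkpoints programmatically is with the `huggingface_hub` package, sketched below. It downloads the whole model repository, since the individual checkpoint file names are not listed here. The helper name `download_checkpoints` and the default `local_dir` path are hypothetical, and `huggingface_hub` must be installed separately (`pip install huggingface_hub`).

```python
REPO_ID = "salamnocap/ml-figs-ldm"  # model repository named in this README


def download_checkpoints(local_dir="checkpoints/ml-figs-ldm"):
    """Download every file in the model repository and return the local path.

    local_dir is an arbitrary example location, not a path the training
    or evaluation scripts are known to expect.
    """
    # Imported here so that the module can be loaded without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example (requires network access):
#   path = download_checkpoints()
#   print(path)
```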
