ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its AutoencoderKL is trained with a Text Perceptual Loss so that text within the figures is reconstructed more legibly.
Dataset ML-Figs
We present the ML-Figs dataset, a collection of 4,302 figures and captions extracted from 43 machine learning books, designed to advance research in understanding and interpreting educational materials. The split is 4,000 samples for training and 302 for testing.
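For illustration, a figure-caption dataset of this kind can be wrapped as a PyTorch Dataset roughly as follows. The file layout assumed here (an images/ directory plus a captions.json file mapping image names to captions) is hypothetical and not necessarily the actual ML-Figs format.

import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class FigureCaptionDataset(Dataset):
    """Minimal figure-caption dataset sketch (hypothetical file layout)."""

    def __init__(self, root, transform=None):
        self.root = Path(root)
        # Assumed layout: captions.json maps "fig_0001.png" -> "caption text"
        with open(self.root / "captions.json") as f:
            self.captions = json.load(f)
        self.files = sorted(self.captions)
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        image = Image.open(self.root / "images" / name).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return {"image": image, "caption": self.captions[name]}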
Expanded Dataset ML-Figs-SciCap
To improve the coverage and diversity of our data, we expand the ML-Figs dataset with additional figures and captions from the SciCap dataset, in particular those from ACL papers. The combined ML-Figs + SciCap dataset contains 19,514 samples.
Text Perceptual Loss (TPL)
The Text Perceptual Loss measures the perceptual similarity between the text regions of two images. Text bounding boxes are extracted, a mean squared error (MSE) loss is computed for each corresponding text region, and the final loss is the average of these per-region losses.
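A minimal sketch of such a loss in PyTorch is shown below. The class name and interface are hypothetical, and it assumes the text bounding boxes are supplied externally (e.g., by an OCR text detector) as (x1, y1, x2, y2) pixel coordinates; it is not the repository's actual implementation.

import torch
import torch.nn as nn


class TextPerceptualLoss(nn.Module):
    """Average per-region MSE between text regions of two images (sketch)."""

    def forward(self, recon, target, boxes):
        # recon, target: (C, H, W) tensors for a single image pair.
        # boxes: list of (x1, y1, x2, y2) pixel coordinates (assumed given).
        region_losses = []
        for x1, y1, x2, y2 in boxes:
            rec_crop = recon[:, y1:y2, x1:x2]
            tgt_crop = target[:, y1:y2, x1:x2]
            # MSE over the cropped text region.
            region_losses.append(torch.mean((rec_crop - tgt_crop) ** 2))
        if not region_losses:
            # No text regions detected: contribute zero loss.
            return recon.new_zeros(())
        # Final loss is the mean of the per-region losses.
        return torch.stack(region_losses).mean()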
Install the requirements:
pip install -r requirements.txt
or create a conda environment:
conda env create -f environment.yaml
conda activate ml-figs-ldm
pip install -e .
Update albumentations package:
python scripts/update_albm_package.py
Train the LDM:
python main.py --base configs/ml-figs-scicap-ldm.yaml --train=True --scale_lr=False
Train the autoencoder:
python main.py --base configs/ml-figs-scicap-vae.yaml --train=True
Evaluate the LDM:
python scripts/eval_ldm.py
Compute FID for the LDM:
python scripts/eval_FID_ldm.py
Evaluate the autoencoder:
python scripts/eval_vae.py
Model A is trained on ML-Figs and Model B on ML-Figs + SciCap. TPL: Text Perceptual Loss. SD refers to Stable Diffusion v1-4 trained on LAION.
Autoencoder and LDM models are available for download at huggingface.co/salamnocap/ml-figs-ldm. The models are trained on the ML-Figs + SciCap dataset.
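As a minimal sketch, the checkpoints can be fetched with the huggingface_hub client. Since the exact file names inside the repository are not listed here, the example simply downloads the full repository snapshot.

from huggingface_hub import snapshot_download

# Download all files from the model repository to a local cache directory.
local_dir = snapshot_download(repo_id="salamnocap/ml-figs-ldm")
print(f"Checkpoints downloaded to: {local_dir}")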