PrintScape Diffusion

A high-resolution (256x256) diffusion model, with TPU/GPU support, for generating artwork in the style of Kawase Hasui's Japanese woodblock prints.

Project Structure

printScape-Diffusion/
│
├── download_kawase_hasui_images.py   # Original script for downloading images
├── prepare_dataset.py                # Script to prepare the dataset
├── model.py                          # U-Net model with attention and residual blocks
├── diffusion.py                      # Diffusion process implementation (DDIM)
├── dataset.py                        # Dataset class for loading images
├── train.py                          # Training script
├── generate.py                       # Inference script to generate new images
│
├── downloaded_images/                # Raw downloaded images
├── dataset/                          # Processed dataset (created by prepare_dataset.py)
├── checkpoints/                      # Model checkpoints
├── generated/                        # Generated images
└── samples/                          # Sample images during training

Setup

  1. Install the required packages:
pip install -r requirements.txt
  2. Download Kawase Hasui images using the provided script:
python download_kawase_hasui_images.py --output_dir downloaded_images

This will save the raw images to the downloaded_images/ folder.

  3. Process the downloaded images to create the training dataset:
python prepare_dataset.py
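The sketch below shows the kind of preprocessing this step performs, assuming a Pillow-based resize-and-center-crop pass; the actual prepare_dataset.py may use different arguments, filenames, and output layout.

# Sketch of a resize + centre-crop preprocessing pass (assumed behaviour,
# not the repo's exact prepare_dataset.py logic).
from pathlib import Path
from PIL import Image

src, dst, size = Path("downloaded_images"), Path("dataset"), 256
dst.mkdir(exist_ok=True)
for i, path in enumerate(sorted(src.glob("*.jpg"))):
    img = Image.open(path).convert("RGB")
    scale = size / min(img.size)                     # resize the shorter side to `size`
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    left, top = (img.width - size) // 2, (img.height - size) // 2
    img.crop((left, top, left + size, top + size)).save(dst / f"{i:05d}.png")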

Training

To train the diffusion model, you can choose between high-resolution (256x256) and low-resolution (96x96 or 64x64) workflows. The model and training script automatically adjust for memory efficiency and quality.

High-Resolution Training (256x256)

For best quality (requires more VRAM, recommended 12GB+):

python train.py --data_dir dataset_256 --img_size 256 --base_channels 24 --batch_size 8 --epochs 2500 --save_every 10 --generate_samples --mixed_precision

This uses a UNet with 24 base channels and memory-efficient attention. You can adjust --batch_size and --gradient_accumulation for your GPU.

Memory-Optimized Training (96x96)

For GPUs with limited memory (4-8GB):

python train.py --data_dir dataset --img_size 96 --base_channels 16 --batch_size 2 --gradient_accumulation 4 --mixed_precision

Ultra-Low Memory Training (64x64)

For GPUs with very limited memory (4GB or less):

python prepare_dataset.py --img_size 64
python train.py --data_dir dataset --img_size 64 --base_channels 16 --batch_size 1 --gradient_accumulation 8 --mixed_precision

Additional options:

  • --resume: Resume training from the latest checkpoint
  • --batch_size: Set batch size (default: 4)
  • --gradient_accumulation: Number of gradient accumulation steps (default: 2)
  • --mixed_precision: Enable mixed precision training to reduce memory usage (see the training-step sketch after this list)
  • --img_size: Size of images to train on (default: 96 for low-res, 256 for high-res)
  • --base_channels: Set base channels for UNet (16 for low-res, 24 for high-res)
  • --lr: Set learning rate (default: 3e-4)
  • --device: Set device (default: cuda)
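For reference, the sketch below shows how mixed precision, gradient accumulation, and gradient clipping typically combine in a PyTorch training step. The model, loss, and data loader here are hypothetical stand-ins, not the repo's actual train.py code.

import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 3, 3, padding=1).to(device)       # stand-in for the UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)    # matches the default --lr
scaler = GradScaler(enabled=(device == "cuda"))
accum_steps = 2                                               # --gradient_accumulation

def noise_prediction_loss(model, x0):
    # Placeholder diffusion loss: corrupt with noise and regress the noise (MSE).
    noise = torch.randn_like(x0)
    return torch.nn.functional.mse_loss(model(x0 + noise), noise)

loader = [torch.randn(2, 3, 96, 96) for _ in range(8)]        # stand-in for the DataLoader
for step, images in enumerate(loader):
    images = images.to(device)
    with autocast(enabled=(device == "cuda")):                # --mixed_precision
        loss = noise_prediction_loss(model, images) / accum_steps
    scaler.scale(loss).backward()                             # gradients accumulate

    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)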

Generating Images

To generate new images from random noise using a trained model:

python generate.py --checkpoint checkpoints/latest.pt --num_images 16

Additional options:

  • --timesteps: Number of sampling timesteps (default: 100; fewer steps are faster but give lower quality; see the sampling sketch after this list)
  • --batch_size: Batch size for generation (default: 4)
  • --save_grid: Save images as a grid as well as individually
  • --output_dir: Directory to save generated images (default: "generated")
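The --timesteps flag selects how many DDIM steps are taken out of the full training schedule. Below is a minimal sketch of deterministic DDIM sampling (eta = 0) to illustrate that trade-off; the beta schedule and the model's call signature are assumptions, and the repo's diffusion.py may differ.

import torch

@torch.no_grad()
def ddim_sample(model, img_size, timesteps=100, train_steps=1000, device="cuda"):
    # Linear beta schedule and cumulative alphas (assumed schedule).
    betas = torch.linspace(1e-4, 0.02, train_steps, device=device)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    # Evenly spaced subset of the training timesteps, from noisy to clean.
    steps = torch.linspace(train_steps - 1, 0, timesteps, device=device).long()

    x = torch.randn(1, 3, img_size, img_size, device=device)  # start from pure noise
    for i, t in enumerate(steps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[steps[i + 1]] if i + 1 < len(steps) else torch.tensor(1.0, device=device)
        eps = model(x, t.expand(x.shape[0]))                   # predicted noise (assumed signature)
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()         # implied clean image
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps     # deterministic DDIM update
    return x.clamp(-1, 1)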

Training Progress Visualization

To visually compare outputs from different training epochs, you can generate a grid image:

python generate_grid_from_checkpoints.py --checkpoints_dir checkpoints --output_dir generated --img_size 256 --timesteps 100

Example training progress (epochs 399, 739, 1359, 1739):

[Training progress grid image]

This grid shows how the model's outputs evolve as training progresses.
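One way to assemble such a grid is with torchvision's make_grid, as sketched below; the epoch file names are hypothetical stand-ins for one sample per checkpoint, and the repo's generate_grid_from_checkpoints.py may work differently.

# Sketch: stack one sample per epoch into a single comparison grid
# (hypothetical file names; not the repo's actual script).
import torch
from torchvision.io import read_image
from torchvision.utils import make_grid, save_image

epochs = [399, 739, 1359, 1739]
imgs = [read_image(f"generated/epoch_{e}.png").float() / 255.0 for e in epochs]
save_image(make_grid(torch.stack(imgs), nrow=len(epochs)), "generated/progress_grid.png")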

Model Details

  • U-Net architecture with configurable base channels (16 for low-res, 24 for high-res)
  • Memory-efficient attention (group normalization, depthwise separable convolutions)
  • Residual blocks with adaptable normalization
  • Sinusoidal time embeddings (see the sketch after this list)
  • DDIM (Denoising Diffusion Implicit Models) sampling for faster generation
  • Mixed precision and gradient accumulation for memory efficiency
  • Trained on images of Kawase Hasui's artwork
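To illustrate the time conditioning, a standard sinusoidal timestep embedding can be written as below; the embedding dimension and exact frequency scaling used in model.py are assumptions.

import math
import torch

def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    # Map integer timesteps of shape [B] to [B, dim] sin/cos features.
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

emb = sinusoidal_time_embedding(torch.tensor([0, 10, 500]), 128)  # shape [3, 128]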

Memory Optimization Techniques

  • Mixed precision training (FP16)
  • Gradient accumulation
  • Memory-efficient attention (see the sketch after this list)
  • Periodic memory cleanup
  • Smaller batch sizes
  • Gradient clipping
  • Efficient UNet architecture with fewer channels in deeper layers
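One plausible reading of "memory-efficient attention (group normalization, depthwise separable convolutions)" is the spatial self-attention block sketched below. It is an illustration under those assumptions, not the exact block in model.py, and it relies on PyTorch 2.0+ for the fused scaled_dot_product_attention kernel.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    # GroupNorm, 1x1 convs for q/k/v, and a depthwise-separable output projection.
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels)
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),  # depthwise
            nn.Conv2d(channels, channels, kernel_size=1),                              # pointwise
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)
        q, k, v = (t.reshape(b, c, h * w).transpose(1, 2) for t in (q, k, v))  # [B, HW, C]
        out = F.scaled_dot_product_attention(q, k, v)   # fused attention kernel (PyTorch 2.0+)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)                        # residual connection

print(SpatialSelfAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])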

Switching Between High-Res and Low-Res

Use --img_size and --base_channels to select your workflow. For high-res, use 256/24; for low-res, use 96/16 or 64/16.
