A diffusion model for generating artwork in the style of Kawase Hasui prints.
printScape-Diffusion/
│
├── download_kawase_hasui_images.py # Original script for downloading images
├── prepare_dataset.py # Script to prepare the dataset
├── model.py # U-Net model with attention and residual blocks
├── diffusion.py # Diffusion process implementation (DDIM)
├── dataset.py # Dataset class for loading images
├── train.py # Training script
├── generate.py # Inference script to generate new images
│
├── downloaded_images/ # Raw downloaded images
├── dataset/ # Processed dataset (created by prepare_dataset.py)
├── checkpoints/ # Model checkpoints
├── generated/ # Generated images
└── samples/ # Sample images during training
- Install the required packages:
pip install -r requirements.txt
- Download Kawase Hasui images using the provided script:
python download_kawase_hasui_images.py --output_dir downloaded_images
This will save the raw images to the downloaded_images/ folder.
- Process the downloaded images to create the training dataset (a sketch of the typical preprocessing follows below):
python prepare_dataset.py
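The exact transforms are defined in prepare_dataset.py; as a rough illustration only (assuming a simple center-crop-and-resize pipeline, which may differ from the actual script), dataset preparation for this kind of model typically looks like:

```python
# Hypothetical sketch of typical dataset preparation; prepare_dataset.py may differ.
from pathlib import Path
from PIL import Image

def prepare(src_dir="downloaded_images", dst_dir="dataset", img_size=96):
    Path(dst_dir).mkdir(exist_ok=True)
    for i, path in enumerate(sorted(Path(src_dir).iterdir())):
        try:
            img = Image.open(path).convert("RGB")
        except OSError:
            continue  # skip files that are not readable images
        # Center-crop to a square, then resize to the training resolution.
        side = min(img.size)
        left, top = (img.width - side) // 2, (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side))
        img = img.resize((img_size, img_size), Image.LANCZOS)
        img.save(Path(dst_dir) / f"{i:05d}.png")

if __name__ == "__main__":
    prepare()
```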
To train the diffusion model, you can choose between high-resolution (256x256) and low-resolution (96x96 or 64x64) workflows. The model and training script automatically adjust for memory efficiency and quality.
For best quality (requires more VRAM, recommended 12GB+):
python train.py --data_dir dataset_256 --img_size 256 --base_channels 24 --batch_size 8 --epochs 2500 --save_every 10 --generate_samples --mixed_precision
This uses a U-Net with 24 base channels and memory-efficient attention. You can adjust `--batch_size` and `--gradient_accumulation` to fit your GPU.
For GPUs with limited memory (4-8GB):
python train.py --data_dir dataset --img_size 96 --base_channels 16 --batch_size 2 --gradient_accumulation 4 --mixed_precision
For GPUs with very limited memory (4GB or less):
python prepare_dataset.py --img_size 64
python train.py --data_dir dataset --img_size 64 --base_channels 16 --batch_size 1 --gradient_accumulation 8 --mixed_precision
Other training options:
- `--resume`: Resume training from the latest checkpoint
- `--batch_size`: Set the batch size (default: 4)
- `--gradient_accumulation`: Number of gradient accumulation steps (default: 2; see the training-loop sketch after this list)
- `--mixed_precision`: Enable mixed precision training to reduce memory usage
- `--img_size`: Size of the images to train on (default: 96 for low-res, 256 for high-res)
- `--base_channels`: Set the base channels for the U-Net (16 for low-res, 24 for high-res)
- `--lr`: Set the learning rate (default: 3e-4)
- `--device`: Set the device (default: cuda)
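As a rough sketch of how `--mixed_precision` and `--gradient_accumulation` interact (this is generic PyTorch, not the repository's train.py; `model`, `optimizer`, `dataloader`, and the loss computation are assumed to exist), the loop accumulates FP16 gradients over several small batches before each optimizer step:

```python
# Generic mixed precision + gradient accumulation loop; train.py may differ.
import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # corresponds to --gradient_accumulation

for step, images in enumerate(dataloader):              # dataloader assumed defined
    images = images.to("cuda")
    with torch.cuda.amp.autocast():                      # FP16 forward pass
        loss = compute_diffusion_loss(model, images)     # hypothetical loss helper
        loss = loss / accum_steps                        # average over accumulated batches
    scaler.scale(loss).backward()                        # accumulate scaled gradients

    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)                       # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)                           # skips the step on FP16 overflow
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```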
To generate new images from random noise using a trained model:
python generate.py --checkpoint checkpoints/latest.pt --num_images 16
Additional options:
- `--timesteps`: Number of sampling timesteps (default: 100; fewer steps are faster but give lower quality, as the DDIM sketch below illustrates)
- `--batch_size`: Batch size for generation (default: 4)
- `--save_grid`: Save the images as a grid in addition to saving them individually
- `--output_dir`: Directory to save generated images (default: "generated")
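The `--timesteps` trade-off comes from DDIM sampling: the model is trained on a long noise schedule, but DDIM can denoise along a shorter subsequence of it, so fewer steps mean fewer network evaluations at some cost in quality. The simplified deterministic loop below assumes the model predicts noise when called as `model(x, t)` and that `alphas_cumprod` is the cumulative product of the schedule's alphas; diffusion.py may differ in its details.

```python
# Simplified deterministic DDIM sampling (eta = 0); diffusion.py may differ in details.
import torch

@torch.no_grad()
def ddim_sample(model, alphas_cumprod, shape, num_steps=100, device="cuda"):
    alphas_cumprod = alphas_cumprod.to(device)
    times = torch.linspace(len(alphas_cumprod) - 1, 0, num_steps).long()  # subsampled schedule
    x = torch.randn(shape, device=device)                  # start from pure noise
    for i, t in enumerate(times):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[times[i + 1]] if i + 1 < len(times) else torch.tensor(1.0, device=device)
        t_batch = torch.full((shape[0],), int(t), device=device, dtype=torch.long)
        eps = model(x, t_batch)                             # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # implied clean image
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic DDIM update
    return x.clamp(-1, 1)
```

With 100 steps the loop calls the U-Net 100 times per batch, which is why halving `--timesteps` roughly halves generation time.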
To visually compare outputs from different training epochs, you can generate a grid image:
python generate_grid_from_checkpoints.py --checkpoints_dir checkpoints --output_dir generated --img_size 256 --timesteps 100
Example training progress (epochs 399, 739, 1359, 1739):
This grid shows how the model's outputs evolve as training progresses.
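As a rough sketch of what such a comparison involves (reusing the `ddim_sample` sketch above; the checkpoint key names, and the fact that `model` and `alphas_cumprod` already exist, are assumptions, and the real generate_grid_from_checkpoints.py may differ), each checkpoint is loaded in turn, a few samples are drawn, and everything is stacked into one image:

```python
# Hypothetical checkpoint-comparison sketch; the actual script may differ.
from pathlib import Path
import torch
from torchvision.utils import save_image

rows = []
for ckpt_path in sorted(Path("checkpoints").glob("*.pt")):
    state = torch.load(ckpt_path, map_location="cuda")
    model.load_state_dict(state["model"])   # the "model" key is an assumption
    model.eval()
    rows.append(ddim_sample(model, alphas_cumprod, (4, 3, 256, 256), num_steps=100))

grid = torch.cat(rows, dim=0)                # one row of 4 samples per checkpoint
save_image((grid + 1) / 2, "generated/progress_grid.png", nrow=4)
```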
Key features of the model:
- U-Net architecture with configurable base channels (16 for low-res, 24 for high-res)
- Memory-efficient attention (group normalization, depthwise separable convolutions)
- Residual blocks with adaptable normalization
- Sinusoidal time embeddings (see the sketch below)
- DDIM (Denoising Diffusion Implicit Models) sampling for faster generation
- Mixed precision and gradient accumulation for memory efficiency
- Trained on images of Kawase Hasui's artwork
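For reference, the sinusoidal time embedding mentioned above is the standard transformer-style construction; the exact dimensions and frequency scaling used in model.py may differ:

```python
# Standard sinusoidal timestep embedding; model.py's version may differ in details.
import math
import torch

def timestep_embedding(t, dim):
    """Map integer timesteps t of shape (batch,) to embeddings of shape (batch, dim)."""
    half = dim // 2  # assumes dim is even
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```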
Memory-saving techniques used during training:
- Mixed precision training (FP16)
- Gradient accumulation
- Memory-efficient attention (see the sketch after this list)
- Periodic memory cleanup
- Smaller batch sizes
- Gradient clipping
- Efficient U-Net architecture with fewer channels in deeper layers
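The building blocks named above (group normalization and depthwise separable convolutions) are cheap to express in PyTorch. The sketch below is illustrative only and assumes channel counts divisible by the group size; it is not the exact module from model.py:

```python
# Illustrative memory-friendly block, not the exact module from model.py.
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.norm = nn.GroupNorm(8, in_ch)                  # group normalization
        self.act = nn.SiLU()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # per-channel 3x3
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)        # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(self.act(self.norm(x))))
```

A depthwise 3x3 followed by a 1x1 convolution uses far fewer parameters and activations than a full 3x3 convolution over all channel pairs, which is where the memory savings come from.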
Use `--img_size` and `--base_channels` to select your workflow: 256/24 for high-res, 96/16 or 64/16 for low-res.