The Ghibli Fine-Tuned Stable Diffusion 2.1 project fine-tunes the Stable Diffusion 2.1 model to generate images in the art style of Studio Ghibli, capturing the vibrant colors, intricate details, and whimsical charm of Ghibli films. The repository includes Jupyter notebooks for training, an interactive Gradio demo for real-time image generation, and instructions for setup and usage. It is intended for data scientists, developers, and Ghibli enthusiasts.
The full fine-tuning workflow lives in the Jupyter notebook at notebooks/ghibli-sd-2.1-base-finetuning.ipynb. This notebook provides a step-by-step guide to fine-tuning the Stable Diffusion 2.1 Base model on the Ghibli dataset, complete with code, explanations, and best practices. It is designed to be accessible to both beginners and experienced practitioners, and can be used to replicate the training process or to experiment with custom modifications. The notebook is compatible with the following platforms:
The LoRA workflow lives in the Jupyter notebook at notebooks/ghibli-sd-2.1-lora.ipynb. It offers a step-by-step walkthrough for fine-tuning the Stable Diffusion 2.1 model on the Ghibli dataset using LoRA (Low-Rank Adaptation), including code, detailed notes, and practical tips. Written for both novices and seasoned users, it supports replicating the training process or experimenting with custom tweaks. The notebook is compatible with the following platforms:
To get started, open the notebook in your preferred platform and follow the instructions to set up the environment and execute the training process.
Each task employs a dedicated dataset hosted on Hugging Face, tailored to the requirements of the training process while reflecting Ghibli’s distinctive artistry:
- Full Fine-Tuning Task: Utilizes uwunish/ghibli-dataset, a comprehensive collection of high-quality Ghibli-inspired images designed for thorough model fine-tuning.
- LoRA Task: Leverages pulnip/ghibli-dataset, a lightweight dataset crafted for LoRA adaptation, enabling efficient training with reduced computational resources while maintaining the charm of Ghibli’s style.
The project uses carefully selected Stable Diffusion models for each task, balancing quality, efficiency, and alignment with Ghibli’s artistic vision:
- Full Fine-Tuning Task: Built on Stable Diffusion 2.1 Base, a base model suited to extensive fine-tuning, producing detailed and faithful Ghibli-style artwork.
- LoRA Task: Based on Stable Diffusion 2.1, a versatile model suited to LoRA, offering a streamlined approach for rapid experimentation and efficient generation of Ghibli-inspired visuals.
To run the Gradio app locally (localhost:7860):
python apps/gradio_app.py
Clone the project repository and navigate to the project directory:
git clone https://github.com/danhtran2mind/Ghibli-Stable-Diffusion-Synthesis.git
cd Ghibli-Stable-Diffusion-Synthesis
Install dependencies using requirements.txt:
pip install -r requirements/requirements.txt
Run the following scripts to set up the project:
python scripts/download_ckpts.py
python scripts/download_datasets.py
Refer to the Scripts Documents for the arguments each script accepts. ⚙️
The Training Notebooks offer a comprehensive guide to both the full fine-tuning and LoRA training methods.
To use local datasets downloaded from Hugging Face Datasets, replace the --dataset_name value in the specified notebooks:
- In notebooks/ghibli-sd-2.1-base-finetuning.ipynb, replace --dataset_name="uwunish/ghibli-dataset" with --dataset_name="data/uwunish-ghibli-dataset".
- In notebooks/ghibli-sd-2.1-lora.ipynb, replace --dataset_name="pulnip/ghibli-dataset" with --dataset_name="data/pulnip-ghibli-dataset".
For more information about training, see Stable Diffusion text-to-image fine-tuning.
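The two dataset replacements listed above follow the same pattern: the slash in the Hub ID becomes a hyphen and the result sits under data/. The sketch below captures that mapping. It is inferred only from the two examples in this README; the helper names `local_dataset_path` and `dataset_flag` are hypothetical, and the directory layout actually produced by scripts/download_datasets.py may differ.

```python
from pathlib import PurePosixPath


def local_dataset_path(hub_id: str, root: str = "data") -> str:
    """Map a Hub dataset ID such as 'owner/name' to 'data/owner-name'.

    Assumption: this naming rule is inferred from the README examples,
    not read from scripts/download_datasets.py.
    """
    owner, sep, name = hub_id.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected an ID of the form 'owner/name', got {hub_id!r}")
    return str(PurePosixPath(root) / f"{owner}-{name}")


def dataset_flag(hub_id: str, use_local: bool) -> str:
    """Build the --dataset_name argument for the Hub copy or the local copy."""
    value = local_dataset_path(hub_id) if use_local else hub_id
    return f'--dataset_name="{value}"'


if __name__ == "__main__":
    for hub_id in ("uwunish/ghibli-dataset", "pulnip/ghibli-dataset"):
        print(dataset_flag(hub_id, use_local=True))
```

Before editing a notebook, it is worth confirming that the printed path actually exists on disk after running the download script.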
- To generate an image using the Full Fine-tuning model:
python src/ghibli_stable_diffusion_synthesis/infer.py \
--method full_finetuning \
--prompt "donald trump in ghibli style" \
--height 512 --width 512 \
--num_inference_steps 50 \
--guidance_scale 3.5 \
--seed 42 \
--output_path "tests/test_data/ghibli_style_output_full_finetuning.png"
- To run inference with LoRA:
python src/ghibli_stable_diffusion_synthesis/infer.py \
--method lora \
--prompt "a beautiful city in Ghibli style" \
--height 720 --width 1280 \
--num_inference_steps 100 \
--guidance_scale 15.5 \
--seed 42 \
--lora_scale 0.7 \
--output_path "tests/test_data/ghibli_style_output_lora.png"
Refer to the Inference Documents for the arguments accepted by the inference script. ⚙️
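When generating several images (for example, sweeping seeds or prompts), it can be convenient to assemble the command lines above from Python. The sketch below builds an argument list using only the flags shown in the two examples. The helper name `build_infer_command` is hypothetical, the default values simply mirror the full fine-tuning example, and the multiple-of-8 size check assumes infer.py wraps a diffusers Stable Diffusion pipeline, which this README does not confirm.

```python
import shlex
import sys

INFER_SCRIPT = "src/ghibli_stable_diffusion_synthesis/infer.py"


def build_infer_command(method, prompt, output_path, *, height=512, width=512,
                        num_inference_steps=50, guidance_scale=3.5, seed=42,
                        lora_scale=None):
    """Return the argv list for one infer.py run (flags taken from the README)."""
    if method not in ("full_finetuning", "lora"):
        raise ValueError(f"unknown method: {method!r}")
    if height % 8 or width % 8:
        # Assumption: the script uses a diffusers Stable Diffusion pipeline,
        # which rejects image sizes that are not divisible by 8.
        raise ValueError("height and width must be multiples of 8")
    cmd = [
        sys.executable, INFER_SCRIPT,
        "--method", method,
        "--prompt", prompt,
        "--height", str(height),
        "--width", str(width),
        "--num_inference_steps", str(num_inference_steps),
        "--guidance_scale", str(guidance_scale),
        "--seed", str(seed),
        "--output_path", output_path,
    ]
    # --lora_scale appears only in the LoRA example, so it is added only there.
    if method == "lora" and lora_scale is not None:
        cmd += ["--lora_scale", str(lora_scale)]
    return cmd


if __name__ == "__main__":
    for seed in (42, 43):
        cmd = build_infer_command(
            "lora", "a beautiful city in Ghibli style",
            f"tests/test_data/ghibli_style_output_lora_{seed}.png",
            height=720, width=1280, num_inference_steps=100,
            guidance_scale=15.5, seed=seed, lora_scale=0.7,
        )
        print(shlex.join(cmd))  # shell-quoted form, for inspection only
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; the sketch only prints the commands so it can be inspected without downloading checkpoints.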
- Python: 3.10 or higher
- Key Libraries: See requirements_compatible.txt for compatible versions
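A quick way to confirm the Python requirement before installing anything is a small version check. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of the repository.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old, 3.10 or higher is required"
    print(f"Python {found}: {status}")
```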
For questions or issues, please contact the maintainer via the Issues tab on GitHub.