The official code of "Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation".
python prepare_nulltext_checkpoint.py
python nulltext_unet.py

See generation_with_nulltext_model.py for details.
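Below is a minimal sketch of how the prepared null-text model might be loaded for generation with diffusers; the checkpoint path, base model ID, and sampling settings are assumptions, and generation_with_nulltext_model.py in this repo is the authoritative entry point.

```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Load the null-text UNet produced by prepare_nulltext_checkpoint.py
# (the local checkpoint path is hypothetical).
nulltext_unet = UNet2DConditionModel.from_pretrained(
    "./nulltext_checkpoint", subfolder="unet", torch_dtype=torch.float16
)

# Drop the null-text UNet into a standard SD1.5 pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    unet=nulltext_unet,
    torch_dtype=torch.float16,
).to("cuda")

# The null-text model is driven by the empty prompt.
image = pipe("", num_inference_steps=50).images[0]
image.save("sample.png")
```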
The core code of GCFG is as follows:
# domain guidance: prediction from the in-domain diffusion model
noise_pred_text = self.unet(
    latent_model_input[1:],
    t,
    encoder_hidden_states=prompt_embeds[2:3],
    cross_attention_kwargs=cross_attention_kwargs,
    return_dict=False,
)[0]
# control guidance: prediction from SD1.5 or a customized SD
noise_pred_text_ori = self.unet1(
    latent_model_input[1:],
    t,
    encoder_hidden_states=prompt_embeds[3:4],
    cross_attention_kwargs=cross_attention_kwargs,
    return_dict=False,
)[0]
# unconditional prediction from SD1.5
noise_pred_uncond = self.unet0(
    latent_model_input[:1],
    t,
    encoder_hidden_states=prompt_embeds[:1],
    cross_attention_kwargs=cross_attention_kwargs,
    return_dict=False,
)[0]
# perform guidance
if do_classifier_free_guidance:
    noise_pred = noise_pred_uncond + \
                 guidance_scale * (noise_pred_text - noise_pred_uncond) + \
                 guidance_scale_ori * (noise_pred_text_ori - noise_pred_uncond)

Here, self.unet0 is SD1.5 for unconditional guidance, self.unet is the in-domain diffusion model for domain guidance, and self.unet1 is SD1.5 or a customized SD for control guidance.
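As a rough illustration, the three UNets could be attached to a standard diffusers pipeline as in the sketch below; the in-domain checkpoint path is hypothetical and the attribute names simply mirror the snippet above.

```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Base SD1.5 pipeline; its UNet doubles as unet0 for the unconditional branch.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.unet0 = pipe.unet

# unet: in-domain diffusion model for domain guidance (hypothetical local path).
pipe.unet = UNet2DConditionModel.from_pretrained(
    "./in_domain_model", subfolder="unet", torch_dtype=torch.float16
).to("cuda")

# unet1: SD1.5 or a customized SD for control guidance; here the base UNet is reused.
pipe.unet1 = pipe.unet0
```

In the guidance step, guidance_scale weights the domain guidance term and guidance_scale_ori weights the control guidance term, so the two can be tuned independently.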
- Updating training code in UniDiffusion.
- Results on SDXL and SD3.
