Add train_sana_sprint_diffusers file #251
base: main
Conversation
Will review in a bit.
Looking really promising. I left some comments, LMK if they make sense.
Additionally, if we could wrap the loss computations for the different phases into different functions, I think that will be easier to read. LMK what you think.
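For illustration, a rough sketch of what that refactor could look like (the function names and signatures below are hypothetical, not part of the PR; the real argument lists would mirror whatever the current inline blocks already use):

# Hypothetical structure only, shown to illustrate the suggested split.
def compute_generator_loss(transformer, pretrained_model, disc, batch, args):
    """Losses for the generator ("G") phase."""
    raise NotImplementedError  # move the existing "G"-phase code here


def compute_discriminator_loss(disc, transformer, batch, args):
    """Real/fake classification loss for the discriminator ("D") phase."""
    raise NotImplementedError  # move the existing "D"-phase code here


# The training loop then reduces each phase to a single call, e.g.:
# loss = compute_generator_loss(...) if phase == "G" else compute_discriminator_loss(...)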
@@ -0,0 +1,1823 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
Feel free to add SANA Sprint team here too :)
if is_torch_npu_available():
    torch.npu.config.allow_internal_format = False


complex_human_instruction = [
Suggested change:
-complex_human_instruction = [
+COMPLEX_HUMAN_INSTRUCTION = [
    return False


class Text2ImageDataset:
Do we have an example dataset with which it would work?
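(Not sure what format the loader expects, but as a possible smoke test, a small image-caption dataset from the Hub could work. The dataset name and column names below are an assumption for illustration, not something the PR references:)

# Assumption: the dataset exposes "image" / "text" columns; adjust to whatever
# Text2ImageDataset actually expects.
from datasets import load_dataset

ds = load_dataset("lambdalabs/naruto-blip-captions", split="train")
sample = ds[0]
print(sample["text"])   # caption string
print(sample["image"])  # PIL image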
)
# add meta-data to dataloader instance for convenience
self._train_dataloader.num_batches = num_batches
self._train_dataloader.num_samples = num_samples
Could use num_train_examples here, no?
disc.eval()
models_to_accumulate = [transformer]
with accelerator.accumulate(models_to_accumulate):
    with torch.no_grad():
We can then remove this context manager.
images = None
del pipeline


# Save the lora layers
We are not doing LoRA. So, this can be safely omitted.
cfg_y = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
cfg_y_mask = torch.cat([negative_prompt_attention_mask, prompt_attention_mask], dim=0)

cfg_pretrain_pred = pretrained_model(
As another optimization, we could keep the pretrained_model on CPU once this computation is done and load it to the GPU again when needed.
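A minimal sketch of that offload, assuming the teacher is frozen (the helper name is made up, and note that the per-step CPU-to-GPU transfer has its own cost):

import torch

# Hypothetical helper: keep the frozen teacher on CPU and bring it onto the GPU
# only for its forward pass, freeing memory for the student/discriminator updates.
def teacher_forward(pretrained_model, device, **model_inputs):
    pretrained_model.to(device)
    with torch.no_grad():
        pred = pretrained_model(**model_inputs)
    pretrained_model.to("cpu")
    torch.cuda.empty_cache()
    return pred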
phase = "G" | ||
|
||
optimizer_D.step() | ||
optimizer_D.zero_grad(set_to_none=True) |
I think set_to_none is True by default.
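For reference (assuming the script targets PyTorch 2.x, where this became the default):

# Optimizer.zero_grad() has defaulted to set_to_none=True since PyTorch 2.0,
# so the explicit argument can simply be dropped:
optimizer_D.zero_grad()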
lr_scheduler.step()
optimizer_G.zero_grad(set_to_none=True)

elif phase == "D":
So this alternates between two phases in the same training step, right? If so, I would add a comment.
Also, should we let the users control the step interval in which the discriminator should be updated? Or not really?
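If we do expose it, something like the sketch below could work (the flag name --disc_update_interval and the variable names follow the usual diffusers training-script conventions and are assumptions, not part of the PR):

# Hypothetical CLI flag: update the discriminator only every N optimizer steps.
parser.add_argument(
    "--disc_update_interval",
    type=int,
    default=1,
    help="Run the discriminator ('D') phase every N training steps (1 = every step).",
)

# In the training loop, the "G" phase always runs; the "D" phase is skipped
# unless the current step is a multiple of the interval.
run_disc_update = (global_step % args.disc_update_interval) == 0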
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Thanks for your thorough review and helpful suggestions! I'll carefully go through them and incorporate the changes when I'm back. Really appreciate it!
Please don't hesitate to ping me for running tests, etc.
Initial implementation of SANA-Sprint training script adapted for Diffusers.
This needs further refinement and optimization. @lawrence-cj @sayakpaul