Support Lumina-image-2.0 #1927


Open · wants to merge 69 commits into base branch sd3

Conversation

@sdbds (Contributor) commented Feb 12, 2025

Still in preparation.

After checking: their sampler and VAE follow FLUX, and the text encoder part uses Google's Gemma 2.

@kohya-ss CC

sdbds marked this pull request as draft February 12, 2025 08:32
sdbds mentioned this pull request Feb 12, 2025
@rockerBOO (Contributor)

I got this set up locally. I know it's not ready for anything, but I want to get it working. Let me know if you want to work together on this. I can help with some of the model loading parts, which is where I got stuck after poking at it. If you have progressed past this, I can help elsewhere or just test.

Thanks.

@sdbds (Contributor, Author) commented Feb 15, 2025

> I got this set up locally. I know it's not ready for anything, but I want to get it working. Let me know if you want to work together on this. I can help with some of the model loading parts, which is where I got stuck after poking at it. If you have progressed past this, I can help elsewhere or just test.
>
> Thanks.

Thank you, the framework is basically set up at the moment, but there is still some room for improvement in the caching strategy.

I'll discuss with @kohya-ss whether to continue using the previous method.

#1924 (comment)

sdbds marked this pull request as ready for review February 15, 2025 09:12
@envy-ai commented Feb 15, 2025

> Thank you, the framework is basically set up at the moment, but there is still some room for improvement in the caching strategy.

Does that mean I can download your fork and test it now?

@rockerBOO (Contributor)

It's still not quite working, but I'm working through some issues at the moment, mostly with model loading; I'll see what else is needed after that. It's fairly barebones, so I wouldn't expect it to be in a working state just yet.

@sdbds (Contributor, Author) commented Feb 17, 2025

After multiple updates, the project can now run under limited conditions:

  1. flash_attn on Windows causes NaN, so it must be run in a Linux environment. I will consider switching to an SDPA- or xformers-driven implementation later.
  2. The position ID calculation for token sequences is not padded to the max length, which currently requires batch_size = 1.

@kohya-ss (Owner)

Regarding strategy, I would like you to proceed as is. I would like to refactor it together with other architectures later.

The script seems to assume that the model file is .safetensors, but I could only find .pth: https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/tree/main

I would appreciate it if you could tell me where .safetensors is.

@rockerBOO (Contributor)

I converted their consolidated.00-of-01.pth here https://huggingface.co/rockerBOO/lumina-image-2/blob/main/lumina-image-2.safetensors
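
For reference, a conversion along those lines can be done with a short script. This is only a sketch under the assumption that the .pth file is a plain state dict; the actual conversion may have been done differently:

    # Sketch: convert the Lumina .pth checkpoint to .safetensors (file names as in the repos above)
    import torch
    from safetensors.torch import save_file

    state_dict = torch.load("consolidated.00-of-01.pth", map_location="cpu")
    # keep only tensor entries and make them contiguous, which safetensors requires
    state_dict = {k: v.contiguous() for k, v in state_dict.items() if isinstance(v, torch.Tensor)}
    save_file(state_dict, "lumina-image-2.safetensors")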

@kohya-ss (Owner)

I'm sorry this is so late. I am testing the training, but the sample image seems to be a black image even with --sample_at_first, and the loss is also NaN. Can you give me some hints?

The Lumina checkpoint was downloaded from https://huggingface.co/rockerBOO/lumina-image-2/tree/main, and Gemma2 and the AE were downloaded from https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files.

The command is:

 accelerate launch  --mixed_precision bf16 --num_cpu_threads_per_process 1 lumina_train_network.py 
    --pretrained_model_name_or_path path\to\lumina-2.0\lumina-image-2.safetensors  
    --gemma2 path\to\lumina-2.0\gemma_2_2b_fp16.safetensors --ae path\to\lumina-2.0\ae.safetensors 
    --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 
    --seed 42 --mixed_precision bf16 --save_precision bf16 
    --network_module networks.lora_lumina --network_dim 4 
    --optimizer_type adamw8bit --learning_rate 1e-4  --gradient_checkpointing --highvram 
    --max_train_epochs 8 --save_every_n_epochs 1 
    --dataset_config path\to\dataset_config.toml --output_dir path\to\output\lora --output_name lumina-test-1 
    --sample_prompts=path\to\prompts.txt --sample_every_n_epochs 1 --vae_batch_size 4 --sample_at_first

@sdbds (Contributor, Author) commented May 25, 2025

> I'm sorry this is so late. I am testing the training, but the sample image seems to be a black image even with --sample_at_first, and the loss is also NaN. Can you give me some hints?
>
> The Lumina checkpoint was downloaded from https://huggingface.co/rockerBOO/lumina-image-2/tree/main, and Gemma2 and the AE were downloaded from https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files.
>
> The command is: (same command as quoted above)

Most flash_attn builds on Windows lack compiled training backends, so using them results in NaN.
Use the version I compiled directly, or compile a version with the training backends yourself:
https://github.com/sdbds/flash-attention-for-windows/releases

@kohya-ss (Owner)

Thank you, I understand. So the --use_flash_attn option is required. The sample image is successfully generated, but the loss goes NaN at the first step. I'm using a Flash Attention build that is the same as the one used in the Musubi Tuner repo, so it should be fine.

I got the following warning. Is it OK? sd-scripts\library\lumina_models.py:51: UserWarning: Cannot import apex RMSNorm, switch to vanilla implementation

@rockerBOO (Contributor) commented May 25, 2025

If you use PyTorch 2.6, I believe SDPA (which is the default) works correctly.

> UserWarning: Cannot import apex RMSNorm, switch to vanilla implementation

I think apex refers to https://github.com/NVIDIA/apex, so this warning is fine unless you wanted to use that library. We may be able to get rid of the warning, but it was in the original implementation.
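
For context, the warning comes from a try/except import pattern roughly like the following, inherited from the upstream model code (a sketch; the exact vanilla implementation in lumina_models.py may differ):

    import warnings
    import torch

    try:
        from apex.normalization import FusedRMSNorm as RMSNorm  # optional NVIDIA apex fast path
    except ImportError:
        warnings.warn("Cannot import apex RMSNorm, switch to vanilla implementation")

        class RMSNorm(torch.nn.Module):
            def __init__(self, dim: int, eps: float = 1e-5):
                super().__init__()
                self.eps = eps
                self.weight = torch.nn.Parameter(torch.ones(dim))

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # normalize by the root mean square over the last dimension
                x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
                return x * self.weight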

@kohya-ss (Owner)

Thank you! It is true that NaNs occur with SDPA in PyTorch 2.4 (venv with requirements.txt of the sd3 branch), but NaNs do not seem to occur in PyTorch 2.6 (venv with Musubi Tuner). Do you know the reason for this? And should we move to PyTorch 2.6?

@sdbds (Contributor, Author) commented May 25, 2025

> Thank you! It is true that NaNs occur with SDPA in PyTorch 2.4 (venv with requirements.txt of the sd3 branch), but NaNs do not seem to occur in PyTorch 2.6 (venv with Musubi Tuner). Do you know the reason for this? And should we move to PyTorch 2.6?

I guess it's a bug in 2.4. Migrating to 2.7 didn't cause any issues... I haven't encountered any bugs on 2.7 so far, and it also lets us support 50xx GPUs as soon as possible.

@kohya-ss (Owner)

Thank you. I think the code in the sd3 branch will work with PyTorch 2.6 or later, but some testing may be needed. It might be a good idea to make it clear in the Lumina documentation that PyTorch 2.6 or later is required.

@rockerBOO (Contributor) commented May 25, 2025

> Thank you! It is true that NaNs occur with SDPA in PyTorch 2.4 (venv with requirements.txt of the sd3 branch), but NaNs do not seem to occur in PyTorch 2.6 (venv with Musubi Tuner). Do you know the reason for this? And should we move to PyTorch 2.6?

It's something specific to their architecture, but since the new version resolved it I didn't look further. They had some code for bf16 and flash attention, but I reworked it so that it uses SDPA by default and flash attention is a specific opt-in.
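
A sketch of that opt-in shape (the function and argument names here are illustrative, not the exact code in this PR):

    import torch.nn.functional as F

    def get_attention_processor(use_flash_attn: bool = False):
        if use_flash_attn:
            try:
                # requires a flash-attn build with training (backward) kernels;
                # flash_attn_func expects tensors shaped (batch, seq, heads, head_dim)
                from flash_attn import flash_attn_func
                return flash_attn_func
            except ImportError:
                pass
        # default: PyTorch SDPA, which expects (batch, heads, seq, head_dim),
        # so the caller has to permute accordingly
        return F.scaled_dot_product_attention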

@kohya-ss (Owner) left a review comment

Sorry for the late review. I think I've found a potential cause of the NaN issues in PyTorch 2.4. It may be better to address this cause to be future-proof. What do you think?


# Refine image context
for layer in self.noise_refiner:
    x = layer(x, x_mask, img_freqs_cis, t)
@kohya-ss (Owner):

At this point, x_mask is all zeros (set in line 1113).

@rockerBOO (Contributor):

Hmm, this might be identifying a separate bug in how this was refactored. You can see it upstream: https://github.com/Alpha-VLLM/Lumina-Image-2.0/blob/main/models/model.py#L730-L733. I can give it another look to fix this, though.

Fixing the zero issue with the attention is probably also a good idea.
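
For reference, the upstream code builds a per-sample boolean mask over the valid image tokens, roughly like this (a sketch with example lengths; in the model the lengths come from patchify_and_embed, and details may differ):

    import torch

    # example per-sample image-token lengths
    l_effective_img_len = [1024, 768]
    bsz, max_img_len = len(l_effective_img_len), max(l_effective_img_len)

    # True for each sample's valid (non-padding) image tokens, False elsewhere
    x_mask = torch.zeros(bsz, max_img_len, dtype=torch.bool)
    for i, img_len in enumerate(l_effective_img_len):
        x_mask[i, :img_len] = True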


Comment on lines +470 to +475
if valid_indices.numel() == 0:
    # If all tokens are masked, create a zero output
    batch_output = torch.zeros(
        seqlen, self.n_local_heads, self.head_dim,
        device=q.device, dtype=q.dtype
    )
@kohya-ss (Owner):

If x_mask is all zeros, sage_attn returns zeros in the expected output shape.

Comment on lines +387 to +393
self.attention_processor(
    xq.permute(0, 2, 1, 3),
    xk.permute(0, 2, 1, 3),
    xv.permute(0, 2, 1, 3),
    attn_mask=x_mask.bool().view(bsz, 1, 1, seqlen).expand(-1, self.n_local_heads, seqlen, -1),
    scale=softmax_scale,
)
@kohya-ss (Owner):

It seems that SDPA may return NaN if the mask is all zeros. With SDPA, it seems we need to write code that returns zeros in the expected output shape when the mask is all zeros, like sage_attn does.
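
This is easy to reproduce in isolation: with an all-False boolean mask, every attention row has only -inf scores, so the softmax produces NaN (a minimal standalone example, not project code):

    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 4, 8, 16)  # (batch, heads, seq, head_dim)
    k, v = torch.randn_like(q), torch.randn_like(q)
    mask = torch.zeros(1, 1, 8, 8, dtype=torch.bool)  # every key masked out

    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    print(out.isnan().any())  # tensor(True): fully-masked rows have no valid softmax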

@kohya-ss (Owner) commented Jun 4, 2025:

Perhaps this will work (it does not work if some items of the batch are all-zero and some are not):

            valid_indices = torch.nonzero(x_mask, as_tuple=False).squeeze(-1)
            if valid_indices.numel() == 0:
                # If all tokens are masked, create a zero output
                # NOTE: this assumes the batch does not mix fully-masked and unmasked items.
                output = torch.zeros_like(xq, dtype=dtype)
            else:
                output = (
                    self.attention_processor(
                        xq.permute(0, 2, 1, 3),
                        xk.permute(0, 2, 1, 3),
                        xv.permute(0, 2, 1, 3),
                        attn_mask=x_mask.bool().view(bsz, 1, 1, seqlen).expand(-1, self.n_local_heads, seqlen, -1),
                        scale=softmax_scale,
                    )
                    .permute(0, 2, 1, 3)
                    .to(dtype)
                )

@sdbds (Contributor, Author) commented Jun 5, 2025

> Sorry for the late review. I think I've found a potential cause of the NaN issues in PyTorch 2.4. It may be better to address this cause to be future-proof. What do you think?

pytorch/pytorch#130014

I suspect it's a similar bug in PyTorch 2.4, and the best solution is to prioritize upgrading the PyTorch version to resolve the issue, because PyTorch 2.5 changed the cuDNN backend for SDPA.
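
As a side note, individual SDPA backends can also be ruled out at the call site, which helps isolate whether the cuDNN backend is involved (a sketch; torch.nn.attention.sdpa_kernel is available from PyTorch 2.3 onward):

    import torch
    import torch.nn.functional as F
    from torch.nn.attention import SDPBackend, sdpa_kernel

    q = torch.randn(1, 4, 8, 16, device="cuda", dtype=torch.bfloat16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # allow only the flash / memory-efficient / math backends for this call
    with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]):
        out = F.scaled_dot_product_attention(q, k, v)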

@kohya-ss (Owner) commented Jun 5, 2025

> I suspect it's a similar bug in PyTorch 2.4, and the best solution is to prioritize upgrading the PyTorch version to resolve the issue, because PyTorch 2.5 changed the cuDNN backend for SDPA.

It's true about the PyTorch bug.

However, as rockerBOO pointed out, it seems necessary to create the mask correctly when calling noise_refiner. This will also prevent the effect of the bug in PyTorch.

@sdbds (Contributor, Author) commented Jun 5, 2025

> > I suspect it's a similar bug in PyTorch 2.4, and the best solution is to prioritize upgrading the PyTorch version to resolve the issue, because PyTorch 2.5 changed the cuDNN backend for SDPA.
>
> It's true about the PyTorch bug.
>
> However, as rockerBOO pointed out, it seems necessary to create the mask correctly when calling noise_refiner. This will also prevent the effect of the bug in PyTorch.

I think so, because DiT models like SD3 and FLUX were rarely used with masks before...

@kohya-ss (Owner) commented Jun 8, 2025

It seems that noise_refiner does not work properly without masking, so this part needs to be fixed for correct training. If it is difficult to fix, I will update it myself after merging this PR.

This PR also changes FlowMatchEulerDiscreteScheduler, but I am wondering if this will affect the training of FLUX.1 and SD (whether that training will still work correctly). What do you think?

@rockerBOO (Contributor)

> It seems that noise_refiner does not work properly without masking, so this part needs to be fixed for correct training. If it is difficult to fix, I will update it myself after merging this PR.

To fix it, I just need to find the time to do it. The fix should be fairly simple (comparing against the upstream version).

> This PR also changes FlowMatchEulerDiscreteScheduler, but I am wondering if this will affect the training of FLUX.1 and SD (whether that training will still work correctly). What do you think?

FlowMatchEulerDiscreteScheduler was updated to the latest version at the time, which adds dynamic shifting to this noise scheduler. This was partially done because I didn't understand how sigmas/timesteps worked and was using the noise scheduler to get behavior similar to what we have for SD3/FLUX with shifting.

@kohya-ss (Owner) commented Jun 9, 2025

> FlowMatchEulerDiscreteScheduler was updated to the latest version at the time, which adds dynamic shifting to this noise scheduler. This was partially done because I didn't understand how sigmas/timesteps worked and was using the noise scheduler to get behavior similar to what we have for SD3/FLUX with shifting.

Thank you for clarification!

In my understanding, the dynamic shifting (the new time_shift function added to sd3_train_utils.py) is the same as time_shift in lumina_train_util.py.

And this function is called from get_noisy_model_input_and_timesteps in lumina_train_util.py.

So I think we can keep FlowMatchEulerDiscreteScheduler as before, because the dynamic shifting is done in get_noisy_model_input_and_timesteps and time_shift.
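
For reference, the FLUX-style dynamic shift these functions implement boils down to a single formula (a sketch; how mu is derived from the image sequence length is omitted, and the actual code in lumina_train_util.py / sd3_train_utils.py may differ in details):

    import math
    import torch

    def time_shift(mu: float, sigma: float, t: torch.Tensor) -> torch.Tensor:
        # shift timesteps t in (0, 1]; a larger mu pushes sampling toward higher-noise timesteps
        return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)

    t = torch.rand(4).clamp(min=1e-5)
    shifted = time_shift(mu=1.0, sigma=1.0, t=t)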

@rockerBOO (Contributor)

I reverted FlowMatchEulerDiscreteScheduler to how it was before and added the masking in. Because we use buckets where every item has the same size, it doesn't do much on its own, but it would allow latents of different sizes (originally it supported a list of tensors).

It's accurate now, though.

@kohya-ss (Owner) commented Jun 12, 2025

Thank you for updating!

There seems to be a problem with handling the system prompt.

Add a print statement to the tokenize method as follows:

        text = [text] if isinstance(text, str) else text
        print(f"Tokenizing: {text}")
        encodings = self.tokenizer(

Use the following dataset settings:

[general]
resolution = [1024, 1024]

[[datasets]]
batch_size = 1
enable_bucket = false
caption_extension = ".txt"

  [[datasets.subsets]]
  image_dir = "/path/to/image_dir"
  num_repeats = 1
  caption_prefix = "1girl, orichara1, "
  system_prompt = "DUMMY SYSTEM PROMPT"

Then you will see the following string:

Tokenizing: ['DUMMY SYSTEM PROMPT <Prompt Start> 1girl, orichara1,  DUMMY SYSTEM PROMPT <Prompt Start> 1girl, breasts, looking at viewer, blush, smile, multiple girls, skirt, shirt, medium breasts, closed mouth, white shirt, short sleeves, pleated skirt, outdoors, sky, shorts, solo focus, day, black skirt, blue sky, umbrella, building, people']

Without caption_prefix, the log looks like this:

Tokenizing: ['DUMMY SYSTEM PROMPT <Prompt Start> DUMMY SYSTEM PROMPT <Prompt Start> 1girl, solo, breasts, looking at viewer, blush, skirt, shirt, closed mouth, white shirt, short sleeves, outdoors, sky, day, looking back, cloud, black skirt, blue sky, plant, building, scenery, city, sign, potted plant, road, power lines, street, utility pole, traffic light, crosswalk, storefront']

Adding system_prompt to the dataset settings has a large impact, so I would like to avoid it if possible.
For example, how about adding an argument that passes the system_prompt to the Lumina training script, setting it in the tokenizer (or text encoding) strategy, and processing it in the tokenizer strategy?
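
A minimal sketch of that idea (the class name, constructor arguments, and tokenizer call here are illustrative, not the actual sd-scripts API):

    class LuminaTokenizeStrategy:
        def __init__(self, tokenizer, system_prompt: str = "", max_length: int = 256):
            self.tokenizer = tokenizer
            self.system_prompt = system_prompt  # e.g. passed once via a --system_prompt CLI argument
            self.max_length = max_length

        def tokenize(self, text):
            text = [text] if isinstance(text, str) else text
            # prepend the system prompt exactly once, here and nowhere else,
            # so the dataset-side caption handling does not need to know about it
            if self.system_prompt:
                text = [f"{self.system_prompt} <Prompt Start> {t}" for t in text]
            return self.tokenizer(
                text, max_length=self.max_length, padding="max_length",
                truncation=True, return_tensors="pt",
            )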

In addition, if we specify --cache_text_encoder_outputs, the script will stop with the following error.

  File "path\to\sd-scripts\library\lumina_models.py", line 1148, in forward
    x, mask, freqs_cis, l_effective_cap_len, seq_lengths = self.patchify_and_embed(x, cap_feats, cap_mask, t)  
  File "path\to\sd-scripts\library\lumina_models.py", line 1105, in patchify_and_embed
    cap_freqs_cis[i, :cap_len] = freqs_cis[i, :cap_len]
RuntimeError: The expanded size of the tensor (256) must match the existing size (6424135) at non-singleton dimension 0.  Target sizes: [256, 48].  Tensor sizes: [6424135, 48]



  caption = self.process_caption(subset, image_info.caption)
- input_ids = [ids[0] for ids in self.tokenize_strategy.tokenize(caption)]  # remove batch dimension
+ input_ids = [ids[0] for ids in self.tokenize_strategy.tokenize(system_prompt + caption)]  # remove batch dimension
@kohya-ss (Owner):

There is probably a duplication in adding system_prompt here.

@sdbds (Contributor, Author) commented Jun 14, 2025

@rockerBOO Can you roll the dataset settings back to how they were before the system prompt change?

@kohya-ss (Owner)

If you don't have time to update the system prompt handling, I would like to merge this PR into a new branch and update the masking and system prompt handling there. What do you think?

@sdbds (Contributor, Author) commented Jun 15, 2025

> If you don't have time to update the system prompt handling, I would like to merge this PR into a new branch and update the masking and system prompt handling there. What do you think?

Sorry, I've been quite busy lately. I'd be happy to.
