[docs] Quantization + torch.compile + offloading #11703

stevhliu · 2025-06-12T22:55:00Z

Follows up on #11670 and #11672 to document combinations of quantization, torch.compile, and offloading.

HuggingFaceDocBuilderDev · 2025-06-12T23:01:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

stevhliu · 2025-06-12T23:12:04Z

docs/source/en/optimization/speed-memory-optims.md

+
+Refer to the table below for the latency and memory-usage of each combination.
+
+| combination | latency | memory usage |


@sayakpaul, can you help with the numbers here?

sayakpaul

Thanks for starting this. Will get you the numbers.

sayakpaul · 2025-06-14T02:01:12Z

@stevhliu

| combination | latency | memory usage |
|---|---|---|
| quantization | 32.602 | 14.9453 |
| quantization, torch.compile | 25.847 | 14.9448 |
| quantization, torch.compile, model CPU offloading | 32.312 | 12.2369 |
| quantization, torch.compile, group offloading | 60.235 | 12.2369 |

Code: https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d
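
As a reading aid for the numbers above, here is a minimal sketch that expresses each combination as a percentage change against the quantization-only row. The values are copied from the table. The thread does not state units, so none are assumed, and the names `RESULTS`, `relative_change`, and `summarize` are illustrative only, not part of the benchmark gist.

```python
# Latency and memory figures copied from the table in this comment.
# Units are not stated in the thread, so only relative changes are computed.
RESULTS = {
    "quantization": (32.602, 14.9453),
    "quantization, torch.compile": (25.847, 14.9448),
    "quantization, torch.compile, model CPU offloading": (32.312, 12.2369),
    "quantization, torch.compile, group offloading": (60.235, 12.2369),
}
BASELINE = "quantization"


def relative_change(value, baseline):
    """Return the percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


def summarize(results, baseline_key=BASELINE):
    """Map each combination to (latency change %, memory change %)."""
    base_latency, base_memory = results[baseline_key]
    return {
        name: (
            relative_change(latency, base_latency),
            relative_change(memory, base_memory),
        )
        for name, (latency, memory) in results.items()
    }


for name, (d_latency, d_memory) in summarize(RESULTS).items():
    print(f"{name}: latency {d_latency:+.1f}%, memory {d_memory:+.1f}%")
```

Read this way, adding torch.compile cuts latency by roughly a fifth with essentially no change in memory, model CPU offloading trades that latency gain for roughly 18% less memory, and group offloading reaches the same memory figure at a much higher latency.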

Worth mentioning:

- We are applying quantization to the `transformer` and `text_encoder_2` components.
- GPU used: RTX 4090.
- Using PyTorch nightlies is better.
- https://huggingface.co/docs/diffusers/main/en/optimization/memory#group-offloading explains why the speed-memory trade-off with group offloading in Flux isn't as expected.
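
For orientation, a rough sketch of how the quantization, torch.compile, and model CPU offloading combination might be set up in Diffusers. This is not the benchmark script (see the gist linked above). Several things here are assumptions: the checkpoint ID (the thread only mentions Flux), the bitsandbytes 4-bit backend and its settings (the thread does not name a backend), and the helper name `build_pipeline`. Running it needs a CUDA GPU plus `torch`, `diffusers`, and `bitsandbytes` installed.

```python
# Components named in the notes above.
COMPONENTS_TO_QUANTIZE = ["transformer", "text_encoder_2"]
# Assumption: the thread only says "Flux"; this checkpoint ID is illustrative.
MODEL_ID = "black-forest-labs/FLUX.1-dev"


def build_pipeline(model_id=MODEL_ID, offload=True):
    """Hypothetical helper: quantize, optionally offload, then compile."""
    # Imports are local so this file can be imported without the GPU stack.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    quant_config = PipelineQuantizationConfig(
        # Assumption: the backend and settings are not stated in the thread.
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
        components_to_quantize=COMPONENTS_TO_QUANTIZE,
    )
    pipeline = DiffusionPipeline.from_pretrained(
        model_id,
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )
    if offload:
        # Moves each component to the GPU only while it is in use.
        pipeline.enable_model_cpu_offload()
    else:
        pipeline.to("cuda")
    # Compile the transformer in place (uses torch.compile under the hood).
    pipeline.transformer.compile()
    return pipeline
```

Calling `build_pipeline(offload=False)` would correspond to the quantization plus torch.compile row. Group offloading is configured differently and is left out of this sketch.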

stevhliu requested a review from sayakpaul June 12, 2025 23:12

sayakpaul reviewed Jun 13, 2025

sayakpaul reviewed Jun 13, 2025

stevhliu added 2 commits June 13, 2025 14:21

draft

3102eb2

feedback

7d7f274

stevhliu force-pushed the combine-optims branch from cdfd845 to 7d7f274 Compare June 13, 2025 22:07
