
Add quantize arg #23


Merged
merged 2 commits into from
Aug 21, 2024

Conversation

EduardoPach
Contributor

What does this PR do?

This PR adds the option to use the checkpoints from argmaxinc/mlx-FLUX.1-schnell-4bit-quantized (via model_version="FLUX.1-schnell-4bit-quantized"), where the MMDiT weights were quantized to 4 bits with MLX's nn.quantize using its default parameters.
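For context, nn.quantize in MLX performs group-wise affine quantization (its defaults are group_size=64 and bits=4). The toy sketch below illustrates the idea on a single group of weights; it is a minimal illustration of the technique, not DiffusionKit's or MLX's actual implementation.

```python
# Sketch of group-wise 4-bit affine quantization (illustrative only).
# Each group of weights gets its own scale and bias, so per-group
# outliers don't blow up the quantization error of the whole tensor.

def quantize_group(weights, bits=4):
    """Map one group of float weights to unsigned ints plus a scale/bias."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1           # 15 representable steps for 4-bit
    scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_group(q, scale, bias):
    return [v * scale + bias for v in q]

group = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41, 0.0, -0.9]
q, scale, bias = quantize_group(group)
recon = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, recon))
# Reconstruction error is bounded by half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

The stored checkpoint then only needs 4 bits per weight plus one scale/bias pair per group, which is where the size reduction comes from.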

Compared with the checkpoint from argmaxinc/mlx-FLUX.1-schnell the quantized checkpoint:

  • is 3.5x smaller (6.69 GB vs 23.8 GB)
  • reaches 5.05 GB peak memory (vs 16.63 GB) at 512x512
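As a back-of-the-envelope check on the size numbers above (the ratio lands slightly below the ideal 4x for 16-bit-to-4-bit weights, which is consistent with per-group scales/biases and unquantized components adding overhead):

```python
# Sizes as reported in this PR description.
full_gb = 23.8   # argmaxinc/mlx-FLUX.1-schnell checkpoint
quant_gb = 6.69  # 4-bit quantized checkpoint
ratio = full_gb / quant_gb
print(f"{ratio:.2f}x smaller")  # ~3.56x, reported as 3.5x
```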

Example

from diffusionkit.mlx import FluxPipeline


quant = False
standard_version = "FLUX.1-schnell" 
quantized_version = "FLUX.1-schnell-4bit-quantized"

used_version = quantized_version if quant else standard_version

pipeline = FluxPipeline(
  model="argmaxinc/stable-diffusion",
  shift=1.0,
  model_version=used_version,
  low_memory_mode=True,
  a16=True,
  w16=True,
)

HEIGHT = 512
WIDTH = 512
NUM_STEPS = 4  # 4 for FLUX.1-schnell, 50 for SD3
CFG_WEIGHT = 0.0  # 0.0 for FLUX.1-schnell, 5.0 for SD3

prompt = "A fluffy black cat with green eyes holding a sign that says 'quantization is cool!'"

image, log = pipeline.generate_image(
  prompt,
  cfg_weight=CFG_WEIGHT,
  num_steps=NUM_STEPS,
  latent_size=(HEIGHT // 8, WIDTH // 8),
  seed=42
)

Quantized

[image: FLUX.1-schnell-4bit-quantized output]

Standard

[image: FLUX.1-schnell output]

@QueryType

Absolutely delightful!! Thanks, let me run on my 24GB machine

@atiorh atiorh merged commit 13c2a05 into argmaxinc:main Aug 21, 2024
1 check passed
@QueryType

Cool. I did 1024 x 1024!!!
[image: 1024x1024 quantized output]

diffusionkit-cli --model-version FLUX.1-schnell-4bit-quantized --steps 4 --output-path image_q.png --height 1024 --width 1024 --prompt "A fluffy black cat with green eyes holding a sign that says 'quantization is cool!'"
scikit-learn version 1.5.1 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.4.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
WARNING:diffusionkit.mlx.scripts.generate_images:Disabling CFG for FLUX.1-schnell model.
/opt/homebrew/Caskroom/miniconda/base/envs/diffusionkit_flux/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
INFO:diffusionkit.mlx.scripts.generate_images:Output image resolution will be 1024x1024
INFO:diffusionkit.mlx:Pre text encoding peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre text encoding active memory: 0.0GB
INFO:diffusionkit.mlx:Post text encoding peak memory: 0.964GB
INFO:diffusionkit.mlx:Post text encoding active memory: 0.476GB
INFO:diffusionkit.mlx:Text encoding time: 10.695s
INFO:diffusionkit.mlx:Pre denoise peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre denoise active memory: 0.002GB
INFO:diffusionkit.mlx:Seed: 1724254698
0%| | 0/4 [00:00<?, ?it/s]INFO:diffusionkit.mlx.mmdit:Cached modulation_params for timesteps=array([1000, 752, 500, 250, 0], dtype=bfloat16)
INFO:diffusionkit.mlx.mmdit:Cached modulation_params will reduce peak memory by 1.6 GB
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [03:06<00:00, 46.51s/it]
INFO:diffusionkit.mlx:Post denoise peak memory: 7.472GB
INFO:diffusionkit.mlx:Post denoise active memory: 4.554GB
INFO:diffusionkit.mlx:Denoising time: 186.571s
INFO:diffusionkit.mlx:Pre decode peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre decode active memory: 0.003GB
INFO:diffusionkit.mlx:Post decode peak memory: 10.08GB
INFO:diffusionkit.mlx:Post decode active memory: 0.101GB
INFO:diffusionkit.mlx:============= Summary =============
INFO:diffusionkit.mlx:Text encoder: 10.7s
INFO:diffusionkit.mlx:Denoising: 186.6s
INFO:diffusionkit.mlx:Image decoder: 5.2s
INFO:diffusionkit.mlx:Peak memory: 10.1GB
INFO:diffusionkit.mlx:============= Inference Context =============
INFO:diffusionkit.mlx:Operating System:
{'os_build_number': '23G93', 'os_type': 'macOS', 'os_version': '14.6.1'}
INFO:diffusionkit.mlx:Device:
{'cpu_core_count': 8,
'gpu_core_count': 10,
'max_ram': '25032146944',
'product_name': 'Apple M2'}
INFO:diffusionkit.mlx:Total time: 202.913s
INFO:diffusionkit.mlx.scripts.generate_images:Saved the image to image_q.png

@okaneyo

okaneyo commented Aug 22, 2024

3 mins for 1 image isn't Gucci.

@EduardoPach
Contributor Author

3 mins for 1 image isn't Gucci.

What are your MacBook specs, and which resolution did you try?
