
Add quantize arg #23


Merged
merged 2 commits into from
Aug 21, 2024

Conversation

EduardoPach
Contributor

What does this PR do?

This PR adds the option to use the checkpoints from argmaxinc/mlx-FLUX.1-schnell-4bit-quantized (via model_version="FLUX.1-schnell-4bit-quantized"), where the MMDiT weights were quantized to 4 bits with MLX's nn.quantize using its default parameters.
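For context, nn.quantize in MLX performs group-wise affine quantization (its defaults are group_size=64 and bits=4). The toy sketch below illustrates the idea on a single group of weights; it is a minimal illustration of the technique, not DiffusionKit's or MLX's actual implementation.

```python
# Sketch of group-wise 4-bit affine quantization (illustrative only).
# Each group of weights gets its own scale and bias, so per-group
# outliers don't blow up the quantization error of the whole tensor.

def quantize_group(weights, bits=4):
    """Map one group of float weights to unsigned ints plus a scale/bias."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1           # 15 representable steps for 4-bit
    scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_group(q, scale, bias):
    return [v * scale + bias for v in q]

group = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41, 0.0, -0.9]
q, scale, bias = quantize_group(group)
recon = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, recon))
# Reconstruction error is bounded by half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

The stored checkpoint then only needs 4 bits per weight plus one scale/bias pair per group, which is where the size reduction comes from.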

Compared with the checkpoint from argmaxinc/mlx-FLUX.1-schnell the quantized checkpoint:

  • is 3.5x smaller (6.69 GB vs 23.8 GB)
  • reaches 5.05 GB peak memory (vs 16.63 GB) at 512x512
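As a back-of-the-envelope check on the size numbers above (the ratio lands slightly below the ideal 4x for 16-bit-to-4-bit weights, which is consistent with per-group scales/biases and unquantized components adding overhead):

```python
# Sizes as reported in this PR description.
full_gb = 23.8   # argmaxinc/mlx-FLUX.1-schnell checkpoint
quant_gb = 6.69  # 4-bit quantized checkpoint
ratio = full_gb / quant_gb
print(f"{ratio:.2f}x smaller")  # ~3.56x, reported as 3.5x
```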

Example

from diffusionkit.mlx import FluxPipeline


quant = False
standard_version = "FLUX.1-schnell" 
quantized_version = "FLUX.1-schnell-4bit-quantized"

used_version = quantized_version if quant else standard_version

pipeline = FluxPipeline(
  model="argmaxinc/stable-diffusion",
  shift=1.0,
  model_version=used_version,
  low_memory_mode=True,
  a16=True,
  w16=True,
)

HEIGHT = 512
WIDTH = 512
NUM_STEPS = 4  # 4 for FLUX.1-schnell, 50 for SD3
CFG_WEIGHT = 0.0  # 0.0 for FLUX.1-schnell, 5.0 for SD3

prompt = "A fluffy black cat with green eyes holding a sign that says 'quantization is cool!'"

image, log = pipeline.generate_image(
  prompt,
  cfg_weight=CFG_WEIGHT,
  num_steps=NUM_STEPS,
  latent_size=(HEIGHT // 8, WIDTH // 8),
  seed=42
)

Quantized

[image: FLUX.1-schnell-4bit-quantized output]

Standard

[image: FLUX.1-schnell output]

@QueryType

Absolutely delightful!! Thanks, let me run on my 24GB machine

@atiorh atiorh merged commit 13c2a05 into argmaxinc:main Aug 21, 2024
1 check passed
@QueryType

Cool. I did 1024 x 1024!!!
[image: 1024x1024 quantized output]

diffusionkit-cli --model-version FLUX.1-schnell-4bit-quantized --steps 4 --output-path image_q.png --height 1024 --width 1024 --prompt "A fluffy black cat with green eyes holding a sign that says 'quantization is cool!'"
scikit-learn version 1.5.1 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.4.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
WARNING:diffusionkit.mlx.scripts.generate_images:Disabling CFG for FLUX.1-schnell model.
/opt/homebrew/Caskroom/miniconda/base/envs/diffusionkit_flux/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
INFO:diffusionkit.mlx.scripts.generate_images:Output image resolution will be 1024x1024
INFO:diffusionkit.mlx:Pre text encoding peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre text encoding active memory: 0.0GB
INFO:diffusionkit.mlx:Post text encoding peak memory: 0.964GB
INFO:diffusionkit.mlx:Post text encoding active memory: 0.476GB
INFO:diffusionkit.mlx:Text encoding time: 10.695s
INFO:diffusionkit.mlx:Pre denoise peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre denoise active memory: 0.002GB
INFO:diffusionkit.mlx:Seed: 1724254698
0%| | 0/4 [00:00<?, ?it/s]INFO:diffusionkit.mlx.mmdit:Cached modulation_params for timesteps=array([1000, 752, 500, 250, 0], dtype=bfloat16)
INFO:diffusionkit.mlx.mmdit:Cached modulation_params will reduce peak memory by 1.6 GB
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [03:06<00:00, 46.51s/it]
INFO:diffusionkit.mlx:Post denoise peak memory: 7.472GB
INFO:diffusionkit.mlx:Post denoise active memory: 4.554GB
INFO:diffusionkit.mlx:Denoising time: 186.571s
INFO:diffusionkit.mlx:Pre decode peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre decode active memory: 0.003GB
INFO:diffusionkit.mlx:Post decode peak memory: 10.08GB
INFO:diffusionkit.mlx:Post decode active memory: 0.101GB
INFO:diffusionkit.mlx:============= Summary =============
INFO:diffusionkit.mlx:Text encoder: 10.7s
INFO:diffusionkit.mlx:Denoising: 186.6s
INFO:diffusionkit.mlx:Image decoder: 5.2s
INFO:diffusionkit.mlx:Peak memory: 10.1GB
INFO:diffusionkit.mlx:============= Inference Context =============
INFO:diffusionkit.mlx:Operating System:
{'os_build_number': '23G93', 'os_type': 'macOS', 'os_version': '14.6.1'}
INFO:diffusionkit.mlx:Device:
{'cpu_core_count': 8,
'gpu_core_count': 10,
'max_ram': '25032146944',
'product_name': 'Apple M2'}
INFO:diffusionkit.mlx:Total time: 202.913s
INFO:diffusionkit.mlx.scripts.generate_images:Saved the image to image_q.png

@okaneyo

okaneyo commented Aug 22, 2024

3 mins for 1 image isn't Gucci.

@EduardoPach
Contributor Author

3 mins for 1 image isn't Gucci.

What are your MacBook specs, and which resolution did you try?
