[docs] quant_kwargs #11712

Open · wants to merge 1 commit into base `main`
docs/source/en/_toctree.yml (2 changes: 1 addition & 1 deletion)

@@ -163,7 +163,7 @@
title: Training
- sections:
- local: quantization/overview
-    title: Getting Started
+    title: Getting started
- local: quantization/bitsandbytes
title: bitsandbytes
- local: quantization/gguf
docs/source/en/api/quantization.md (8 changes: 4 additions & 4 deletions)

@@ -27,19 +27,19 @@ Learn how to quantize models in the [Quantization](../quantization/overview) guide

## BitsAndBytesConfig

-[[autodoc]] BitsAndBytesConfig
+[[autodoc]] quantizers.quantization_config.BitsAndBytesConfig

## GGUFQuantizationConfig

-[[autodoc]] GGUFQuantizationConfig
+[[autodoc]] quantizers.quantization_config.GGUFQuantizationConfig

## QuantoConfig

-[[autodoc]] QuantoConfig
+[[autodoc]] quantizers.quantization_config.QuantoConfig

## TorchAoConfig

-[[autodoc]] TorchAoConfig
+[[autodoc]] quantizers.quantization_config.TorchAoConfig

## DiffusersQuantizer

docs/source/en/quantization/overview.md (20 changes: 11 additions & 9 deletions)

@@ -11,27 +11,29 @@ specific language governing permissions and limitations under the License.

-->

-# Quantization
+# Getting started

Quantization focuses on representing data with fewer bits while also trying to preserve the precision of the original data. This often means converting a data type to represent the same information with fewer bits. For example, if your model weights are stored as 32-bit floating points and they're quantized to 16-bit floating points, this halves the model size, which makes it easier to store and reduces memory usage. Lower precision can also speed up inference because it takes less time to perform calculations with fewer bits.
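
As a quick illustrative aside (not part of the diff), the arithmetic behind that claim is simple; the parameter count below is a made-up, roughly Flux-sized number:

```py
# Rough weight-memory footprint at different precisions (illustrative only).
num_params = 12_000_000_000  # hypothetical ~12B-parameter model

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("nf4", 0.5)]:
    print(f"{name}: {num_params * bytes_per_param / 1e9:.0f} GB")
# fp32: 48 GB, fp16: 24 GB, int8: 12 GB, nf4: 6 GB
```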

Diffusers supports multiple quantization backends to make large diffusion models like [Flux](../api/pipelines/flux) more accessible. This guide shows how to use the [`~quantizers.PipelineQuantizationConfig`] class to quantize a pipeline during its initialization from a pretrained or non-quantized checkpoint.

## Pipeline-level quantization

-There are two ways you can use [`~quantizers.PipelineQuantizationConfig`] depending on the level of control you want over the quantization specifications of each model in the pipeline.
+There are two ways to use [`~quantizers.PipelineQuantizationConfig`] depending on how much customization you want to apply to the quantization configuration.

-- for more basic and simple use cases, you only need to define the `quant_backend`, `quant_kwargs`, and `components_to_quantize`
-- for more granular quantization control, provide a `quant_mapping` that provides the quantization specifications for the individual model components
+- for basic use cases, define the `quant_backend`, `quant_kwargs`, and `components_to_quantize` arguments
+- for granular quantization control, define a `quant_mapping` that provides the quantization configuration for individual model components

-### Simple quantization
+### Basic quantization

Initialize [`~quantizers.PipelineQuantizationConfig`] with the following parameters.

- `quant_backend` specifies which quantization backend to use. Currently supported backends include: `bitsandbytes_4bit`, `bitsandbytes_8bit`, `gguf`, `quanto`, and `torchao`.
-- `quant_kwargs` contains the specific quantization arguments to use.
+- `quant_kwargs` specifies the quantization arguments to use. These arguments are different for each backend. Refer to the [Quantization API](../api/quantization) docs to view the arguments for each backend.
- `components_to_quantize` specifies which components of the pipeline to quantize. Typically, you should quantize the most compute-intensive components, like the transformer. The text encoder is another component to consider quantizing if a pipeline has more than one, such as [`FluxPipeline`]. The example below quantizes the T5 text encoder in [`FluxPipeline`] while keeping the CLIP model intact.

The example below loads the bitsandbytes backend with the following arguments from [`~quantizers.quantization_config.BitsAndBytesConfig`]: `load_in_4bit`, `bnb_4bit_quant_type`, and `bnb_4bit_compute_dtype`.

```py
import torch
from diffusers import DiffusionPipeline
@@ -56,13 +58,13 @@ pipe = DiffusionPipeline.from_pretrained(
image = pipe("photo of a cute dog").images[0]
```
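
Because the diff view collapses the unchanged middle of this snippet, here is a self-contained sketch of what the full example plausibly looks like. The checkpoint name (`black-forest-labs/FLUX.1-dev`) and the `text_encoder_2` component name are assumptions drawn from the surrounding prose, not from the diff itself:

```py
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# quant_kwargs is forwarded to the backend's config class, here
# BitsAndBytesConfig, so the keys match its argument names.
pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    # Quantize the transformer and the T5 text encoder; the CLIP text
    # encoder ("text_encoder") is left intact.
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed checkpoint
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
```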

-### quant_mapping
+### Advanced quantization

-The `quant_mapping` argument provides more flexible options for how to quantize each individual component in a pipeline, like combining different quantization backends.
+The `quant_mapping` argument provides more options for how to quantize each individual component in a pipeline, like combining different quantization backends.

Initialize [`~quantizers.PipelineQuantizationConfig`] and pass a `quant_mapping` to it. The `quant_mapping` allows you to specify the quantization options for each component in the pipeline such as the transformer and text encoder.

-The example below uses two quantization backends, [`~quantizers.QuantoConfig`] and [`transformers.BitsAndBytesConfig`], for the transformer and text encoder.
+The example below uses two quantization backends, [`~quantizers.quantization_config.QuantoConfig`] and [`transformers.BitsAndBytesConfig`], for the transformer and text encoder.

```py
import torch
# ... (the rest of this example is collapsed in the diff view)
```
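
As with the previous snippet, the diff truncates this example. A hedged, self-contained sketch of the `quant_mapping` usage described above might look like this; the checkpoint name is again an assumption:

```py
import torch
from diffusers import DiffusionPipeline, QuantoConfig
from diffusers.quantizers import PipelineQuantizationConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

# Each component gets its own backend-specific config: Quanto int8 weights
# for the transformer, and transformers-side 4-bit bitsandbytes for the T5
# text encoder (a transformers model, hence transformers.BitsAndBytesConfig).
pipeline_quant_config = PipelineQuantizationConfig(
    quant_mapping={
        "transformer": QuantoConfig(weights_dtype="int8"),
        "text_encoder_2": TransformersBitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed checkpoint
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
```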