From fc679b19e35db4f20e3d03cab1be4df9fa4ba581 Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Mon, 19 May 2025 16:39:18 -0400
Subject: [PATCH 1/7] Start cleaning up docs
---
docs/source/_toctree.yml | 11 +-
docs/source/algorithms.mdx | 12 --
docs/source/installation.mdx | 212 ++++++++++-------------------------
3 files changed, 63 insertions(+), 172 deletions(-)
delete mode 100644 docs/source/algorithms.mdx
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 5fa353d6d..ddf409d5c 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -2,18 +2,15 @@
sections:
- local: index
title: bitsandbytes
- - local: quickstart
- title: Quickstart
- local: installation
title: Installation
-- title: Guides
+ - local: quickstart
+ title: Quickstart
+
+- title: Usage Guides
sections:
- local: optimizers
title: 8-bit optimizers
- - local: algorithms
- title: Algorithms
- - local: non_cuda_backends
- title: Non-CUDA compute backends
- local: fsdp_qlora
title: FSDP-QLoRA
- local: integrations
diff --git a/docs/source/algorithms.mdx b/docs/source/algorithms.mdx
deleted file mode 100644
index 65e5567a4..000000000
--- a/docs/source/algorithms.mdx
+++ /dev/null
@@ -1,12 +0,0 @@
-# Other algorithms
-_WIP: Still incomplete... Community contributions would be greatly welcome!_
-
-This is an overview of the `bnb.functional` API in `bitsandbytes` that we think would also be useful as standalone entities.
-
-## Using Int8 Matrix Multiplication
-
-For straight Int8 matrix multiplication without mixed precision decomposition you can use ``bnb.matmul(...)``. To enable mixed precision decomposition, use the threshold parameter:
-
-```py
-bnb.matmul(..., threshold=6.0)
-```
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
index 704d7aacc..531fd447e 100644
--- a/docs/source/installation.mdx
+++ b/docs/source/installation.mdx
@@ -1,6 +1,6 @@
# Installation Guide
-Welcome to the installation guide for the `bitsandbytes` library! This document provides step-by-step instructions to install `bitsandbytes` across various platforms and hardware configurations. The library primarily supports CUDA-based GPUs, but the team is actively working on enabling support for additional backends like AMD ROCm, Intel, and Apple Silicon.
+Welcome to the installation guide for the `bitsandbytes` library! This document provides step-by-step instructions to install `bitsandbytes` across various platforms and hardware configurations. The library primarily supports CUDA-based GPUs, but the team is actively working on enabling support for additional backends like AMD ROCm and Intel XPU.
> [!TIP]
> For a high-level overview of backend support and compatibility, see the [Multi-backend Support](#multi-backend) section.
@@ -10,82 +10,57 @@ Welcome to the installation guide for the `bitsandbytes` library! This document
- [CUDA](#cuda)
- [Installation via PyPI](#cuda-pip)
- [Compile from Source](#cuda-compile)
-- [Multi-backend Support (Alpha Release)](#multi-backend)
+ - [Preview Wheels from `main`](#cuda-preview)
+- [Deprecated: Multi-Backend Preview](#multi-backend)
- [Supported Backends](#multi-backend-supported-backends)
- [Pre-requisites](#multi-backend-pre-requisites)
- [Installation](#multi-backend-pip)
- [Compile from Source](#multi-backend-compile)
-- [PyTorch CUDA Versions](#pytorch-cuda-versions)
## CUDA[[cuda]]
-`bitsandbytes` is currently only supported on CUDA GPUs for CUDA versions **11.0 - 12.8**. However, there's an ongoing multi-backend effort under development, which is currently in alpha. If you're interested in providing feedback or testing, check out [the multi-backend section below](#multi-backend).
+`bitsandbytes` is currently supported on NVIDIA GPUs with [Compute Capability](https://developer.nvidia.com/cuda-gpus) 5.0+.
+The library can be built using CUDA Toolkit versions as old as **11.6** on Windows and **11.4** on Linux.
-### Supported CUDA Configurations[[cuda-pip]]
-
-The latest version of the distributed `bitsandbytes` package is built with the following configurations:
-
-| **OS** | **CUDA Toolkit** | **Host Compiler** |
-|-------------|------------------|----------------------|
-| **Linux** | 11.8 - 12.3 | GCC 11.4 |
-| | 12.4 - 12.8 | GCC 13.2 |
-| **Windows** | 11.8 - 12.8 | MSVC 19.42+ (VS2022) |
-
-For CUDA systems, ensure your hardware meets the following requirements:
-
-| **Feature** | **Minimum Hardware Requirement** |
+| **Feature** | **CC Required** | **Example Hardware Requirement** |
|---------------------------------|---------------------------------------------------------------|
-| LLM.int8() | NVIDIA Turing (RTX 20 series, T4) or newer GPUs |
-| 8-bit optimizers/quantization | NVIDIA Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
-| NF4/FP4 quantization | NVIDIA Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
+| LLM.int8() | 7.5+ | Turing (RTX 20 series, T4) or newer GPUs |
+| 8-bit optimizers/quantization | 5.0+ | Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
+| NF4/FP4 quantization | 5.0+ | Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
> [!WARNING]
-> `bitsandbytes >= 0.45.0` no longer supports Kepler GPUs.
->
> Support for Maxwell GPUs is deprecated and will be removed in a future release. For the best results, a Turing generation device or newer is recommended.
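+
+As a quick way to check which Compute Capability your GPU reports, you can query it through PyTorch. This is a minimal sketch, assuming a working CUDA-enabled PyTorch install; the `nvidia-smi` `compute_cap` query additionally assumes a reasonably recent driver:
+
+```bash
+# Print the (major, minor) Compute Capability of the default GPU
+python -c "import torch; print(torch.cuda.get_device_capability())"
+
+# Alternatively, query the driver directly
+nvidia-smi --query-gpu=compute_cap --format=csv
+```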
-```bash
-pip install bitsandbytes
-```
-
-### `pip install` pre-built wheel from latest `main` commit
-
-If you would like to use new feature even before they are officially released and help us test them, feel free to install the wheel directly from our CI (*the wheel links will remain stable!*):
-
-
-
+### Installation via PyPI[[cuda-pip]]
-```
-# Note, if you don't want to reinstall BNBs dependencies, append the `--no-deps` flag!
+This is the most straightforward and recommended installation option.
-# x86_64 (most users)
-pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
+The currently distributed `bitsandbytes` packages are built with the following configurations:
-# ARM/aarch64
-pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl
-```
+| **OS** | **CUDA Toolkit** | **Host Compiler** | **Targets**
+|--------------------|------------------|----------------------|--------------
+| **Linux x86-64** | 11.8 - 12.6 | GCC 11.2 | sm50, sm60, sm75, sm80, sm86, sm89, sm90, sm100, sm120
+| **Linux x86-64** | 12.8 | GCC 11.2 | sm75, sm80, sm86, sm89, sm90, sm100, sm120
+| **Linux aarch64** | 11.8 - 12.8 | GCC 11.2 | sm75, sm80, sm90, sm100
+| **Windows x86-64** | 11.8 - 12.8 | MSVC 19.43+ (VS2022) | sm50, sm60, sm75, sm80, sm86, sm89, sm90, sm100, sm120
-
-
+Use `pip` or `uv` to install:
+```bash
+pip install bitsandbytes
```
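+
+If you prefer `uv`, the equivalent install works through its pip interface. A sketch; the `python -m bitsandbytes` diagnostic assumes a recent release that bundles it:
+
+```bash
+uv pip install bitsandbytes
+
+# Optional sanity check: prints environment diagnostics for the installation
+python -m bitsandbytes
+```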
-# Note, if you don't want to reinstall BNBs dependencies, append the `--no-deps` flag!
-pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
-```
-
-
### Compile from source[[cuda-compile]]
> [!TIP]
-> Don't hesitate to compile from source! The process is pretty straight forward and resilient. This might be needed for older CUDA versions or other less common configurations, which we don't support out of the box due to package size.
+> Don't hesitate to compile from source! The process is pretty straight forward and resilient. This might be needed for older CUDA Toolkit versions or Linux distributions, or other less common configurations/
For Linux and Windows systems, compiling from source allows you to customize the build configuration. See below for detailed platform-specific instructions, and consult `CMakeLists.txt` if you want to check the specifics and explore additional options:
-To compile from source, you need CMake >= **3.22.1** and Python >= **3.9** installed. Make sure you have a compiler installed to compile C++ (`gcc`, `make`, headers, etc.).
+To compile from source, you need CMake >= **3.22.1** and Python >= **3.9** installed. Make sure you have a compiler installed to compile C++ (`gcc`, `make`, headers, etc.). It is recommended to use GCC 9 or newer.
For example, to install a compiler and CMake on Ubuntu:
@@ -93,7 +68,7 @@ For example, to install a compiler and CMake on Ubuntu:
apt-get install -y build-essential cmake
```
-You should also install CUDA Toolkit by following the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) guide from NVIDIA. The current minimum supported CUDA Toolkit version is **11.8**.
+You should also install CUDA Toolkit by following the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) guide from NVIDIA. The current minimum supported CUDA Toolkit version that we test with is **11.8**.
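+
+The standard clone-and-build commands are shown below. If you only need kernels for your own GPU, you can also restrict the CUDA targets to shorten compile time. This is a sketch that assumes the `COMPUTE_CAPABILITY` cache variable exposed in `CMakeLists.txt`; verify the exact option name there:
+
+```bash
+# Run inside the cloned bitsandbytes/ checkout, e.g. for an RTX 3090 (sm86):
+cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY="86" -S .
+make
+```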
```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
@@ -110,7 +85,7 @@ pip install -e . # `-e` for "editable" install, when developing BNB (otherwise
Windows systems require Visual Studio with C++ support as well as an installation of the CUDA SDK.
-To compile from source, you need CMake >= **3.22.1** and Python >= **3.9** installed. You should also install CUDA Toolkit by following the [CUDA Installation Guide for Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) guide from NVIDIA. The current minimum supported CUDA Toolkit version is **11.8**.
+To compile from source, you need CMake >= **3.22.1** and Python >= **3.9** installed. You should also install CUDA Toolkit by following the [CUDA Installation Guide for Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) guide. The current minimum supported CUDA Toolkit version that we test with is **11.8**.
```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
@@ -124,78 +99,46 @@ Big thanks to [wkpark](https://github.com/wkpark), [Jamezo97](https://github.com
-### PyTorch CUDA versions[[pytorch-cuda-versions]]
-
-Some bitsandbytes features may need a newer CUDA version than the one currently supported by PyTorch binaries from Conda and pip. In this case, you should follow these instructions to load a precompiled bitsandbytes binary.
+### Preview Wheels from `main`[[cuda-preview]]
-1. Determine the path of the CUDA version you want to use. Common paths include:
+If you would like to use new features even before they are officially released and help us test them, feel free to install the wheel directly from our CI (*the wheel links will remain stable!*):
-* `/usr/local/cuda`
-* `/usr/local/cuda-XX.X` where `XX.X` is the CUDA version number
-
-Then locally install the CUDA version you need with this script from bitsandbytes:
-
-```bash
-wget https://raw.githubusercontent.com/bitsandbytes-foundation/bitsandbytes/main/install_cuda.sh
-# Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
-# CUDA_VERSION in {118, 120, 121, 122, 123, 124, 125, 126, 128}
-# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
-
-# For example, the following installs CUDA 12.6 to ~/local/cuda-12.6 and exports the path to your .bashrc
+
+
-bash install_cuda.sh 126 ~/local 1
```
+# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
-2. Set the environment variables `BNB_CUDA_VERSION` and `LD_LIBRARY_PATH` by manually overriding the CUDA version installed by PyTorch.
-
-> [!TIP]
-> It is recommended to add the following lines to the `.bashrc` file to make them permanent.
+# x86_64 (most users)
+pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
-```bash
-export BNB_CUDA_VERSION=
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:
+# ARM/aarch64
+pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl
```
-For example, to use a local install path:
+
+
-```bash
-export BNB_CUDA_VERSION=126
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/YOUR_USERNAME/local/cuda-12.6
```
-
-3. Now when you launch bitsandbytes with these environment variables, the PyTorch CUDA version is overridden by the new CUDA version (in this example, version 12.6) and a different bitsandbytes library is loaded.
-
-## Multi-backend Support (Alpha Release)[[multi-backend]]
-
-> [!TIP]
-> This functionality is currently in preview and not yet production-ready. We very much welcome community feedback, contributions and leadership on topics like Apple Silicon as well as other less common accellerators! For more information, see [this guide on multi-backend support](./non_cuda_backends).
-
-**Link to give us feedback** (bugs, install issues, perf results, requests, etc.)**:**
-
-
-
-
-[**Multi-backend refactor: Alpha release (AMD ROCm ONLY)**](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1339)
-
+# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
+pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
+```
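+
+After installing a preview wheel, you can confirm which build is active (a minimal check):
+
+```bash
+python -c "import bitsandbytes; print(bitsandbytes.__version__)"
+```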
-
+
-[**Multi-backend refactor: Alpha release (INTEL ONLY)**](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338)
-
-
+## Deprecated: Multi-Backend Preview[[multi-backend]]
-[**Github Discussion space on coordinating the kickoff of MPS backend development**](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1340)
+> [!TIP]
+> This functionality existed as an early technical preview and is not recommended for production use. We are in the process of upstreaming improved support for AMD and Intel hardware into the main project.
-
-
+We provide an early preview of support for AMD and Intel hardware as part of a development branch.
### Supported Backends[[multi-backend-supported-backends]]
| **Backend** | **Supported Versions** | **Python versions** | **Architecture Support** | **Status** |
|-------------|------------------------|---------------------------|-------------------------|------------|
| **AMD ROCm** | 6.1+ | 3.10+ | minimum CDNA - `gfx90a`, RDNA - `gfx1100` | Alpha |
-| **Apple Silicon (MPS)** | WIP | 3.10+ | M1/M2 chips | Planned |
| **Intel CPU** | v2.4.0+ (`ipex`) | 3.10+ | Intel CPU | Alpha |
| **Intel GPU** | v2.4.0+ (`ipex`) | 3.10+ | Intel GPU | Experimental |
| **Ascend NPU** | 2.1.0+ (`torch_npu`) | 3.10+ | Ascend NPU | Experimental |
@@ -204,7 +147,7 @@ For each supported backend, follow the respective instructions below:
### Pre-requisites[[multi-backend-pre-requisites]]
-To use bitsandbytes non-CUDA backends, be sure to install:
+To use this preview version of `bitsandbytes` with `transformers`, be sure to install:
```
pip install "transformers>=4.45.1"
@@ -218,33 +161,26 @@ pip install "transformers>=4.45.1"
>
> Other supported versions that don't come with pre-compiled binaries [can be compiled for with these instructions](#multi-backend-compile).
>
-> **Windows is not supported for the ROCm backend**; also not WSL2 to our knowledge.
+> **Windows is not supported for the ROCm backend.**
> [!TIP]
> If you would like to install ROCm and PyTorch on bare metal, skip the Docker steps and refer to ROCm's official guides at [ROCm installation overview](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/install-overview.html#rocm-install-overview) and [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-wheels-package) (Step 3 of wheels build for quick installation). Special note: please make sure to get the respective ROCm-specific PyTorch wheel for the installed ROCm version, e.g. `https://download.pytorch.org/whl/nightly/rocm6.2/`!
```bash
-# Create a docker container with latest ROCm image, which includes ROCm libraries
-docker pull rocm/dev-ubuntu-22.04:6.1.2-complete
-docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-22.04:6.1.2-complete
+# Create a docker container with the ROCm image, which includes ROCm libraries
+docker pull rocm/dev-ubuntu-22.04:6.3.4-complete
+docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-22.04:6.3.4-complete
apt-get update && apt-get install -y git && cd home
# Install pytorch compatible with above ROCm version
-pip install torch --index-url https://download.pytorch.org/whl/rocm6.1/
+pip install torch --index-url https://download.pytorch.org/whl/rocm6.3/
```
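+
+Before proceeding, you can confirm that the ROCm build of PyTorch is the one active in your environment (a quick check; `torch.version.hip` prints `None` on CUDA builds):
+
+```bash
+python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"
+```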
-
+
-Compatible hardware and functioning `import intel_extension_for_pytorch as ipex` capable environment with Python `3.10` as the minimum requirement.
-
-Please refer to [the official Intel installations instructions](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.4.0%2bcpu&os=linux%2fwsl2) for guidance on how to pip install the necessary `intel_extension_for_pytorch` dependency.
-
-
-
-
-> [!TIP]
-> Apple Silicon support is still a WIP. Please visit and write us in [this Github Discussion space on coordinating the kickoff of MPS backend development](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1340) and coordinate a community-led effort to implement this backend.
+* A compatible PyTorch version with Intel XPU support is required. It is recommended to use the latest stable release. See [Getting Started on Intel GPU](https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html) for guidance.
+* The [Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/) is recommended for performance improvements.
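+
+You can verify that PyTorch detects your Intel GPU before installing the preview wheel (a minimal check, assuming a recent PyTorch build with XPU support):
+
+```bash
+# Should print True on a working XPU setup
+python -c "import torch; print(torch.xpu.is_available())"
+```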
@@ -257,38 +193,22 @@ You can install the pre-built wheels for each backend, or compile from source fo
+This wheel provides support for the ROCm and Intel XPU platforms.
```
-# Note, if you don't want to reinstall BNBs dependencies, append the `--no-deps` flag!
+# Note, if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-manylinux_2_24_x86_64.whl'
```
+This wheel provides support for the Intel XPU platform.
```
-# Note, if you don't want to reinstall BNBs dependencies, append the `--no-deps` flag!
+# Note, if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-win_amd64.whl'
```
-
-
-
-Compatible hardware and functioning `import torch_npu` capable environment with Python `3.10` as the minimum requirement.
-
-Please refer to [the official Ascend installations instructions](https://www.hiascend.com/document/detail/zh/Pytorch/60RC3/configandinstg/instg/insg_0001.html) for guidance on how to pip install the necessary `torch_npu` dependency.
-
-
-
-
-> [!WARNING]
-> bitsandbytes does not yet support Apple Silicon / Metal with a dedicated backend. However, the build infrastructure is in place and the below pip install will eventually provide Apple Silicon support as it becomes available on the `multi-backend-refactor` branch based on community contributions.
-
-```
-# Note, if you don't want to reinstall BNBs dependencies, append the `--no-deps` flag!
-pip install --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-macosx_13_1_arm64.whl'
-```
-
@@ -299,7 +219,7 @@ pip install --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsan
#### AMD GPU
-bitsandbytes is fully supported from ROCm 6.1 onwards (currently in alpha release).
+bitsandbytes is supported on ROCm 6.1 through ROCm 6.4.
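+
+To check which architecture your GPU reports before building (a quick sketch, assuming `rocminfo` from the ROCm installation is on your `PATH`):
+
+```bash
+# Look for an identifier such as gfx90a or gfx1100
+rocminfo | grep -o "gfx[0-9a-f]*" | sort -u
+```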
```bash
# Install bitsandbytes from source
@@ -318,8 +238,6 @@ pip install -e . # `-e` for "editable" install, when developing BNB (otherwise
#### Intel CPU + XPU
-> [!TIP]
-> Intel CPU/XPU backend only supports building from source; for now, please follow the instructions below.
This backend does not require compiling any C++ code: all required ops are provided by [intel_extension_for_pytorch](https://pytorch-extension.intel.com/). Please follow its instructions to install `ipex`.
@@ -330,15 +248,12 @@ pip install intel_extension_for_pytorch
git clone --depth 1 -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
pip install -e . # `-e` for "editable" install, when developing BNB (otherwise leave that out)
```
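+
+You can confirm that the `ipex` dependency resolves correctly (a minimal check):
+
+```bash
+python -c "import intel_extension_for_pytorch as ipex; print(ipex.__version__)"
+```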
-
#### Ascend NPU
-> [!TIP]
-> Ascend NPU backend only supports building from source; for now, please follow the instructions below.
-
+Please refer to [the official Ascend installation instructions](https://www.hiascend.com/document/detail/zh/Pytorch/60RC3/configandinstg/instg/insg_0001.html) for guidance on how to install the necessary `torch_npu` dependency.
```
# Install bitsandbytes from source
@@ -351,14 +266,5 @@ cmake -DCOMPUTE_BACKEND=npu -S .
make
pip install -e . # `-e` for "editable" install, when developing BNB (otherwise leave that out)
```
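+
+To verify the result (a minimal check, assuming `torch_npu` was installed per the instructions above):
+
+```bash
+# Should print True on a working Ascend NPU setup
+python -c "import torch, torch_npu; print(torch.npu.is_available())"
+```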
-
-
-
-
-
-#### Apple Silicon
-
-WIP
-
From 0a35a1d0c40a78e1d1c1a81e28e4be7d571049b5 Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Mon, 19 May 2025 16:44:55 -0400
Subject: [PATCH 2/7] Remove page
---
docs/source/non_cuda_backends.mdx | 44 -------------------------------
1 file changed, 44 deletions(-)
delete mode 100644 docs/source/non_cuda_backends.mdx
diff --git a/docs/source/non_cuda_backends.mdx b/docs/source/non_cuda_backends.mdx
deleted file mode 100644
index 728606b7b..000000000
--- a/docs/source/non_cuda_backends.mdx
+++ /dev/null
@@ -1,44 +0,0 @@
-# Multi-backend support (non-CUDA backends)
-
-> [!Tip]
-> If you feel these docs need some additional info, please consider submitting a PR or respectfully request the missing info in one of the below mentioned Github discussion spaces.
-
-As part of a recent refactoring effort, we will soon offer official multi-backend support. Currently, this feature is available in a preview alpha release, allowing us to gather early feedback from users to improve the functionality and identify any bugs.
-
-At present, the Intel CPU and AMD ROCm backends are considered fully functional. The Intel XPU backend has limited functionality and is less mature.
-
-Please refer to the [installation instructions](./installation#multi-backend) for details on installing the backend you intend to test (and hopefully provide feedback on).
-
-> [!Tip]
-> Apple Silicon support is planned for Q4 2024. We are actively seeking contributors to help implement this, develop a concrete plan, and create a detailed list of requirements. Due to limited resources, we rely on community contributions for this implementation effort. To discuss further, please spell out your thoughts and discuss in [this GitHub discussion](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1340) and tag `@Titus-von-Koeller` and `@matthewdouglas`. Thank you!
-
-## Alpha Release
-
-As we are currently in the alpha testing phase, bugs are expected, and performance might not meet expectations. However, this is exactly what we want to discover from **your** perspective as the end user!
-
-Please share and discuss your feedback with us here:
-
-- [Github Discussion: Multi-backend refactor: Alpha release ( AMD ROCm ONLY )](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1339)
-- [Github Discussion: Multi-backend refactor: Alpha release ( Intel ONLY )](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338)
-
-Thank you for your support!
-
-## Benchmarks
-
-### Intel
-
-The following performance data is collected from Intel 4th Gen Xeon (SPR) platform. The tables show speed-up and memory compared with different data types of [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
-
-#### Inference (CPU)
-
-| Data Type | BF16 | INT8 | NF4 | FP4 |
-|---|---|---|---|---|
-| Speed-Up (vs BF16) | 1.0x | 0.6x | 2.3x | 0.03x |
-| Memory (GB) | 13.1 | 7.6 | 5.0 | 4.6 |
-
-#### Fine-Tuning (CPU)
-
-| Data Type | AMP BF16 | INT8 | NF4 | FP4 |
-|---|---|---|---|---|
-| Speed-Up (vs AMP BF16) | 1.0x | 0.38x | 0.07x | 0.07x |
-| Memory (GB) | 40 | 9 | 6.6 | 6.6 |
From 228063c8c8de0b41b8bd92ddcb5542a05cc80086 Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Mon, 19 May 2025 17:00:17 -0400
Subject: [PATCH 3/7] Minor update
---
docs/source/installation.mdx | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
index 531fd447e..73e2e7ca4 100644
--- a/docs/source/installation.mdx
+++ b/docs/source/installation.mdx
@@ -23,10 +23,10 @@ Welcome to the installation guide for the `bitsandbytes` library! This document
The library can be built using CUDA Toolkit versions as old as **11.6** on Windows and **11.4** on Linux.
| **Feature** | **CC Required** | **Example Hardware Requirement** |
-|---------------------------------|---------------------------------------------------------------|
-| LLM.int8() | 7.5+ | Turing (RTX 20 series, T4) or newer GPUs |
-| 8-bit optimizers/quantization | 5.0+ | Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
-| NF4/FP4 quantization | 5.0+ | Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs * |
+|---------------------------------|-----------------|---------------------------------------------|
+| LLM.int8() | 7.5+ | Turing (RTX 20 series, T4) or newer GPUs |
+| 8-bit optimizers/quantization | 5.0+ | Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs |
+| NF4/FP4 quantization | 5.0+ | Maxwell (GTX 900 series, TITAN X, M40) or newer GPUs |
> [!WARNING]
> Support for Maxwell GPUs is deprecated and will be removed in a future release. For the best results, a Turing generation device or newer is recommended.
@@ -41,7 +41,8 @@ The currently distributed `bitsandbytes` packages are built with the following c
|--------------------|------------------|----------------------|--------------
| **Linux x86-64** | 11.8 - 12.6 | GCC 11.2 | sm50, sm60, sm75, sm80, sm86, sm89, sm90, sm100, sm120
| **Linux x86-64** | 12.8 | GCC 11.2 | sm75, sm80, sm86, sm89, sm90, sm100, sm120
-| **Linux aarch64** | 11.8 - 12.8 | GCC 11.2 | sm75, sm80, sm90, sm100
+| **Linux aarch64** | 11.8 - 12.6 | GCC 11.2 | sm75, sm80, sm90
+| **Linux aarch64** | 12.8 | GCC 11.2 | sm75, sm80, sm90, sm100
| **Windows x86-64** | 11.8 - 12.8 | MSVC 19.43+ (VS2022) | sm50, sm60, sm75, sm80, sm86, sm89, sm90, sm100, sm120
Use `pip` or `uv` to install:
@@ -53,7 +54,7 @@ pip install bitsandbytes
### Compile from source[[cuda-compile]]
> [!TIP]
-> Don't hesitate to compile from source! The process is pretty straight forward and resilient. This might be needed for older CUDA Toolkit versions or Linux distributions, or other less common configurations/
+> Don't hesitate to compile from source! The process is pretty straight forward and resilient. This might be needed for older CUDA Toolkit versions or Linux distributions, or other less common configurations.
For Linux and Windows systems, compiling from source allows you to customize the build configuration. See below for detailed platform-specific instructions, and consult `CMakeLists.txt` if you want to check the specifics and explore additional options:
@@ -68,7 +69,7 @@ For example, to install a compiler and CMake on Ubuntu:
apt-get install -y build-essential cmake
```
-You should also install CUDA Toolkit by following the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) guide from NVIDIA. The current minimum supported CUDA Toolkit version that we test with is **11.8**.
+You should also install CUDA Toolkit by following the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) guide. The current minimum supported CUDA Toolkit version that we test with is **11.8**.
```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
@@ -78,12 +79,12 @@ pip install -e . # `-e` for "editable" install, when developing BNB (otherwise
```
> [!TIP]
-> If you have multiple versions of CUDA installed or installed it in a non-standard location, please refer to CMake CUDA documentation for how to configure the CUDA compiler.
+> If you have multiple versions of the CUDA Toolkit installed or it is in a non-standard location, please refer to the CMake CUDA documentation for how to configure the CUDA compiler.
-Windows systems require Visual Studio with C++ support as well as an installation of the CUDA SDK.
+Compilation from source on Windows systems requires Visual Studio with C++ support as well as an installation of the CUDA Toolkit.
To compile from source, you need CMake >= **3.22.1** and Python >= **3.9** installed. You should also install CUDA Toolkit by following the [CUDA Installation Guide for Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) guide. The current minimum supported CUDA Toolkit version that we test with is **11.8**.
@@ -106,7 +107,7 @@ If you would like to use new features even before they are officially released a
-```
+```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
# x86_64 (most users)
@@ -119,7 +120,7 @@ pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsand
-```
+```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
```
@@ -129,7 +130,7 @@ pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsand
## Deprecated: Multi-Backend Preview[[multi-backend]]
-> [!TIP]
+> [!WARNING]
> This functionality existed as an early technical preview and is not recommended for production use. We are in the process of upstreaming improved support for AMD and Intel hardware into the main project.
We provide an early preview of support for AMD and Intel hardware as part of a development branch.
@@ -149,7 +150,7 @@ For each supported backend, follow the respective instructions below:
To use this preview version of `bitsandbytes` with `transformers`, be sure to install:
-```
+```bash
pip install "transformers>=4.45.1"
```
@@ -204,7 +205,7 @@ pip install --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsan
This wheel provides support for the Intel XPU platform.
-```
+```bash
# Note, if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-win_amd64.whl'
```
@@ -243,7 +244,7 @@ This backend does not require compiling any C++ code: all required ops are provi
The commands below are for Linux. For installation on Windows, please adapt them according to the same pattern as described in [the section above on compiling from source under the Windows tab](#cuda-compile).
-```
+```bash
pip install intel_extension_for_pytorch
git clone --depth 1 -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
pip install -e . # `-e` for "editable" install, when developing BNB (otherwise leave that out)
@@ -255,7 +256,7 @@ pip install -e . # `-e` for "editable" install, when developing BNB (otherwise
Please refer to [the official Ascend installation instructions](https://www.hiascend.com/document/detail/zh/Pytorch/60RC3/configandinstg/instg/insg_0001.html) for guidance on how to install the necessary `torch_npu` dependency.
-```
+```bash
# Install bitsandbytes from source
# Clone bitsandbytes repo, Ascend NPU backend is currently enabled on multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
From 483cc201b2aeaef37d556ff33691b933b6d2e09f Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Mon, 19 May 2025 17:02:28 -0400
Subject: [PATCH 4/7] correction
---
docs/source/installation.mdx | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
index 73e2e7ca4..761c2c0a1 100644
--- a/docs/source/installation.mdx
+++ b/docs/source/installation.mdx
@@ -43,7 +43,8 @@ The currently distributed `bitsandbytes` packages are built with the following c
| **Linux x86-64** | 12.8 | GCC 11.2 | sm75, sm80, sm86, sm89, sm90, sm100, sm120
| **Linux aarch64** | 11.8 - 12.6 | GCC 11.2 | sm75, sm80, sm90
| **Linux aarch64** | 12.8 | GCC 11.2 | sm75, sm80, sm90, sm100
-| **Windows x86-64** | 11.8 - 12.8 | MSVC 19.43+ (VS2022) | sm50, sm60, sm75, sm80, sm86, sm89, sm90, sm100, sm120
+| **Windows x86-64** | 11.8 - 12.6 | MSVC 19.43+ (VS2022) | sm50, sm60, sm75, sm80, sm86, sm89, sm90
+| **Windows x86-64** | 12.8 | MSVC 19.43+ (VS2022) | sm75, sm80, sm86, sm89, sm90, sm100, sm120
Use `pip` or `uv` to install:
From 3f4093e23975e30e69082c94bc24b1891cd10a4f Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Mon, 19 May 2025 17:10:29 -0400
Subject: [PATCH 5/7] Minor doc revisions
---
docs/source/contributing.mdx | 3 +--
docs/source/faqs.mdx | 2 --
docs/source/reference/functional.mdx | 5 -----
3 files changed, 1 insertion(+), 9 deletions(-)
diff --git a/docs/source/contributing.mdx b/docs/source/contributing.mdx
index 5da42961e..464f92164 100644
--- a/docs/source/contributing.mdx
+++ b/docs/source/contributing.mdx
@@ -1,5 +1,4 @@
-# Contributors guidelines
-... still under construction ... (feel free to propose materials, `bitsandbytes` is a community project)
+# Contribution Guide
## Setup
diff --git a/docs/source/faqs.mdx b/docs/source/faqs.mdx
index b95a1d799..c81257451 100644
--- a/docs/source/faqs.mdx
+++ b/docs/source/faqs.mdx
@@ -3,5 +3,3 @@
Please submit your questions in [this Github Discussion thread](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1013) if you feel that they will likely affect a lot of other users and that they haven't been sufficiently covered in the documentation.
We'll pick the most generally applicable ones and post the QAs here or integrate them into the general documentation (also feel free to submit doc PRs, please).
-
-# ... under construction ...
diff --git a/docs/source/reference/functional.mdx b/docs/source/reference/functional.mdx
index dbbe21794..cc46675c6 100644
--- a/docs/source/reference/functional.mdx
+++ b/docs/source/reference/functional.mdx
@@ -9,8 +9,6 @@ The `bitsandbytes.functional` API provides the low-level building blocks for the
* For experimental or research purposes requiring non-standard quantization or performance optimizations.
## LLM.int8()
-[[autodoc]] functional.int8_double_quant
-
[[autodoc]] functional.int8_linear_matmul
[[autodoc]] functional.int8_mm_dequant
@@ -19,7 +17,6 @@ The `bitsandbytes.functional` API provides the low-level building blocks for the
[[autodoc]] functional.int8_vectorwise_quant
-
## 4-bit
[[autodoc]] functional.dequantize_4bit
@@ -49,5 +46,3 @@ For more details see [8-Bit Approximations for Parallelism in Deep Learning](htt
## Utility
[[autodoc]] functional.get_ptr
-
-[[autodoc]] functional.is_on_gpu
From 1e2062537090e9ab5f98be4bae505ac8e2b94fe5 Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Thu, 22 May 2025 14:02:28 -0400
Subject: [PATCH 6/7] Update installation.mdx
---
docs/source/installation.mdx | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
index 761c2c0a1..11dfbf5ea 100644
--- a/docs/source/installation.mdx
+++ b/docs/source/installation.mdx
@@ -1,9 +1,6 @@
# Installation Guide
-Welcome to the installation guide for the `bitsandbytes` library! This document provides step-by-step instructions to install `bitsandbytes` across various platforms and hardware configurations. The library primarily supports CUDA-based GPUs, but the team is actively working on enabling support for additional backends like AMD ROCm and Intel XPU.
-
-> [!TIP]
-> For a high-level overview of backend support and compatibility, see the [Multi-backend Support](#multi-backend) section.
+Welcome to the installation guide for the `bitsandbytes` library! This document provides step-by-step instructions to install `bitsandbytes` across various platforms and hardware configurations. The library primarily supports CUDA-based GPUs, but the team is actively working on enabling support for additional backends like CPU, AMD ROCm, Intel XPU, and Gaudi HPU.
## Table of Contents
@@ -11,7 +8,7 @@ Welcome to the installation guide for the `bitsandbytes` library! This document
- [Installation via PyPI](#cuda-pip)
- [Compile from Source](#cuda-compile)
- [Preview Wheels from `main`](#cuda-preview)
-- [Deprecated: Multi-Backend Preview](#multi-backend)
+- [Multi-Backend Preview](#multi-backend)
- [Supported Backends](#multi-backend-supported-backends)
- [Pre-requisites](#multi-backend-pre-requisites)
- [Installation](#multi-backend-pip)
@@ -39,7 +36,7 @@ The currently distributed `bitsandbytes` packages are built with the following c
| **OS** | **CUDA Toolkit** | **Host Compiler** | **Targets**
|--------------------|------------------|----------------------|--------------
-| **Linux x86-64** | 11.8 - 12.6 | GCC 11.2 | sm50, sm60, sm75, sm80, sm86, sm89, sm90, sm100, sm120
+| **Linux x86-64** | 11.8 - 12.6 | GCC 11.2 | sm50, sm60, sm75, sm80, sm86, sm89, sm90
| **Linux x86-64** | 12.8 | GCC 11.2 | sm75, sm80, sm86, sm89, sm90, sm100, sm120
| **Linux aarch64** | 11.8 - 12.6 | GCC 11.2 | sm75, sm80, sm90
| **Linux aarch64** | 12.8 | GCC 11.2 | sm75, sm80, sm90, sm100
@@ -129,7 +126,7 @@ pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsand
-## Deprecated: Multi-Backend Preview[[multi-backend]]
+## Multi-Backend Preview[[multi-backend]]
> [!WARNING]
> This functionality existed as an early technical preview and is not recommended for production use. We are in the process of upstreaming improved support for AMD and Intel hardware into the main project.
From 9667a22b6ca24d68ae3abeccb6cdf756b72ea7b4 Mon Sep 17 00:00:00 2001
From: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Date: Thu, 22 May 2025 14:05:14 -0400
Subject: [PATCH 7/7] Update _toctree.yml
---
docs/source/_toctree.yml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index ddf409d5c..0f46fe6b0 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -53,7 +53,7 @@
title: RMSprop
- local: reference/optim/sgd
title: SGD
- - title: k-bit quantizers
+ - title: Modules
sections:
- local: reference/nn/linear8bit
title: LLM.int8()