Now that the QAT + LoRA recipe has landed in #1931, we can support a finetuning flow like the one used to generate the quantized Llama 3.2 1B and 3B checkpoints (see, e.g., the 1B checkpoint here). Unlike traditional LoRA, one path for finetuning with QAT + LoRA involves updating both the LoRA weights and the base model weights (with the fake quantization operation applied to the latter), as referenced in this blog. We should add an option to our QAT + LoRA recipe to make all params trainable, not just the LoRA ones. This can be done by modifying the call to `set_trainable_params` here.
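
A minimal sketch of what this could look like, assuming the option is plumbed through the recipe config. The flag name `train_base_weights` and the helper function below are hypothetical; `get_adapter_params` and `set_trainable_params` are torchtune's existing PEFT utilities:

```python
from torch import nn
from torchtune.modules.peft import get_adapter_params, set_trainable_params


def setup_trainable_params(model: nn.Module, train_base_weights: bool = False) -> None:
    """Configure which parameters receive gradients in the QAT + LoRA recipe.

    ``train_base_weights`` is a hypothetical flag name: when True, the base
    model weights are updated alongside the LoRA weights (gradients flow
    through the fake-quantize ops); when False, keep the standard LoRA
    behavior of training only the adapter parameters.
    """
    if train_base_weights:
        # Make every parameter trainable, not just the LoRA adapters.
        for p in model.parameters():
            p.requires_grad_(True)
    else:
        # Default LoRA behavior: freeze the base model, train adapters only.
        set_trainable_params(model, get_adapter_params(model))
```

In the recipe, this would replace the existing unconditional `set_trainable_params` call during model setup, leaving the default (LoRA-only) behavior unchanged when the flag is off.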