I'm assuming the inference is running in FP32 or BF16, but is there a version of the model quantized to FP16, INT8, or even INT4?
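
For reference, here's roughly what I mean, e.g. loading the weights in 4-bit with the Hugging Face transformers + bitsandbytes stack. This is just a minimal sketch of the kind of setup I'm asking about; the model ID is a placeholder, not a claim that such a checkpoint exists:

```python
# Minimal sketch: load a causal LM with 4-bit (INT4-style) quantized weights.
# Assumes transformers, bitsandbytes, and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Weights are stored in 4 bits (NF4); matmuls are computed in BF16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model ID
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# For plain FP16 (no quantization), you'd instead drop the quantization
# config and pass torch_dtype=torch.float16 to from_pretrained.
```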