Description
Test commands:
- CUDA: `pytest`
- CPU: `pytest --ignore test_optim.py --ignore test_triton.py --ignore test_cuda_setup_evaluator.py`
- XPU: `BNB_TEST_DEVICE="xpu" pytest --ignore test_optim.py --ignore test_triton.py --ignore test_cuda_setup_evaluator.py`

Results:
- CUDA: 3711 passed, 0 failed, 15 skipped, 24 xfailed, 781 warnings in 498.21s
- CPU previous: 378 passed, 1537 failed, 1638 skipped, 197 xfailed, 153 warnings in 613.27s
- CPU current: 2079 passed, 1498 skipped, 153 deselected, 9 xfailed, 59 warnings in 1192.94s
- XPU previous: not enabled
- XPU current: 2093 passed, 1493 skipped, 153 deselected, 63 warnings in 562.25s
Hi @matthewdouglas. CPU and XPU pass most cases in this PR (#1628), but I still have some concerns about the skipped tests. Please check the following issues:
- Some tests are tagged for future deprecation (including `matmullt`, `has_fp16_weights`, and `linear8bitlt`). Can we deprecate them now?
- Some tests cover functions that I cannot find used anywhere (including `quantile`, `spmm`, and `coo2csc`). Can we remove those tests now?
- The `bench 4bit dequant` test doesn't actually test anything. Could we remove it now?
- I see we skip blockwise quantization tests on CPU/XPU when the block size is not 256, but blockwise quantization of any size is supported. Can we also enable checks for the other sizes?
- We skip blockwise quantization tests when the dtype is not float32, but I already support fp16/bf16. Should we add these dtypes to the tests?
- Why do we run CPU ops even when the device is XPU or CUDA? (cpu tests in xpu)
- Why do we check the torch op?
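For context on the block-size and dtype questions above, here is a minimal NumPy sketch of absmax blockwise quantization (illustrative only, not the bitsandbytes kernels): the scheme itself places no restriction on the block size or the input float dtype, which is why testing other sizes and fp16/bf16 seems reasonable.

```python
import numpy as np

def quantize_blockwise(a, blocksize=256):
    """Illustrative absmax blockwise quantization to int8.

    Works for any blocksize and any float input dtype (fp32/fp16/...).
    """
    flat = a.astype(np.float32).ravel()
    pad = (-flat.size) % blocksize
    flat = np.pad(flat, (0, pad))           # pad to a multiple of blocksize
    blocks = flat.reshape(-1, blocksize)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0               # avoid division by zero for all-zero blocks
    q = np.round(blocks / absmax * 127).astype(np.int8)
    return q, absmax.squeeze(1), a.shape, a.dtype

def dequantize_blockwise(q, absmax, shape, dtype):
    """Inverse of the sketch above: rescale each block and restore shape/dtype."""
    deq = q.astype(np.float32) / 127 * absmax[:, None]
    return deq.ravel()[: int(np.prod(shape))].reshape(shape).astype(dtype)

# fp16 input whose length is not a multiple of the block size
x = np.random.randn(1000).astype(np.float16)
q, absmax, shape, dtype = quantize_blockwise(x, blocksize=64)
x_hat = dequantize_blockwise(q, absmax, shape, dtype)
print(np.abs(x.astype(np.float32) - x_hat.astype(np.float32)).max())
```

The round-trip error per block is bounded by roughly `absmax / 254`, independent of the block size chosen.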