1 file changed: +2 -2 lines changed
@@ -47,9 +47,9 @@ This table shows inference performance across different hardware and quantizatio
| :----------------------------:| :------------:| :---------------------:| :---------------------:| :---------------------:| :-------------:|
| | | **Q8_0** | **Q4_0** | **Q4_0** | **Support** |
| **NVIDIA / OpenCL-PTX** | RTX 3070 | 52 tokens/s | 50.56 tokens/s | 22.96 tokens/s | ✅ |
- | | RTX 4090 | 66.07 tokens/s | 65.81 tokens/s | 35.51 tokens/s | ✅ |
+ | | RTX 4090 | 66.07 tokens/s | 65.81 tokens/s | 35.51 tokens/s | ✅ |
| | RTX 5090 | 96.65 tokens/s | 94.71 tokens/s | 47.68 tokens/s | ✅ |
- | | H100 | XXXX tokens/s | XXXX tokens/s | XXXX tokens/s | ✅ |
+ | | L4 Tensor | 52.96 tokens/s | 52.92 tokens/s | 22.68 tokens/s | ✅ |
| **Intel / OpenCL** | Arc A770 | 15.65 tokens/s | 15.09 tokens/s | 7.02 tokens/s | (WIP) |
| **Apple Silicon / OpenCL** | M3 Pro | 14.04 tokens/s | 13.83 tokens/s | 6.78 tokens/s | (WIP) |
| | M4 Pro | 16.77 tokens/s | 16.67 tokens/s | 8.56 tokens/s | (WIP) |