Commit 0c9a05a

Update TORNADOVM_TRANSFORMER_OPTIMIZATIONS.md
1 parent 366b94d commit 0c9a05a

1 file changed, +2 -11 lines changed

docs/TORNADOVM_TRANSFORMER_OPTIMIZATIONS.md

Lines changed: 2 additions & 11 deletions
@@ -2,17 +2,8 @@
 
 ### Core Numerical Optimizations
 - **Quantized Weight Support**
-  - Optimized implementations for Q8_0 and Q4_0 formats
-  - Block-based quantization with FP16 scale per 32-element block
-- **Vectorized Matrix Operations**
-  - Uses vector parallelism with configurable unroll factors
-  - Processes 4 elements at once with vectorization
-- **Loop Unrolling**
-  - Strategic unrolling for performance (16x factor in matrix operations)
-  - Reduces branch penalties and improves instruction-level parallelism
-- **Fused Multiply-Add (FMA)**
-  - Uses fused operations for better numerical precision and performance
-  - Optimizes dot product calculations
+  - Optimized implementations for FP16 format
+  - [*Experimental*] support for Q8 and Q4 with dequantize to FP16
 
 ### Memory and Caching Optimizations
 - **Key-Value Cache**
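
For readers unfamiliar with the block layout that the removed lines describe (one FP16 scale per 32-element block), the following is a minimal Java sketch of dequantizing a single Q8-style block. It is an illustration under stated assumptions, not the project's implementation: the class `Q8BlockSketch`, the helper `halfToFloat`, and the method `dequantizeBlock` are hypothetical names, the block is assumed to hold 32 signed 8-bit values, and the result is widened to `float` for readability even though the commit describes dequantizing to FP16.

```java
// Hypothetical sketch: names and layout are assumptions, not the project's API.
final class Q8BlockSketch {
    // Elements per quantized block, as stated in the removed documentation lines.
    static final int BLOCK_SIZE = 32;

    // Converts raw IEEE 754 half-precision bits to a float.
    static float halfToFloat(short bits) {
        int sign = (bits >> 15) & 0x1;
        int exp = (bits >> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float value;
        if (exp == 0) {
            // Zero or subnormal: no implicit leading 1.
            value = mant * (float) Math.pow(2, -24);
        } else if (exp == 31) {
            // Infinity or NaN.
            value = (mant == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal number: implicit leading 1, exponent bias of 15.
            value = (1.0f + mant / 1024.0f) * (float) Math.pow(2, exp - 15);
        }
        return sign == 0 ? value : -value;
    }

    // Dequantizes one block: each output is scale * quantized value.
    static float[] dequantizeBlock(short scaleBits, byte[] quants) {
        if (quants.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("expected " + BLOCK_SIZE + " quantized values");
        }
        float scale = halfToFloat(scaleBits);
        float[] out = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[i] = scale * quants[i];
        }
        return out;
    }
}
```

In this sketch the scale is decoded once per block and reused for all 32 elements, which is the main reason block-based formats stay cheap to dequantize on the fly.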
