upodesh
(In Bangla, উপদেশ
is a synonym of the word পরামর্শ
which means suggestion
) is a Bangla word suggestion library.
This implementation uses an approach based on the Finite State Transducer (FST) data structure which is substantially faster than the Regular Expression based approach. This approach is inspired by the Go project libavrophonetic
of Mehdi Hasan Khan which used Trie data structure.
upodesh
is significantly faster than the previously used heavily optimized regex-based search approach in OpenBangla Keyboard. Based on recent benchmarks, it is approximately ~21× to ~58× faster, depending on the input. This demonstrates a substantial performance gain over regex, especially in cases where large patterns previously caused bottlenecks.
This benchmark was performed on a Apple MacBook Air M1:
Word | upodesh Time |
regex Time |
Speedup |
---|---|---|---|
a |
~3.341 µs | ~194.34 µs | ~58× faster |
arO |
~11.840 µs | ~246.53 µs | ~20.8× faster |
bistari |
~9.734 µs | ~353.74 µs | ~36.3× faster |
- Mehdi Hasan Khan and Tahmid Sadik for their
libavrophonetic
project. - Andrew Gallant for his amazing
fst
crate and Index 1,600,000,000 Keys with Automata and Rust blog post!