The motivation of this project is to optimize the CPU inference speed of BERT models deployed with PyTorch in Python, while also supporting C++ projects.
Here is a blog post (written in Chinese) that describes a real-world case of using this project to optimize inference speed.
For Python usage:

- Make sure Rust is installed (`curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh`), and check it with `rustc -V`
- Convert the PyTorch checkpoint file and configs to a ggml file:

  ```sh
  cd scripts/
  python convert_hf_to_ggml.py ${dir_hf_model} -s ${dir_saved_ggml_model}
  ```
- Make sure `tokenizer.json` exists; otherwise generate it with:

  ```sh
  cd scripts/
  python generate_tokenizer_json.py ${dir_hf_model}
  ```
- Build the dynamic library (`libbert_shared.so`):

  ```sh
  git submodule update --init --recursive
  mkdir build
  cd build/
  cmake ..
  make
  ```
- Refer to `examples/sample_dylib.py` to replace your PyTorch inference code (a rough ctypes sketch follows this list).
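For orientation only, here is a minimal ctypes sketch of driving the shared library from Python. The exported symbol names, signatures, hidden size, and model path below are assumptions made for illustration; the actual interface is in `examples/sample_dylib.py`.

```python
import ctypes

# Load the shared library built above; adjust the path to your build directory.
lib = ctypes.CDLL("./build/libbert_shared.so")

# Hypothetical exports (assumed for illustration; check examples/sample_dylib.py
# for the real symbol names and signatures).
lib.bert_load_from_file.argtypes = [ctypes.c_char_p]
lib.bert_load_from_file.restype = ctypes.c_void_p
lib.bert_encode.argtypes = [
    ctypes.c_void_p,                  # model/context handle
    ctypes.c_int,                     # number of threads
    ctypes.c_char_p,                  # UTF-8 input sentence
    ctypes.POINTER(ctypes.c_float),   # output embedding buffer
]
lib.bert_encode.restype = None

ctx = lib.bert_load_from_file(b"path/to/model_ggml.bin")   # example path
n_embd = 768                                               # hidden size of the model (assumed)
embedding = (ctypes.c_float * n_embd)()
lib.bert_encode(ctx, 4, "A sentence to embed.".encode("utf-8"), embedding)
print(list(embedding)[:8])
```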
For C++ usage:

- Make sure Rust is installed (`curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh`), and check it with `rustc -V`
- Convert the PyTorch checkpoint file and configs to a ggml file:

  ```sh
  cd scripts/
  python convert_hf_to_ggml.py ${dir_hf_model} -s ${dir_saved_ggml_model}
  ```
- Make sure `tokenizer.json` exists; otherwise generate it with:

  ```sh
  cd scripts/
  python generate_tokenizer_json.py ${dir_hf_model}
  ```
- Add this project as a submodule and include it via `add_subdirectory` in your CMake project. You also need to enable C++17 support; you can then link the library.
| Tokenizer type | Time cost |
|---|---|
| transformers.BertTokenizer (Python) | 734 ms |
| tokenizers-cpp (Rust binding) | 3 ms |
| Type | Time cost |
|---|---|
| Python w/o loading | 248 ms |
| C++ & Rust w/o loading (n_thread=4) | 2 ms |
| Python with loading | 1104 ms |
| C++ & Rust with loading (n_thread=4) | 19 ms |
| Type | Time cost |
|---|---|
| Python (batch_size=50) | 260 ms |
| C++ & Rust (batch_size=50, n_thread=8) | 23 ms |
ggml performance gets worse as sentence length increases:
| Type | Time cost |
|---|---|
| Python & C++ & Rust (batch_size=50, n_thread=8) | 26 ms |
- Use broadcasting instead of `ggml.repeat` (WIP); see the sketch after this list.
- Update the ggml format to gguf.
- Implement a Python binding instead of a dynamic library.
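For intuition only, here is a minimal NumPy sketch (not ggml code, and the shapes are made up) of why broadcasting an addition is cheaper than materializing an explicit repeat:

```python
# NumPy illustration (not ggml): broadcasting vs. an explicit repeat.
import numpy as np

hidden = np.random.rand(50 * 128, 768).astype(np.float32)  # (batch*seq, hidden); shapes assumed
bias = np.random.rand(1, 768).astype(np.float32)            # a single bias row

# repeat-style: materialize a full (50*128, 768) copy of the bias, then add
repeated = np.repeat(bias, hidden.shape[0], axis=0)
out_repeat = hidden + repeated

# broadcast-style: add the (1, 768) row directly; no repeated copy is allocated
out_broadcast = hidden + bias

assert np.allclose(out_repeat, out_broadcast)
```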
Thanks to the projects we rely on or refer to.