This repository demonstrates how to convert the complete BGE-M3 model to ONNX format and use it in multiple programming languages with full multi-vector functionality.
- Generate all three BGE-M3 embedding types: dense, sparse, and ColBERT vectors
- Reduced latency with local embedding generation
- Full control over the embedding pipeline with no external dependencies
- Works offline without internet connectivity requirements
- Cross-platform compatibility (C#, Java, Python)
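As background on how these three output types are typically used in retrieval (this is general BGE-M3 scoring practice, not code from this repository): dense vectors are compared by dot product, sparse weights by summing weight products over shared tokens, and ColBERT vectors by late-interaction MaxSim. A minimal numpy sketch:

```python
import numpy as np

def dense_score(q_vec, d_vec):
    # Dense vectors are L2-normalized, so dot product equals cosine similarity
    return float(np.dot(q_vec, d_vec))

def sparse_score(q_weights, d_weights):
    # Sum weight products over tokens shared by query and document
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def colbert_score(q_vecs, d_vecs):
    # Late interaction (MaxSim): each query token keeps its best document match
    sim = np.asarray(q_vecs) @ np.asarray(d_vecs).T
    return float(sim.max(axis=1).sum())

print(sparse_score({101: 0.5, 102: 0.3}, {101: 0.4, 103: 0.2}))  # 0.2
print(colbert_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))     # 1.0
```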
- `bge-m3-to-onnx.ipynb` - Jupyter notebook demonstrating the BGE-M3 conversion process
- `/samples/dotnet` - C# implementation and tests with full BGE-M3 support
- `/samples/java` - Java implementation and tests with full BGE-M3 support
- `generate_reference_embeddings.py` - Script to generate reference embeddings for cross-language testing
- `run_tests.sh` and `run_tests.ps1` - Test scripts for Linux/macOS and Windows
1. Clone this repository:

   ```bash
   git clone https://github.com/yuniko-software/bge-m3-onnx.git
   cd bge-m3-onnx
   ```
2. Get the BGE-M3 ONNX models:

   Option 1: Download from releases (recommended)
   - Check the repository releases and download `onnx.zip`
   - It already contains the bge-m3 embedding model and its tokenizer

   Option 2: Generate yourself using the notebook
   - Open and run `bge-m3-to-onnx.ipynb` - this is the most important file in the repository
   - The notebook demonstrates how to convert BGE-M3 from FlagEmbedding to ONNX format
   - This will create `bge_m3_tokenizer.onnx`, `bge_m3_model.onnx`, and `bge_m3_model.onnx_data` in the `/onnx` folder

   Note: This repository uses `BAAI/bge-m3` as the embedding model with its XLM-RoBERTa tokenizer.
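Before running the samples, it can help to confirm that all three files landed in `/onnx`. A small sanity check (the file names come from this README; the helper itself is illustrative, not part of the repository):

```python
from pathlib import Path

# File names the notebook/release archive should provide (per this README)
EXPECTED_FILES = ["bge_m3_tokenizer.onnx", "bge_m3_model.onnx", "bge_m3_model.onnx_data"]

def missing_model_files(onnx_dir="onnx"):
    # Return the expected files that are not present in the given folder
    folder = Path(onnx_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]

missing = missing_model_files()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All BGE-M3 ONNX files are in place")
```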
3. Generate reference embeddings (optional):
   - Run `python generate_reference_embeddings.py` to create reference embeddings for testing
4. Run the samples:
   - Once you have the ONNX models in the `/onnx` folder, you can run any sample
   - Try the .NET sample in `/samples/dotnet` or the Java sample in `/samples/java`
5. Verify cross-language embeddings (optional):

   To ensure that the .NET and Java embeddings match the Python-generated embeddings, you can run:

   On Linux/macOS:

   ```bash
   chmod +x run_tests.sh
   ./run_tests.sh
   ```

   On Windows:

   ```powershell
   ./run_tests.ps1
   ```

   Note: These scripts require Python, .NET, Java, and Maven to be installed.
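Conceptually, the cross-language check compares each runtime's vectors against the Python reference numerically. The helper below is an illustrative sketch of such a comparison (the tolerance value and the `embeddings_match` name are assumptions, not taken from the actual test scripts):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two vectors of equal length
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embeddings_match(reference, candidate, tolerance=1e-6):
    # Embeddings from different runtimes should be numerically near-identical
    return cosine_similarity(reference, candidate) >= 1.0 - tolerance

ref = [0.12, -0.34, 0.56]
print(embeddings_match(ref, ref))  # True
```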
```python
import onnxruntime as ort
import numpy as np
from onnxruntime_extensions import get_library_path

# Initialize the BGE-M3 ONNX embedder
# (OnnxBGEM3Embedder is defined in generate_reference_embeddings.py)
embedder = OnnxBGEM3Embedder("onnx/bge_m3_tokenizer.onnx", "onnx/bge_m3_model.onnx")

# Generate all three embedding types
result = embedder.encode("Hello world!")
print(f"Dense: {len(result['dense_vecs'])} dimensions")
print(f"Sparse: {len(result['lexical_weights'])} tokens")
print(f"ColBERT: {len(result['colbert_vecs'])} vectors")

# See full implementation in generate_reference_embeddings.py
```
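For context on the `lexical_weights` output above: BGE-M3's sparse representation maps token ids to weights, and when a token occurs more than once the maximum weight is kept while special tokens are dropped. The sketch below follows the FlagEmbedding convention; the token ids, weights, and special-token ids are made up for illustration:

```python
def aggregate_sparse_weights(token_ids, weights, special_ids=(0, 1, 2)):
    # Keep the maximum weight per unique token id, skipping special tokens
    lexical = {}
    for tid, w in zip(token_ids, weights):
        if tid in special_ids:
            continue
        lexical[tid] = max(lexical.get(tid, 0.0), float(w))
    return lexical

print(aggregate_sparse_weights([0, 35378, 8999, 35378, 2],
                               [0.1, 0.8, 0.5, 0.6, 0.1]))
# {35378: 0.8, 8999: 0.5}
```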
```csharp
using BgeM3.Onnx;

// Initialize embedder
using var embedder = new M3Embedder(tokenizerPath, modelPath);

// Generate all embedding types
var result = embedder.GenerateEmbeddings("Hello world!");
Console.WriteLine($"Dense: {result.DenseEmbedding.Length} dimensions");
Console.WriteLine($"Sparse: {result.SparseWeights.Count} tokens");
Console.WriteLine($"ColBERT: {result.ColBertVectors.Length} vectors");

// See full implementation in samples/dotnet
```
```java
import com.yunikosoftware.bgem3onnx.M3Embedder;

// Initialize embedder
try (M3Embedder embedder = new M3Embedder(tokenizerPath, modelPath)) {
    // Generate all embedding types
    M3EmbeddingOutput result = embedder.generateEmbeddings("Hello world!");
    System.out.println("Dense: " + result.getDenseEmbedding().length + " dimensions");
    System.out.println("Sparse: " + result.getSparseWeights().size() + " tokens");
    System.out.println("ColBERT: " + result.getColBertVectors().length + " vectors");
}

// See full implementation in samples/java
```
⭐ If you find this project useful, please consider giving it a star on GitHub! ⭐
Your support helps make this project more visible to other developers who might benefit from BGE-M3's complete multi-vector functionality.