
Commit ec07a42

Add doc comment to type
1 parent 851a559 commit ec07a42


1 file changed: +6 -0 lines changed


crates/bpe-openai/src/lib.rs

Lines changed: 6 additions & 0 deletions
@@ -42,6 +42,12 @@ static BPE_O200K: LazyLock<Tokenizer> = LazyLock::new(|| {
 
 pub use bpe::*;
 
+/// A byte-pair encoding tokenizer that supports a pre-tokenization regex.
+/// The direct methods on this type pre-tokenize the input text and should
+/// produce the same output as the tiktoken tokenizers. The type gives access
+/// to the regex and underlying byte-pair encoding if needed. Note that using
+/// the byte-pair encoding directly does not take the regex into account and
+/// may result in output that differs from tiktoken.
 pub struct Tokenizer {
     /// The byte-pair encoding for this tokenizer.
     pub bpe: BytePairEncoding,
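The doc comment warns that calling the byte-pair encoding directly skips the regex and can diverge from tiktoken. The toy model below illustrates why. It is a hypothetical sketch that does not use the `bpe` or `bpe-openai` crates. `pretokenize` is a simplified stand-in for the pre-tokenization regex: it splits before each space that follows a non-space character, loosely mimicking how tiktoken patterns attach a leading space to the next word. The three-entry merge table is invented. `bpe_encode` is the classic rank-ordered merge loop, not the crate's own encoding algorithm.

```rust
// Hypothetical toy model; it does not use the `bpe` or `bpe-openai` crates.

/// Invented merge table, ordered by priority (index 0 merges first).
const MERGES: &[(&str, &str)] = &[("a", "t"), ("at", " "), (" ", "at")];

/// Classic rank-ordered BPE: start from single bytes and repeatedly merge the
/// adjacent pair with the lowest rank until no listed pair remains.
fn bpe_encode(piece: &[u8], merges: &[(&str, &str)]) -> Vec<Vec<u8>> {
    let mut parts: Vec<Vec<u8>> = piece.iter().map(|&b| vec![b]).collect();
    loop {
        // (rank, position) of the best mergeable adjacent pair found so far.
        let mut best: Option<(usize, usize)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let rank = merges.iter().position(|(l, r)| {
                l.as_bytes() == parts[i].as_slice() && r.as_bytes() == parts[i + 1].as_slice()
            });
            if let Some(rank) = rank {
                if best.map_or(true, |(best_rank, _)| rank < best_rank) {
                    best = Some((rank, i));
                }
            }
        }
        match best {
            // Merge the winning pair into a single token.
            Some((_, i)) => {
                let right = parts.remove(i + 1);
                parts[i].extend(right);
            }
            None => return parts,
        }
    }
}

/// Hypothetical stand-in for the pre-tokenization regex: start a new piece
/// at every space that directly follows a non-space byte.
fn pretokenize(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut pieces = Vec::new();
    let mut start = 0;
    for i in 1..bytes.len() {
        if bytes[i] == b' ' && bytes[i - 1] != b' ' {
            pieces.push(&text[start..i]);
            start = i;
        }
    }
    if start < text.len() {
        pieces.push(&text[start..]);
    }
    pieces
}

/// Lossy conversion is used only to make the tokens readable when printed.
fn to_strings(tokens: Vec<Vec<u8>>) -> Vec<String> {
    tokens.into_iter().map(|t| String::from_utf8_lossy(&t).into_owned()).collect()
}

/// Pre-tokenize first, then run BPE on each piece independently.
fn encode_with_pretokenization(text: &str) -> Vec<String> {
    to_strings(
        pretokenize(text)
            .into_iter()
            .flat_map(|piece| bpe_encode(piece.as_bytes(), MERGES))
            .collect(),
    )
}

/// Run BPE on the whole input, ignoring the pre-tokenizer.
fn encode_direct(text: &str) -> Vec<String> {
    to_strings(bpe_encode(text.as_bytes(), MERGES))
}

fn main() {
    let text = "at at";
    // Pre-tokenized: ["at", " at"]; the space stays attached to the second word.
    println!("{:?}", encode_with_pretokenization(text));
    // Direct: ["at ", "at"]; the ("at", " ") merge crosses the word boundary.
    println!("{:?}", encode_direct(text));
}
```

Because the pre-tokenizer never produces a piece containing `"at "`, the `("at", " ")` merge can only fire when the encoding is applied to the raw text, which is the kind of divergence the doc comment describes.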

0 commit comments

Comments
 (0)