swift-sentencepiece

Use SentencePiece in Swift for tokenization and detokenization.

Installation

Add the package to your Package.swift file. In the package dependencies, add:

dependencies: [
    .package(url: "https://github.com/jkrukowski/swift-sentencepiece", from: "0.0.3")
]

In the target dependencies add:

dependencies: [
    .product(name: "SentencepieceTokenizer", package: "swift-sentencepiece")
]
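
For context, the two fragments above might fit into a complete manifest as in the sketch below. The package name, target name, and tools version are hypothetical placeholders, not taken from this project; only the dependency URL, version, and product name come from the snippets above. Depending on the minimum platform the library declares, you may also need a `platforms:` entry.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    dependencies: [
        // Package-level dependency, as shown above
        .package(url: "https://github.com/jkrukowski/swift-sentencepiece", from: "0.0.3")
    ],
    targets: [
        .executableTarget(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Target-level dependency, as shown above
                .product(name: "SentencepieceTokenizer", package: "swift-sentencepiece")
            ]
        )
    ]
)
```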

Usage

Encoding and Decoding

import SentencepieceTokenizer

// load tokenizer from file
let tokenizer = try SentencepieceTokenizer(modelPath: "/path/to/sentencepiece.model")

// encode text
let encoded = tokenizer.encode("Hello, world!")
print(encoded)

// decode tokens
let decoded = tokenizer.decode([35378, 4, 8999, 38])
print(decoded)
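
The calls above can also be combined into a round trip with basic error handling. This is a minimal sketch rather than the project's recommended pattern. It assumes that `encode` returns the same token ID array type that `decode` accepts, and that only the initializer throws, as the snippet above suggests. If your version of the library marks `encode` or `decode` as throwing, prefix those calls with `try`. The model path is a placeholder.

```swift
import SentencepieceTokenizer

do {
    // Placeholder path; point this at a real SentencePiece model file
    let tokenizer = try SentencepieceTokenizer(modelPath: "/path/to/sentencepiece.model")

    let text = "Hello, world!"

    // Encode the text to token IDs, then decode those same IDs back to text
    let ids = tokenizer.encode(text)
    let roundTripped = tokenizer.decode(ids)

    print("ids: \(ids)")
    print("decoded: \(roundTripped)")
} catch {
    // Reached if the model cannot be loaded
    print("Failed to load tokenizer: \(error)")
}
```

The decoded string may not match the input character for character, because SentencePiece models typically normalize text during encoding.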

Command Line Demo

To run the command line demo, use the following command:

swift run sentencepiece-cli --model-path <model-path> [--text <text>]

Command line options:

--model-path <model-path>
--text <text>           (default: Hello, world!)
-h, --help              Show help information.

Code Formatting

This project uses swift-format. To format the code, run:

swift format . -i -r --configuration .swift-format

Acknowledgements

This project wraps the original SentencePiece implementation.
