feat: add supertokenizers #236
Merged
40 commits
b172b6d remove multiword warning (stephantul)
d24c387 add superbpe tokenizers (stephantul)
27e856f Merge branch 'main' into add-superbpe (stephantul)
5f36097 merge (stephantul)
c9e7d14 fix issue with mwe (stephantul)
ecc89b8 merge (stephantul)
9d301d1 form (stephantul)
f4d6a82 Merge branch 'main' into add-superbpe (stephantul)
1666dd2 working version (stephantul)
8611ad5 first pass (stephantul)
59502a1 small fixes, many comments (stephantul)
e06c5d9 fix e5 bug (stephantul)
13a95dc Adjust arcane formulae (stephantul)
e85e292 fix: logging (stephantul)
1c57d40 Merge branch 'main' into add-superbpe (stephantul)
b05c669 wip (stephantul)
3a7408a wip (stephantul)
5275fca wip (stephantul)
12d9ff2 lower complexity (stephantul)
c52ab40 add lock file (stephantul)
077a550 fix: metaspace pretokenizer (stephantul)
cff4035 fix: bug in vocab (stephantul)
a972c10 feat: spaces/commas etc. (stephantul)
e2789ba turn tokenizer into package (stephantul)
d19ab92 add annotations (stephantul)
0bd32c0 feat: turn tokenizer into package (stephantul)
b48bd60 fix: future (stephantul)
abcf903 add tokenizer function (stephantul)
a201e0a update lockfile (stephantul)
345a701 feat: improve segmentation of unigram (stephantul)
1350195 Merge branch 'main' into add-superbpe (stephantul)
cf005aa merge (stephantul)
9301f00 fix: broken merge (stephantul)
796e18f fix interpunct tokens (stephantul)
3aff31b fix tests, make tokenizer changes better (stephantul)
bae0193 update lock file (stephantul)
336655e fix comment, add additional check for pad token (stephantul)
f6a27a4 Merge branch 'main' into add-superbpe (stephantul)
02f5591 tests: add a lot of tests (stephantul)
98546da fix: 3.9 error (stephantul)