Commit 8876a97

Refactor Tokenizer->BaseTokenizer
1 parent a04f6bd commit 8876a97

36 files changed, with 6836 additions and 2444 deletions

.ci/docker/requirements-dev.txt

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ pytest==7.3.2
 pytest-cov
 pre-commit
 tomli-w >= 1.1.0
+transformers

.ci/docker/requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@ torchdata >= 0.8.0
 datasets >= 3.6.0
 tomli >= 1.1.0 ; python_version < "3.11"
 tensorboard
-tiktoken
 blobfile
 tabulate
 wandb

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ We actively welcome your pull requests.
 2. If you've added code that should be tested, add tests.
 3. If you've changed APIs, update the documentation.
 4. Ensure the test suite passes.
-5. Make sure your code lints (`pre-commit run --all-files`).
+5. Make sure your code lints (`pre-commit run --from-ref origin/main --to-ref HEAD`).
 6. If you haven't already, complete the Contributor License Agreement ("CLA").

 ### Contributor License Agreement ("CLA")

scripts/generate/test_generate.py

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ def test_generate(
     input_ids = (
         (
             torch.tensor(
-                tokenizer.encode(prompt, bos=True, eos=False), dtype=torch.long
+                tokenizer.encode(prompt, add_bos=True, add_eos=False), dtype=torch.long
             )
             .view(1, -1)
             .repeat(batch_size, 1)
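
The hunk above renames the `encode` keyword arguments from `bos`/`eos` to `add_bos`/`add_eos`. The following is a minimal sketch of what a caller might see after such a rename. It is not the code from this commit: the `BaseTokenizer` interface shown here is an assumed shape, and `ToyByteTokenizer`, `BOS_ID`, and `EOS_ID` are hypothetical names introduced only for illustration. Plain Python lists stand in for the `torch.tensor(...).view(1, -1).repeat(batch_size, 1)` step so the example runs without PyTorch.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseTokenizer(ABC):
    """Assumed interface for illustration; the real class may differ."""

    @abstractmethod
    def encode(self, text: str, add_bos: bool = True, add_eos: bool = True) -> list[int]:
        ...

    @abstractmethod
    def decode(self, ids: list[int]) -> str:
        ...


class ToyByteTokenizer(BaseTokenizer):
    """Hypothetical byte-level tokenizer, not part of the commit."""

    BOS_ID = 256  # assumed special-token ids, chosen above the byte range
    EOS_ID = 257

    def encode(self, text: str, add_bos: bool = True, add_eos: bool = True) -> list[int]:
        ids = list(text.encode("utf-8"))
        if add_bos:
            ids.insert(0, self.BOS_ID)
        if add_eos:
            ids.append(self.EOS_ID)
        return ids

    def decode(self, ids: list[int]) -> str:
        # Drop special tokens before decoding the raw bytes.
        return bytes(i for i in ids if i < 256).decode("utf-8")


tokenizer = ToyByteTokenizer()
batch_size = 2

# Same call shape as the updated line in test_generate.py.
prompt_ids = tokenizer.encode("hi", add_bos=True, add_eos=False)

# List-based stand-in for .view(1, -1).repeat(batch_size, 1).
batch = [list(prompt_ids) for _ in range(batch_size)]
```

With a signature like this, a call site still passing the old `bos=`/`eos=` keywords would raise a `TypeError`, which is presumably why the test had to be updated in the same commit.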
