Commit 76476bc

Refactor Tokenizer->BaseTokenizer
1 parent a04f6bd commit 76476bc

31 files changed: +4772 -2440 lines changed


.ci/docker/requirements-dev.txt

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ pytest==7.3.2
 pytest-cov
 pre-commit
 tomli-w >= 1.1.0
+transformers

.ci/docker/requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@ torchdata >= 0.8.0
 datasets >= 3.6.0
 tomli >= 1.1.0 ; python_version < "3.11"
 tensorboard
-tiktoken
 blobfile
 tabulate
 wandb

scripts/generate/test_generate.py

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ def test_generate(
     input_ids = (
         (
             torch.tensor(
-                tokenizer.encode(prompt, bos=True, eos=False), dtype=torch.long
+                tokenizer.encode(prompt, add_bos=True, add_eos=False), dtype=torch.long
             )
             .view(1, -1)
             .repeat(batch_size, 1)
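
The hunk above renames the `encode` keyword arguments from `bos`/`eos` to `add_bos`/`add_eos`. A minimal sketch of what that means for call sites, assuming a toy tokenizer: `ToyByteTokenizer`, its byte-level encoding, and the ids 256 and 257 are hypothetical and are not taken from this commit.

```python
class ToyByteTokenizer:
    """Hypothetical stand-in tokenizer: one token id per UTF-8 byte."""

    bos_id = 256  # assumed id, chosen to sit outside the byte range 0..255
    eos_id = 257  # assumed id

    def encode(self, text: str, add_bos: bool = True, add_eos: bool = False) -> list:
        # Mirrors only the keyword names used at the updated call site.
        ids = list(text.encode("utf-8"))
        if add_bos:
            ids.insert(0, self.bos_id)
        if add_eos:
            ids.append(self.eos_id)
        return ids


tokenizer = ToyByteTokenizer()

# New-style call, matching the '+' line of the hunk.
ids = tokenizer.encode("hi", add_bos=True, add_eos=False)

# Old-style keywords are no longer accepted by this signature.
try:
    tokenizer.encode("hi", bos=True, eos=False)
    old_keywords_rejected = False
except TypeError:
    old_keywords_rejected = True
```

In this sketch a caller still passing `bos=`/`eos=` fails with a `TypeError` instead of silently using the defaults, which is why a keyword rename like this one requires every call site to be updated together.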
