
tokenizer.encode's parameter add_special_tokens=False does not work #765

@xiaohan2909


🐛 Describe the bug

The tokenizer comes from the olmo.tokenizer package. With the token id left at its default value (50279), load the default tokenizer and run:

input:
tokenizer.encode("hello", add_special_tokens=False)
output:
[25521, 50279]

The result shows that the add_special_tokens=False parameter has no effect. The cause is at olmo/tokenizer.py line 183:

batch_encoding = self.base_tokenizer.encode_batch(inputs)

The add_special_tokens argument is not forwarded to the base tokenizer's encode call, so the special token (50279) is always appended.
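A minimal, self-contained sketch of the suspected bug and the obvious fix. The classes below are stand-ins, not OLMo's actual code: FakeBaseTokenizer mimics a base tokenizer whose encode_batch accepts an add_special_tokens flag (as the Hugging Face tokenizers library does), and the two wrappers show the flag being swallowed versus forwarded.

```python
class FakeEncoding:
    """Stand-in for an encoding object exposing token ids."""
    def __init__(self, ids):
        self.ids = ids


class FakeBaseTokenizer:
    """Stand-in base tokenizer: appends the special token (50279)
    only when add_special_tokens is True."""
    SPECIAL = 50279

    def encode_batch(self, inputs, add_special_tokens=True):
        out = []
        for _text in inputs:
            ids = [25521]  # pretend every input encodes to [25521]
            if add_special_tokens:
                ids = ids + [self.SPECIAL]
            out.append(FakeEncoding(ids))
        return out


class BuggyTokenizer:
    """Mirrors the reported bug: the flag is accepted but never forwarded."""
    def __init__(self):
        self.base_tokenizer = FakeBaseTokenizer()

    def encode(self, text, add_special_tokens=True):
        # Bug: add_special_tokens is dropped, so encode_batch uses
        # its default of True and the special token is always added.
        batch_encoding = self.base_tokenizer.encode_batch([text])
        return batch_encoding[0].ids


class FixedTokenizer(BuggyTokenizer):
    """Same wrapper, but the flag is passed through."""
    def encode(self, text, add_special_tokens=True):
        batch_encoding = self.base_tokenizer.encode_batch(
            [text], add_special_tokens=add_special_tokens
        )
        return batch_encoding[0].ids


print(BuggyTokenizer().encode("hello", add_special_tokens=False))  # [25521, 50279]
print(FixedTokenizer().encode("hello", add_special_tokens=False))  # [25521]
```

The fix amounts to threading add_special_tokens from the wrapper's encode through to the base tokenizer's encode_batch call.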

I found the bug because it triggered an assertion at scripts/prepare_tulu_data.py line 90.

Versions

0.5.1
