Cleanup: throw error on missing tokenizer_config.json? #251

@pcuenca

Context: #246 (comment)

Some repos still don't have a tokenizer_config.json file (example), and one is not going to be added, for backward-compatibility (BC) reasons.

Question: should we just throw when this happens?

More info:

  • Loading the tokenizer with transformers works. Saving it generates all necessary files:
>>> from transformers import AutoTokenizer
>>> t = AutoTokenizer.from_pretrained("t5-base")
>>> t.save_pretrained("t5_saved")
  • transformers.js raises an exception. Since it relies on ONNX repos that were exported beforehand, this is a non-issue: exporting goes through the transformers path from the previous bullet, so the file is available (see Xenova/t5-base).

Thoughts on doing the same (throw an error) for the sake of cleanup?
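
To make the proposal concrete, here is a minimal Python sketch of the "throw on missing file" behavior, assuming the model files are already in a local folder. `require_tokenizer_config` and `MissingTokenizerConfigError` are hypothetical names used for illustration only; they are not part of transformers, transformers.js, or this library.

```python
from pathlib import Path


class MissingTokenizerConfigError(FileNotFoundError):
    """Hypothetical error type for a repo without tokenizer_config.json."""


def require_tokenizer_config(model_dir):
    """Return the path to tokenizer_config.json, or raise if it is absent.

    Sketch only: assumes `model_dir` is a local directory holding the
    downloaded repo files.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        # Fail loudly instead of guessing a fallback configuration.
        raise MissingTokenizerConfigError(
            f"tokenizer_config.json not found in {model_dir}. "
            "Loading the tokenizer with transformers and calling "
            "save_pretrained() generates the missing file."
        )
    return config_path
```

The error message points at the transformers round-trip shown above, so a user who hits it has a way to produce the missing file.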

cc @mattt @FL33TW00D
