BertTokenizer may not be the optimal choice for conversion

TensorFlow supports two (or three) different types of WordPiece tokenizers.
It could be worth testing the [FastWordPiece](https://www.tensorflow.org/text/api_docs/python/text/FastWordpieceTokenizer) tokenizer, since it can build the model directly from a vocab and is claimed to be faster, as mentioned in:

- https://github.com/tensorflow/text/issues/116
- https://github.com/tensorflow/text/issues/414

But it will likely also require a bit more setup (https://www.tensorflow.org/text/guide/subwords_tokenizer#overview), as WordPiece only seems to split words, whereas the BertTokenizer splits sentences.
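To make the word-vs-sentence distinction concrete, here is a minimal pure-Python sketch. It is not the TensorFlow Text implementation: the toy `VOCAB`, the `wordpiece` helper and the `bert_style` helper are all hypothetical names for illustration, and the regex pre-tokenizer is a simplification of what a BERT-style basic tokenizer does.

```python
import re

# Toy vocabulary (hypothetical, for illustration only).
VOCAB = {"[UNK]", "the", "token", "##izer", "##s", "split", "word", "."}


def wordpiece(word, vocab=VOCAB, suffix="##", unk="[UNK]"):
    """Greedy longest-match-first split of ONE word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Shrink the candidate from the right until it is in the vocab.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = suffix + candidate  # continuation pieces get "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def bert_style(sentence, vocab=VOCAB):
    """Lowercase, split into words and punctuation, then WordPiece each word."""
    words = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return [piece for word in words for piece in wordpiece(word, vocab)]


print(wordpiece("tokenizers"))
print(wordpiece("the tokenizers split words."))   # sentence treated as one "word"
print(bert_style("The tokenizers split words."))
```

In this sketch, feeding a whole sentence to the word-level function collapses it to the unknown token, which is why a word-level tokenizer would need a pre-tokenization step in front of it to behave like the BertTokenizer.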

### Goal
- Compare the different tokenizers and see if they yield the same results
- Check whether the new tokenizer can be saved as a Reusable SavedModel
- Test whether the models that previously failed now work (#4)
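
For the first goal, a small comparison harness could look like the sketch below. It is only a rough idea: `compare_tokenizers` is a hypothetical helper, and the two lambdas are stand-ins. In practice the callables would presumably wrap the existing BertTokenizer and the FastWordPiece tokenizer and convert their outputs to flat Python lists before comparing, since the two may return differently shaped results.

```python
from typing import Callable, Iterable, List, Tuple

# A tokenizer here is any callable mapping a string to a flat list of tokens.
Tokenizer = Callable[[str], List[str]]


def compare_tokenizers(
    reference: Tokenizer,
    candidate: Tokenizer,
    samples: Iterable[str],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (sample, reference_output, candidate_output) for every mismatch."""
    mismatches = []
    for sample in samples:
        ref_tokens = reference(sample)
        cand_tokens = candidate(sample)
        if ref_tokens != cand_tokens:
            mismatches.append((sample, ref_tokens, cand_tokens))
    return mismatches


# Stand-in tokenizers (assumptions): one case-sensitive, one lowercasing.
keep_case = lambda s: s.split()
lower_case = lambda s: s.lower().split()

diffs = compare_tokenizers(keep_case, lower_case, ["hello world", "Hello World"])
for sample, ref_tokens, cand_tokens in diffs:
    print(f"{sample!r}: {ref_tokens} != {cand_tokens}")
```

Running this over a representative text sample and getting an empty list back would be a reasonable signal that the tokenizers are interchangeable for conversion.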
