Training NER model with a frozen transformer #12255
-
I am trying to train a NER model on top of a frozen transformer. I understand from discussion posts (#11522, #11608) and open issues (#11547) that there are currently bugs with freezing a transformer during training. Nevertheless, as suggested, I have been getting some results by setting the listener's grad_factor to 0.0 (as in the config below). Do you have any suggestions regarding the config or setup?

EDIT: Providing some background and context: We have an existing annotation pipeline in which we run five separate spaCy NER models (non-transformer, each with its own tok2vec layer) and save their results to a list of overlapping entities (sketched after the config below). We now want to add a new (sixth) NER model based on a new dataset we collected. We figured we would get better results by moving from the current setup to one where the NER models use a transformer. We get great results when training a regular transformer model (using the default transformer config).

Here is the config that gets the best (yet unsatisfactory) results with the frozen setup:

[paths]
train = "data/train.spacy"
dev = "data/dev.spacy"
vectors = null
init_tok2vec = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 128
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 0.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"
[components.transformer]
source = "output/en_core_sci_biobert_parser_tagger/model-best"
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
limit = 0
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
limit = 0
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
after_init = null
[initialize.before_init]
@callbacks = "customize_tokenizer"
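For illustration, the merging step in our current (pre-transformer) setup looks roughly like this -- the model names and paths are placeholders, not our real ones:

import spacy

# Five separately trained NER pipelines (placeholder paths).
MODEL_PATHS = ["ner_model_1", "ner_model_2", "ner_model_3", "ner_model_4", "ner_model_5"]
PIPELINES = [spacy.load(path) for path in MODEL_PATHS]

def collect_overlapping_entities(text: str):
    # Run each NER pipeline independently and pool all entity offsets into
    # one list; spans from different models may overlap.
    entities = []
    for nlp in PIPELINES:
        doc = nlp(text)
        entities.extend((ent.start_char, ent.end_char, ent.label_) for ent in doc.ents)
    return entities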
-
Hi @cjer, as you noted, freezing transformers is not supported yet - and unfortunately I can't give you any useful recommendations on how to effectively circumvent this particular issue at the moment.

Looking at what you actually want to achieve, there might be a workaround though. You mention that you have several NER models and finally merge their results. You are doing this because you want overlapping entities and the NER model doesn't allow for that, correct?

In that case you may want to consider switching to spancat (give the docs a read, if you haven't yet). Spans can overlap there, so you could use a single model and thus avoid freezing the transformer.
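For orientation, a spancat component in the config might look roughly like this -- a sketch only, with illustrative defaults (the "sc" spans key, the ngram suggester sizes, the threshold) and the model block omitted:

[components.spancat]
factory = "spancat"
spans_key = "sc"
threshold = 0.5

[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]

The predicted spans then live in doc.spans[spans_key] rather than doc.ents, so they are allowed to overlap.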
-
Hi @cjer, are you open to "creative" solutions? (Also known as preposterous hacks 😅.) My instinct would be to lean into the hackery while you get it set up, to see how it performs. You can always find ways to tidy it up later. Here are the quick notes, assuming Thinc knowledge etc. Happy to expand out the gaps --- this is definitely stack-specific.

What we need is a registered layer with the signature Model[List[Doc], List[Floats2d]]:

from typing import Callable, List, Tuple

from spacy.tokens import Doc
from thinc.api import Model, registry
from thinc.types import Floats2d


@registry.layers("sinful_transformer_tok2vec.v1")
def get_trf_embeddings_from_global() -> Model[List[Doc], List[Floats2d]]:
    return Model("sinful_transformer_tok2vec", forward_trf_via_global)


def forward_trf_via_global(
    model: Model[List[Doc], List[Floats2d]], docs: List[Doc], is_train: bool
) -> Tuple[List[Floats2d], Callable]:
    # Access the global data and compute the embedding representation.
    # The returned list should have one ndarray per doc, with one vector per token.
    output = _somehow_get_vectors_from_global(docs)

    def backprop_noop(d_vectors: List[Floats2d]) -> List[Doc]:
        # The transformer is frozen, so there is nothing to backpropagate into.
        return []

    return output, backprop_noop

You'd then reference this layer in the configs of the models you're training (see the snippet below). Note that it's not a listener -- it's just a normal sublayer of the NER model. There should then be no problem with composing those models into a single pipeline.
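For example (a sketch only, untested), the NER tok2vec block in the config could point at the registered layer instead of the TransformerListener:

[components.ner.model.tok2vec]
@layers = "sinful_transformer_tok2vec.v1"

Since the layer is registered under registry.layers, it's referenced with @layers rather than @architectures.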