Tok2vec loss is growing in TextCategorizer #12317
-
Hi @chisuikafuku! You mention that the
-
Hi everyone 👋 I thought about reactivating this discussion instead of creating a new one, since it seems closely related.

I'm training a text classification model with spaCy v3 using the pipeline `["tok2vec", "textcat"]`. Specifically, I'm using `TextCatParametricAttention.v1` as the classifier, and I'm initializing the tok2vec component from the pretrained German model `de_core_news_lg`. During training, I noticed the following:

- `LOSS TEXTCAT` steadily decreases over time (as expected)
- my classification metrics (e.g., macro F1, precision, recall) are improving
- but `LOSS TOK2VEC` is increasing quite significantly, from near zero up to 1600+ over a few thousand steps

Here's a small excerpt from the training logs:

My Understanding So Far
My Questions
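For context, the setup described above corresponds roughly to a config along these lines. This is a hedged sketch, not the actual config: the component layout and the listener wiring follow the defaults that `spacy init config` emits, and field names may differ between spaCy versions.

```ini
# Hypothetical excerpt; values and paths are placeholders.
[nlp]
lang = "de"
pipeline = ["tok2vec","textcat"]

[components.tok2vec]
factory = "tok2vec"

[components.textcat]
factory = "textcat"

[components.textcat.model]
@architectures = "spacy.TextCatParametricAttention.v1"

[components.textcat.model.tok2vec]
# The textcat listens to the shared tok2vec component, so the reported
# LOSS TOK2VEC comes from gradients flowing back through this listener.
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
```

With a shared tok2vec plus listener like this, the tok2vec "loss" is not an independent objective; it aggregates the gradients the downstream component sends back, which is worth keeping in mind when reading the log columns.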
-
Hi,
First of all, thank you for your work; I love using spaCy!
I'm currently working on email classification with spaCy. Here are the steps I followed: I generated my configuration with the `init config` command, so my pipeline includes a trainable tok2vec component feeding the TextCategorizer model. Here's the thing: I tried two training runs with the "long" French spaCy model as the starting point:
- First: initial word vectors (output of the `init config` command)
- Second: initial word vectors + pretrained tok2vec
In both scenarios my tok2vec loss decreases during the first steps, and after that it grows (from 5 to 300!). I don't understand why.
Best
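The two scenarios above differ only in the config's `[initialize]` block. A minimal sketch, assuming the standard layout; the paths are placeholders, and `fr_core_news_lg` is my assumption for the "long" French package:

```ini
# Hypothetical excerpt; adjust paths and the package name to your setup.
[paths]
init_tok2vec = "pretrain/model_last.bin"

[initialize]
# Scenario 1: only static word vectors from the base package.
vectors = "fr_core_news_lg"
# Scenario 2: additionally load pretrained tok2vec weights
# (omit this line for scenario 1).
init_tok2vec = ${paths.init_tok2vec}
```

In both cases the tok2vec component remains trainable after initialization, so its reported loss continues to move during training.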