SpanCategorizer not learning (either tok2vec or transformer based) for aspect extraction task #13780
Replies: 1 comment
-
I've been able to get the needle moving somewhat by adjusting training parameters. The best config so far:

I'm seeing 3 potential paths, and here's what I'm doing for each:

1. Making SpanCat work. Continue annotating, since I only have between 300 and 500 examples per label, and in parallel keep tweaking training parameters, adding/swapping which annotated labels are used, and running more experiments. Can't say enough about how much the Weights & Biases library has come in handy here.
2. Test NER. I chose SpanCat from reading the Prodigy docs; however, I don't really have overlapping spans, so I think I can convert my SpanCat-annotated data to NER and give that a try (see the sketch below).
3. Convert to TextCat. Drawing inspiration from "Healthsea", I'm generating statistics to see how many of my span annotations are singularly contained within sentences, meaning within one sentence I only have one annotated span (the sketch below also covers this count). If that count is high enough, I could turn this from a sequence-labelling problem into a text-classification one (a recommendation I see often in replies from the spaCy/Prodigy team). However, Edward uses Benepar for constituency parsing, and that library seems abandoned (untouched for 4 years), so I'm very reluctant to use it. Are there other alternatives for "smarter" ways of splitting sentences?
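For paths 2 and 3, something along these lines is what I have in mind — a minimal sketch that assumes the annotations are exported as a .spacy DocBin with spans under the default "sc" key (the file paths and the key are placeholders, adjust to your export):

```python
import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans

SPANS_KEY = "sc"                    # assumption: default spancat spans key
TRAIN_PATH = "corpus/train.spacy"   # hypothetical path to the exported annotations

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
sentencizer = nlp.get_pipe("sentencizer")

docs = list(DocBin().from_disk(TRAIN_PATH).get_docs(nlp.vocab))

# Path 3: how many spans are the only annotated span inside their sentence?
total = single = 0
for doc in docs:
    doc = sentencizer(doc)  # set sentence boundaries; the span annotations stay untouched
    spans = list(doc.spans.get(SPANS_KEY, []))
    total += len(spans)
    for sent in doc.sents:
        inside = [s for s in spans if s.start >= sent.start and s.end <= sent.end]
        if len(inside) == 1:
            single += 1
print(f"{single}/{total} spans are the sole annotated span of their sentence")

# Path 2: since the spans don't overlap, the same data can be re-targeted at NER
ner_db = DocBin()
for doc in docs:
    doc.ents = filter_spans(list(doc.spans.get(SPANS_KEY, [])))
    ner_db.add(doc)
ner_db.to_disk("corpus/train_ner.spacy")  # hypothetical output path
```

The sentencizer here is the rule-based one; for noisy social-media text it may be worth swapping in a trained senter and comparing the counts.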
-
I'm struggling to get SpanCategorizer to learn anything. All my attempts end up the same: 30 epochs in, F1, Precision, and Recall are all still 0.00, while the loss fluctuates and trends upward. I'm trying to determine whether the problem is my annotations, my hyperparameters, or the way I've framed the task.
Context
I'm extracting aspects (commentary about entities) from noisy online text. I'll use Formula 1 to craft some examples.
Entity extraction (e.g., "Charles", "YUKI" → Driver, "Ferrari" → Team, "monaco" → Race) already works well. Now I want to classify aspect spans within posts like these (a sketch of the target annotation follows the examples):
"Can't believe what I just saw, Charles is an absolute demon behind the wheel but Ferrari is gonna Ferrari, they need to replace their entire pit wall because their strategies never make sense"
"LMAO classic monaco. i should've stayed in bed, this race is so boring"
"YUKI P4 WHAT A DRIVE!!!!"
Dataset
This is the output of my spacy debug command:

What I've Tried
- tok2vec, roberta-base, and xlm-roberta-base all got scores of 0.00 with default settings.
- xlm-roberta-base on just two labels (the most numerous and distinctive) with dropout = 0.0 and L2 = 0.0001 (see the trimmed config snippet below); some learning happened on that run.
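For reference, those two knobs sit in the training block of the config; trimmed down to just the relevant lines it looks roughly like this (everything else left at the defaults):

```ini
[training]
dropout = 0.0

[training.optimizer]
@optimizers = "Adam.v1"
L2 = 0.0001
L2_is_weight_decay = true
```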
Questions
Any insights on annotation quality checks, hyperparameter tuning, or alternative strategies would be greatly appreciated.
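By annotation quality checks I mean simple statistics over the training DocBin, e.g. examples per label and span lengths in tokens — roughly like this (path and spans key are placeholders as before):

```python
from collections import Counter

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
docs = DocBin().from_disk("corpus/train.spacy").get_docs(nlp.vocab)  # hypothetical path

label_counts = Counter()
length_counts = Counter()
for doc in docs:
    for span in doc.spans.get("sc", []):   # "sc" assumed as the spans key
        label_counts[span.label_] += 1
        length_counts[len(span)] += 1      # span length in tokens

print("examples per label:", label_counts.most_common())
print("span lengths (tokens):", sorted(length_counts.items()))
# spancat's ngram suggester can only propose spans up to the configured sizes,
# so gold spans longer than that are unreachable and drag F1 toward 0
```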
Thanks!
Config
This is one of the configs I used that gave me 0.00 scores: