Features

Jenia Kim edited this page Jul 29, 2021 · 1 revision

| Feature names in pkl | Description (Szarvas et al. 2012) |
| --- | --- |
| `lemma_-2`, `lemma_-1`, `lemma_0`, `lemma_1`, `lemma_2` | Stems of the current token and the tokens in a window of size 2 (two tokens to the left and right). |
| `pattern_prefix`, `pattern_-1`, `pattern_0`, `pattern_1` | Surface patterns of the current token and the tokens in a window of size 1 (one token to the left and right). This feature describes the capitalization and other orthographic features. |
| `prefix_3`, `prefix_4`, `prefix_5`, `suffix_3`, `suffix_4`, `suffix_5` | Prefixes and suffixes of the current token, 3-5 characters long. |
| `pos_-2`, `pos_-1`, `pos_0`, `pos_1`, `pos_2` | Part-of-speech (POS) tags of the current token and the tokens in a window of size 2. |
| `chunk_-2`, `chunk_-1`, `chunk_0`, `chunk_1`, `chunk_2` | Syntactic chunks of the current token and the tokens in a window of size 2. |
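
To make the naming scheme concrete, the following is a minimal sketch of how the windowed features listed above could be assembled for one token. It is not the system's actual extraction code. It assumes each token is a dict with the keys `text`, `lemma`, `pos`, and `chunk`; the `PAD` placeholder, the helper names, and the character-class scheme in `surface_pattern` are hypothetical. `pattern_prefix` is left out because its definition is not given here.

```python
PAD = "<PAD>"  # hypothetical placeholder for positions outside the sentence


def surface_pattern(text):
    """Assumed pattern scheme: A = uppercase, a = lowercase, 0 = digit."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)  # keep punctuation and other characters as-is
    return "".join(out)


def token_features(sentence, i):
    """Build a feature dict for the token at index i (hypothetical helper)."""

    def at(offset, key):
        # Value of `key` for the token at i + offset, or PAD if out of range.
        j = i + offset
        return sentence[j][key] if 0 <= j < len(sentence) else PAD

    feats = {}
    # Window of size 2: lemma, POS tag, and chunk label.
    for off in range(-2, 3):
        feats[f"lemma_{off}"] = at(off, "lemma")
        feats[f"pos_{off}"] = at(off, "pos")
        feats[f"chunk_{off}"] = at(off, "chunk")
    # Window of size 1: surface pattern.
    for off in range(-1, 2):
        j = i + off
        in_range = 0 <= j < len(sentence)
        feats[f"pattern_{off}"] = (
            surface_pattern(sentence[j]["text"]) if in_range else PAD
        )
    # Prefixes and suffixes of the current token, 3-5 characters long.
    text = sentence[i]["text"]
    for n in (3, 4, 5):
        feats[f"prefix_{n}"] = text[:n]
        feats[f"suffix_{n}"] = text[-n:]
    return feats
```

Calling `token_features(sentence, i)` for every index `i` yields one dict per token, with keys that match the feature names in the table (apart from `pattern_prefix`). Tokens shorter than the requested affix length simply return the whole token in this sketch.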

* Szarvas et al. (2012) use an additional feature type: concatenated stem, POS, and chunk labels. This feature type is not used in the current system.
