Features
Jenia Kim edited this page Jul 29, 2021 · 1 revision
| Feature names in pkl | Description (Szarvas et al. 2012) |
|---|---|
| lemma_-2 lemma_-1 lemma_0 lemma_1 lemma_2 | Stems of the current token and the tokens in a window of size 2 (two tokens to the left and right). |
| pattern_prefix pattern_-1 pattern_0 pattern_1 | Surface patterns of the current token and the tokens in a window of size 1 (one token to the left and right). This feature describes capitalization and other orthographic properties. |
| prefix_3 prefix_4 prefix_5 suffix_3 suffix_4 suffix_5 | Prefixes and suffixes of the current token, 3-5 characters long. |
| pos_-2 pos_-1 pos_0 pos_1 pos_2 | Part-of-speech (POS) tags of the current token and the tokens in a window of size 2. |
| chunk_-2 chunk_-1 chunk_0 chunk_1 chunk_2 | Syntactic chunks of the current token and the tokens in a window of size 2. |
* Szarvas et al. (2012) use one additional feature type: concatenated stem, POS, and chunk labels. This feature type was not used in the current system.
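
The naming scheme in the table (feature name plus a signed offset from the current token) can be illustrated with a minimal sketch. This is not the system's actual extraction code: the token layout (dicts with `word`, `lemma`, `pos`, `chunk` keys), the `PAD` placeholder for out-of-sentence positions, and the `surface_pattern` mapping are all assumptions made for illustration. `pattern_prefix` is left out because the table does not say how it is computed.

```python
PAD = "<PAD>"  # assumed placeholder for positions outside the sentence


def surface_pattern(word):
    """Hypothetical orthographic pattern: A = uppercase, a = lowercase, 0 = digit."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)  # keep punctuation as-is
    return "".join(out)


def window_features(tokens, i):
    """Build a feature dict for tokens[i], using the key names from the table.

    tokens: list of dicts with 'word', 'lemma', 'pos', 'chunk' (assumed layout).
    """
    feats = {}

    def at(j, key):
        # Return the requested attribute, or PAD when j falls outside the sentence.
        return tokens[j][key] if 0 <= j < len(tokens) else PAD

    # lemma, POS, and chunk features: window of size 2
    for name in ("lemma", "pos", "chunk"):
        for off in range(-2, 3):
            feats[f"{name}_{off}"] = at(i + off, name)

    # surface patterns: window of size 1
    for off in range(-1, 2):
        word = at(i + off, "word")
        feats[f"pattern_{off}"] = PAD if word == PAD else surface_pattern(word)

    # prefixes and suffixes of the current token, 3 to 5 characters long
    word = tokens[i]["word"]
    for n in (3, 4, 5):
        feats[f"prefix_{n}"] = word[:n]
        feats[f"suffix_{n}"] = word[-n:]

    return feats


sentence = [
    {"word": "Patient", "lemma": "patient", "pos": "NN", "chunk": "B-NP"},
    {"word": "denies", "lemma": "deny", "pos": "VBZ", "chunk": "B-VP"},
    {"word": "fever", "lemma": "fever", "pos": "NN", "chunk": "B-NP"},
]
feats = window_features(sentence, 1)
```

For the middle token of the toy sentence, `feats["lemma_-2"]` falls outside the sentence and is padded, while `feats["pos_1"]` holds the POS tag of the following token.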