
Attempt at establishing a distinction between -dict-gt-norm- and -gt-norm fails #4

@rueter


Four example words have been selected to illustrate the *e vs *ä distinction found in the manuscript of Kuzʹma Abramov's monolingual Erzya dictionary.
In the lexc file we have:

```
пей+N:пӓй
сэдь+N:сӓдь
седей+N:сьӓдей
эрзя+N:ӓрзя
```

‹ӓ› has been declared in the twolc file.

The filter ‹remove-diaereses-enhancement.regex› looks like this:

```
[[ Ь | ь ] -> 0 ||  _ [ ӓ | Ӓ ] ,, ӓ -> е || [ ь | Ь ]  _ ]
.o.
ӓ -> е || [ в | В | б | Б | г | Г | ж | Ж | к | К | м | М | п | П | ф | Ф | х | Х | ч | Ч | ш | Ш | щ | Щ ] _
.o.
ӓ -> э || [ д | Д | з | З | л | Л | н | Н | р | Р | с | С | т | Т | ц | Ц ] _
.o.
ӓ -> э || [ .#. | %- ] _ ;
```

So several things are going on in one place. The four rules, separated by `.o.`, do the following:

- Rule 1 (first line) deletes an underlying soft sign before underlying ‹ӓ› and, in parallel, replaces underlying ‹ӓ› after a soft sign with ‹е›. (failure)
- Rule 2 replaces underlying ‹ӓ› with ‹е› after в, б, г, ж, к, м, п, ф, х, ч, ш, щ (and their capitals). (partial success)
- Rule 3 replaces underlying ‹ӓ› with ‹э› after д, з, л, н, р, с, т, ц (and their capitals). (partial success)
- Rule 4 replaces underlying ‹ӓ› with ‹э› word-initially or after a hyphen. (partial success)
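The intended behaviour of the filter can be checked outside HFST with a small simulation. The following is a hypothetical Python approximation, not part of lang-myv. Each function mimics one rule of remove-diaereses-enhancement.regex using Python regular expressions, and `normalise` applies them in the same order as the `.o.` composition. The sketch also shows why the first rule needs its two halves applied in parallel (`,,`). If the soft sign were deleted first, the ӓ -> е context would disappear, and сьӓдей would come out as сэдей instead of седей. Python's `re` substitution only approximates HFST replace-rule semantics, so this checks the intended mapping, not the compiled transducer.

```python
import re

# Hypothetical approximation of remove-diaereses-enhancement.regex in Python.
# Each function mirrors one rule; normalise() applies them in .o. order.

RULE2_LEFT = "вВбБгГжЖкКмМпПфФхХчЧшШщЩ"  # rule 2: ӓ -> е after these
RULE3_LEFT = "дДзЗлЛнНрРсСтТцЦ"          # rule 3: ӓ -> э after these


def rule1_parallel(s: str) -> str:
    """[ь|Ь] -> 0 || _ [ӓ|Ӓ] ,, ӓ -> е || [ь|Ь] _ (both halves see the input)."""
    s = re.sub("[ьЬ]ӓ", "е", s)           # soft sign dropped and ӓ -> е together
    return re.sub("[ьЬ](?=Ӓ)", "", s)     # before capital Ӓ only the soft sign goes


def rule1_sequential(s: str) -> str:
    """Counter-example: deleting the soft sign first removes the ӓ -> е context."""
    s = re.sub("[ьЬ](?=ӓ|Ӓ)", "", s)
    return re.sub("(?<=[ьЬ])ӓ", "е", s)   # the soft sign it needs is normally gone


def rule2(s: str) -> str:
    return re.sub(f"(?<=[{RULE2_LEFT}])ӓ", "е", s)


def rule3(s: str) -> str:
    return re.sub(f"(?<=[{RULE3_LEFT}])ӓ", "э", s)


def rule4(s: str) -> str:
    # .#. is approximated by the start of the string; %- is a literal hyphen.
    return re.sub("(^|-)ӓ", r"\1э", s)


def normalise(s: str, rule1=rule1_parallel) -> str:
    for rule in (rule1, rule2, rule3, rule4):
        s = rule(s)
    return s


if __name__ == "__main__":
    for dict_form in ["пӓй", "сӓдь", "сьӓдей", "ӓрзя"]:
        print(dict_form, "->", normalise(dict_form))
    print("sequential rule 1:", normalise("сьӓдей", rule1=rule1_sequential))
```

Under these assumptions the cascade maps пӓй, сӓдь, сьӓдей and ӓрзя to пей, сэдь, седей and эрзя. With the sequential variant of the first rule, сьӓдей becomes сэдей.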

The filter remove-diaereses-enhancement.hfst is referenced in
lang-myv/src/fst/Makefile.am and lang-myv/src/fst/filters/Makefile.am.

The desired results for the four words given above would be:
Analysis:

```
lang-myv jackrueter$ hfst-lookup src/fst/analyser-gt-norm.hfstol
> пей
пей	пей+N+Sg+Nom+Indef	0,000000

> сэдь
сэдь	сэдь+N+Sg+Nom+Indef	0,000000

> седей
седей	седей+N+Sg+Nom+Indef	0,000000

> эрзя
эрзя	эрзя+N+Sg+Nom+Indef	0,000000
```

Dict-Generation:

```
lang-myv jackrueter$ hfst-lookup src/fst/generator-dict-gt-norm.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with foma format automata.
Using HFST basic transducer format and performing slow lookups
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef	пӓй	0,000000

> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef	сӓдь	0,000000

> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef	сьӓдей	0,000000

> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef	ӓрзя	0,000000
```

Generation:

```
lang-myv jackrueter$ hfst-lookup src/fst/generator-gt-norm.hfstol
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef	пей	0,000000

> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef	сэдь	0,000000

> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef	седей	0,000000

> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef	эрзя	0,000000
```

Instead, I get:
Analysis:

```
lang-myv jackrueter$ hfst-lookup src/fst/analyser-gt-norm.hfstol
> пей
пей	пей+?	inf

> сэдь
сэдь	сэдь+?	inf

> седей
седей	седей+?	inf

> эрзя
эрзя	эрзя+?	inf
```

Dict-Generation:

```
lang-myv jackrueter$ hfst-lookup src/fst/generator-dict-gt-norm.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with foma format automata.
Using HFST basic transducer format and performing slow lookups
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef	пӓй	0,000000

> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef	сӓдь	0,000000

> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef	седей+N+Sg+Nom+Indef+?	inf

> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef	ӓрзя	0,000000
```

Generation:

```
lang-myv jackrueter$ hfst-lookup src/fst/generator-gt-norm.hfstol
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef	пей	0,000000

> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef	сэдь	0,000000

> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef	седей+N+Sg+Nom+Indef+?	inf

> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef	ӓрзя	0,000000
```
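For an at-a-glance comparison, the desired and actual results can be tabulated. The following Python sketch is a hypothetical bookkeeping helper, not part of lang-myv. The `EXPECTED` and `OBSERVED` tables are copied by hand from the hfst-lookup runs in this issue, with `None` standing for a failed lookup (`+?`, weight `inf`).

```python
# Hypothetical helper: desired vs. actual hfst-lookup results from this issue.
WORDS = ["пей", "сэдь", "седей", "эрзя"]
FAILED = None  # a lookup that returned "+?" with weight inf

EXPECTED = {
    "analyser-gt-norm": {w: w + "+N+Sg+Nom+Indef" for w in WORDS},
    "generator-dict-gt-norm": dict(zip(WORDS, ["пӓй", "сӓдь", "сьӓдей", "ӓрзя"])),
    "generator-gt-norm": dict(zip(WORDS, WORDS)),
}

OBSERVED = {
    "analyser-gt-norm": dict.fromkeys(WORDS, FAILED),
    "generator-dict-gt-norm": dict(zip(WORDS, ["пӓй", "сӓдь", FAILED, "ӓрзя"])),
    "generator-gt-norm": dict(zip(WORDS, ["пей", "сэдь", FAILED, "ӓрзя"])),
}


def mismatches() -> dict:
    """Return, per transducer, the words whose observed output differs from the goal."""
    return {
        fst: [w for w in WORDS if OBSERVED[fst][w] != goal[w]]
        for fst, goal in EXPECTED.items()
    }


if __name__ == "__main__":
    for fst, bad in mismatches().items():
        print(f"{fst}: {', '.join(bad) if bad else 'OK'}")
```

Run as a script, it lists all four words for analyser-gt-norm, only седей for generator-dict-gt-norm, and седей and эрзя for generator-gt-norm. The седей failure already shows up in the dict generator. The эрзя problem only shows up in the normative generator, where ӓрзя is returned unchanged.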
