Description
Four example words have been selected to illustrate the *e vs. *ä distinction found in Kuzʹma Abramov's manuscript of the monolingual Erzya dictionary.
In the lexc file we have:
пей+N:пӓй
сэдь+N:сӓдь
седей+N:сьӓдей
эрзя+N:ӓрзя
The symbol ‹ӓ› has been declared in the twolc file.
The filter ‹remove-diaereses-enhancement.regex› looks like this:
[[ Ь | ь ] -> 0 || _ [ ӓ | Ӓ ] ,, ӓ -> е || [ ь | Ь ] _ ]
.o.
ӓ -> е || [ в | В | б | Б | г | Г | ж | Ж | к | К | м | М | п | П | ф | Ф | х | Х | ч | Ч | ш | Ш | щ | Щ ] _
.o.
ӓ -> э || [ д | Д | з | З | л | Л | н | Н | р | Р | с | С | т | Т | ц | Ц ] _
.o.
ӓ -> э || [ .#. | %- ] _ ;
So there are several things going on in one filter.
Line 1 (a parallel rule) deletes an underlying soft sign before underlying ‹ӓ› and, in the same step, replaces underlying ‹ӓ› with ‹е› after a soft sign. (failure)
Line 2 replaces underlying ‹ӓ› with ‹е› after в, б, г, ж, к, м, п, ф, х, ч, ш or щ. (partial success)
Line 3 replaces underlying ‹ӓ› with ‹э› after д, з, л, н, р, с, т or ц. (partial success)
Line 4 replaces underlying ‹ӓ› with ‹э› word-initially or after a hyphen. (partial success)
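To make the intended mapping explicit, here is a minimal string-level sketch in Python of what the four rules are meant to do. It is an illustration under assumptions, not HFST: the function names are hypothetical, Python regular expressions stand in for the xfst replace rules, and, as in the regex, only lowercase ‹ӓ› is rewritten (‹Ӓ› appears only as a context in line 1). It checks the intended mapping, not the compiled filter.

```python
import re

# Consonant lists copied from lines 2 and 3 of the regex (both cases).
E_CONTEXT = "вВбБгГжЖкКмМпПфФхХчЧшШщЩ"   # ӓ -> е after these
EH_CONTEXT = "дДзЗлЛнНрРсСтТцЦ"          # ӓ -> э after these


def rule1(word: str) -> str:
    """Parallel rule: delete ь/Ь before ӓ/Ӓ and, in the same step,
    rewrite a lowercase ӓ that follows ь/Ь as е. One regex pass checks
    both contexts against the input, mimicking `,,` with `||`."""
    return re.sub(r"[ьЬ]([ӓӒ])",
                  lambda m: "е" if m.group(1) == "ӓ" else m.group(1),
                  word)


def rule2(word: str) -> str:
    """ӓ -> е after the consonants in E_CONTEXT."""
    return re.sub(f"(?<=[{E_CONTEXT}])ӓ", "е", word)


def rule3(word: str) -> str:
    """ӓ -> э after the consonants in EH_CONTEXT."""
    return re.sub(f"(?<=[{EH_CONTEXT}])ӓ", "э", word)


def rule4(word: str) -> str:
    """ӓ -> э word-initially (.#.) or after a hyphen (%-)."""
    return re.sub(r"(?:^|(?<=-))ӓ", "э", word)


def apply_filter(word: str) -> str:
    """Apply the rules in the order of the .o. composition (top first)."""
    for rule in (rule1, rule2, rule3, rule4):
        word = rule(word)
    return word


def delete_soft_sign_first(word: str) -> str:
    """Counter-example: delete ь/Ь in a separate, earlier step, so the
    ӓ -> е part of line 1 can no longer see the soft sign."""
    return apply_filter(re.sub(r"[ьЬ](?=[ӓӒ])", "", word))


for underlying in ("пӓй", "сӓдь", "сьӓдей", "ӓрзя"):
    print(underlying, "->", apply_filter(underlying))
```

Applied to сьӓдей, `delete_soft_sign_first` gives сэдей instead of седей: once ь is gone, ӓ follows с and line 3 turns it into э. This is why the soft-sign deletion and the ӓ -> е rewrite are stated as one parallel rule with both contexts checked on the input.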
The compiled filter remove-diaereses-enhancement.hfst is referenced in lang-myv/src/fst/Makefile.am and lang-myv/src/fst/filters/Makefile.am.
The desired results for the four words given above would be:
Analysis:
lang-myv jackrueter$ hfst-lookup src/fst/analyser-gt-norm.hfstol
> пей
пей пей+N+Sg+Nom+Indef 0,000000
> сэдь
сэдь сэдь+N+Sg+Nom+Indef 0,000000
> седей
седей седей+N+Sg+Nom+Indef 0,000000
> эрзя
эрзя эрзя+N+Sg+Nom+Indef 0,000000
Dict-Generation:
lang-myv jackrueter$ hfst-lookup src/fst/generator-dict-gt-norm.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with foma format automata.
Using HFST basic transducer format and performing slow lookups
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef пӓй 0,000000
> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef сӓдь 0,000000
> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef сьӓдей 0,000000
> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef ӓрзя 0,000000
Generation:
lang-myv jackrueter$ hfst-lookup src/fst/generator-gt-norm.hfstol
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef пей 0,000000
> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef сэдь 0,000000
> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef седей 0,000000
> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef эрзя 0,000000
Instead, I get:
Analysis:
lang-myv jackrueter$ hfst-lookup src/fst/analyser-gt-norm.hfstol
> пей
пей пей+? inf
> сэдь
сэдь сэдь+? inf
> седей
седей седей+? inf
> эрзя
эрзя эрзя+? inf
Dict-Generation:
lang-myv jackrueter$ hfst-lookup src/fst/generator-dict-gt-norm.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with foma format automata.
Using HFST basic transducer format and performing slow lookups
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef пӓй 0,000000
> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef сӓдь 0,000000
> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef седей+N+Sg+Nom+Indef+? inf
> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef ӓрзя 0,000000
Generation:
lang-myv jackrueter$ hfst-lookup src/fst/generator-gt-norm.hfstol
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef пей 0,000000
> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef сэдь 0,000000
> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef седей+N+Sg+Nom+Indef+? inf
> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef ӓрзя 0,000000
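When iterating on the filter, it may help to compare lookup results mechanically rather than by eye. The Python sketch below (the names `parse_lookup` and `find_mismatches` are hypothetical) parses hfst-lookup output as pasted in this issue, treats results with weight `inf` as failures, and lists every input whose expected form is missing. It assumes three whitespace-separated fields per result line; hfst-lookup normally separates them with tabs, which `str.split()` also handles.

```python
def is_weight(token: str) -> bool:
    """True for weights such as '0,000000' (comma decimal) or 'inf'."""
    try:
        float(token.replace(",", "."))
        return True
    except ValueError:
        return False


def parse_lookup(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each looked-up input to its (output, weight) pairs.

    Prompts ('> ...'), shell lines and warnings are skipped because they
    do not consist of exactly three fields ending in a weight.
    """
    results: dict[str, list[tuple[str, str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and is_weight(fields[2]):
            word, output, weight = fields
            results.setdefault(word, []).append((output, weight))
    return results


def find_mismatches(expected: dict[str, str], lookup_text: str):
    """Return (input, expected output, successful outputs) for each miss."""
    results = parse_lookup(lookup_text)
    misses = []
    for word, want in expected.items():
        # Results with weight 'inf' (e.g. 'седей+N+Sg+Nom+Indef+?') are failures.
        got = [out for out, weight in results.get(word, []) if weight != "inf"]
        if want not in got:
            misses.append((word, want, got))
    return misses


# Desired and actual normative generation, copied from this issue.
EXPECTED_GENERATION = {
    "пей+N+Sg+Nom+Indef": "пей",
    "сэдь+N+Sg+Nom+Indef": "сэдь",
    "седей+N+Sg+Nom+Indef": "седей",
    "эрзя+N+Sg+Nom+Indef": "эрзя",
}

ACTUAL_GENERATION = """\
> пей+N+Sg+Nom+Indef
пей+N+Sg+Nom+Indef пей 0,000000
> сэдь+N+Sg+Nom+Indef
сэдь+N+Sg+Nom+Indef сэдь 0,000000
> седей+N+Sg+Nom+Indef
седей+N+Sg+Nom+Indef седей+N+Sg+Nom+Indef+? inf
> эрзя+N+Sg+Nom+Indef
эрзя+N+Sg+Nom+Indef ӓрзя 0,000000
"""

for word, want, got in find_mismatches(EXPECTED_GENERATION, ACTUAL_GENERATION):
    print(f"{word}: expected {want}, got {', '.join(got) or 'no result'}")
```

For the generation session above this reports седей (no result) and эрзя (ӓрзя instead of эрзя). The same check can be pointed at the analyser or dict-generator sessions by filling in the corresponding expected pairs.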