hr-files

@csukuangfj csukuangfj released this 27 Apr 06:43
· 60 commits to master since this release
e328002

replace.fst is generated by the following Colab notebook:
https://colab.research.google.com/drive/1jEaS3s8FbRJIcVQJv2EQx19EM_mnuARi?usp=sharing


If you don't have access to the Colab notebook, here is the code that generates replace.fst:

import pynini
from pynini import cdrewrite
from pynini.lib import utf8

# Any sequence of valid UTF-8 characters: the alphabet the rewrite operates over.
sigma = utf8.VALID_UTF8_CHAR.star

# Each rule maps a pinyin string (with tone digits) to the text that replaces it.
rule1 = pynini.cross('dan1ni2er3bo1wei2', '丹尼尔·波维')
# Previously named rule10, which a later assignment overwrote, so this variant was lost.
rule1b = pynini.cross('dan1ni2er3bo1wei4', '丹尼尔·波维')
rule2 = pynini.cross('dou4dou4', '豆豆')
rule3 = pynini.cross('cheng2cheng2', '橙橙')
rule30 = pynini.cross('chen2chen2', '橙橙')
rule4 = pynini.cross('qiao2qiao2', '峤峤')
rule5 = pynini.cross('qiu2qiu2', '球球')
rule6 = pynini.cross('lin2mei3li4', '林美丽')
rule7 = pynini.cross('guo3guo3', '果果')
rule8 = pynini.cross('miao2miao2', '苗苗')
rule9 = pynini.cross('xuan2jie4', '玄戒')
rule10 = pynini.cross('xuan2jie4xin1pian1', '玄戒芯片')
rule11 = pynini.cross('xuan2jie4xing1pian1', '玄戒芯片')

# Union of all rules (the duplicated rule12 and rule13 are dropped).
rule = (rule1 | rule1b | rule2 | rule3 | rule30 | rule4 | rule5 | rule6
        | rule7 | rule8 | rule9 | rule10 | rule11).optimize()
# Empty left and right contexts: apply the replacement anywhere in the input.
rule = cdrewrite(rule, "", "", sigma)

rule.write('replace.fst')

Note that you need to install pynini from a prebuilt wheel, so that pip does not try to build it from source:

pip install --only-binary :all: pynini
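
To sanity-check which pinyin strings map to which text without installing pynini, the pairs written out in the script can be mirrored in plain Python. This is only a rough sketch and not the FST itself: the names `REPLACEMENTS` and `replace_pinyin` are hypothetical, and it uses a longest-key-first regular expression, which may not match how the compiled rule resolves overlapping matches such as `xuan2jie4` and `xuan2jie4xin1pian1`.

```python
import re

# Pinyin -> text pairs as written in the script (duplicates listed once).
REPLACEMENTS = {
    "dan1ni2er3bo1wei2": "丹尼尔·波维",
    "dan1ni2er3bo1wei4": "丹尼尔·波维",
    "dou4dou4": "豆豆",
    "cheng2cheng2": "橙橙",
    "chen2chen2": "橙橙",
    "qiao2qiao2": "峤峤",
    "qiu2qiu2": "球球",
    "lin2mei3li4": "林美丽",
    "guo3guo3": "果果",
    "miao2miao2": "苗苗",
    "xuan2jie4": "玄戒",
    "xuan2jie4xin1pian1": "玄戒芯片",
    "xuan2jie4xing1pian1": "玄戒芯片",
}

# Assumption: prefer longer keys, so "xuan2jie4xin1pian1" wins over "xuan2jie4".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
)


def replace_pinyin(text: str) -> str:
    """Replace every known pinyin key in text; leave everything else unchanged."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


if __name__ == "__main__":
    print(replace_pinyin("ni3hao3dou4dou4"))
```

Comparing this output against the output of replace.fst on the same inputs is one way to spot a rule that was dropped or shadowed.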