To review amharicNormalizer.py


Hi there,

I was reviewing amharicNormalizer.py and noticed that it does not normalize labialized spellings such as በልቱዋል or በልቱአል to በልቷል. I propose adding the following code to handle these cases:

# Normalize words with labialized Amharic characters, e.g. በልቱዋል or በልቱአል to በልቷል

        norm=re.sub('(ሉ[ዋአ])','ሏ',norm)
        norm=re.sub('(ሙ[ዋአ])','ሟ',norm)
        norm=re.sub('(ቱ[ዋአ])','ቷ',norm)
        norm=re.sub('(ሩ[ዋአ])','ሯ',norm)
        norm=re.sub('(ሱ[ዋአ])','ሷ',norm)
        norm=re.sub('(ሹ[ዋአ])','ሿ',norm)
        norm=re.sub('(ቁ[ዋአ])','ቋ',norm)
        norm=re.sub('(ቡ[ዋአ])','ቧ',norm)
        norm=re.sub('(ቹ[ዋአ])','ቿ',norm)
        norm=re.sub('(ሁ[ዋአ])','ኋ',norm)
        norm=re.sub('(ኑ[ዋአ])','ኗ',norm)
        norm=re.sub('(ኙ[ዋአ])','ኟ',norm)
        norm=re.sub('(ኩ[ዋአ])','ኳ',norm)
        norm=re.sub('(ዙ[ዋአ])','ዟ',norm)
        norm=re.sub('(ጉ[ዋአ])','ጓ',norm)
        norm=re.sub('(ዱ[ዋአ])','ዷ',norm)
        norm=re.sub('(ጡ[ዋአ])','ጧ',norm)
        norm=re.sub('(ጩ[ዋአ])','ጯ',norm)
        norm=re.sub('(ጹ[ዋአ])','ጿ',norm)
        norm=re.sub('(ፉ[ዋአ])','ፏ',norm)
        norm=re.sub('[ቊ]','ቁ',norm) #ቁ can be written as ቊ
        norm=re.sub('[ኵ]','ኩ',norm) #ኩ can be also written as ኵ  
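
If a more compact form is preferred, the same rules could perhaps be expressed as a lookup table with a single compiled pattern. This is only a sketch: `LABIALIZED` and `normalize_labialized` are hypothetical names that do not exist in amharicNormalizer.py, and I am assuming the rules run on a plain string, in the same order as the snippet above.

```python
import re

# Hypothetical table (not in amharicNormalizer.py): each second-order (-u)
# character mapped to its labialized (-wa) form, as in the rules above.
LABIALIZED = {
    'ሉ': 'ሏ', 'ሙ': 'ሟ', 'ቱ': 'ቷ', 'ሩ': 'ሯ', 'ሱ': 'ሷ',
    'ሹ': 'ሿ', 'ቁ': 'ቋ', 'ቡ': 'ቧ', 'ቹ': 'ቿ', 'ሁ': 'ኋ',
    'ኑ': 'ኗ', 'ኙ': 'ኟ', 'ኩ': 'ኳ', 'ዙ': 'ዟ', 'ጉ': 'ጓ',
    'ዱ': 'ዷ', 'ጡ': 'ጧ', 'ጩ': 'ጯ', 'ጹ': 'ጿ', 'ፉ': 'ፏ',
}

# One pattern: any key of the table followed by ዋ or አ.
_LABIALIZED_RE = re.compile('([' + ''.join(LABIALIZED) + '])[ዋአ]')


def normalize_labialized(norm):
    """Hypothetical helper: apply the labialization rules, then the variants."""
    # Replace e.g. ቱዋ or ቱአ with ቷ by looking up the first character.
    norm = _LABIALIZED_RE.sub(lambda m: LABIALIZED[m.group(1)], norm)
    # ቁ can also be written as ቊ, and ኩ as ኵ.
    norm = norm.replace('ቊ', 'ቁ').replace('ኵ', 'ኩ')
    return norm
```

For example, `normalize_labialized('በልቱዋል')` and `normalize_labialized('በልቱአል')` would both return `'በልቷል'`. Keeping the pairs in one table also makes a typo in a single rule easier to spot.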


This should ensure that these spelling variants are all normalized to a single form.

Regards, Melese.


