Burmese-Tools

burmese-tools is a Python library that mainly designed for transformation between unicode and zawgyi burmese lanuage script. Moreover, it also support syllable tokenization for burmese unicode and parital syllable tokenization for burmese zawgyi.

Features

Zawgyi to Unicode Conversion: Convert Zawgyi-encoded text to Unicode.
Unicode to Zawgyi Conversion: Transform Unicode text back to Zawgyi.
Unicode Syllable Tokenization: Tokenize Burmese Unicode text into syllables with customizable splitting.
Partial Zawgyi Syllable Tokenization: Tokenize Burmese Zawgyi text into syllables.

Installation

You can install this library from PyPI:

pip install burmese-tools

Usage

Importing the Library

from burmese_tools import tools

Convert Zawgyi to Unicode

text_zawgyi = "ကႏၲာရ"
converted_text = tools.zaw2uni(text_zawgyi)
print(converted_text)  # Output: ကန္တာရ

Convert Unicode to Zawgyi

text_unicode = "ကန္တာရ"
converted_text = tools.uni2zaw(text_unicode)
print(converted_text)  # Output: ကႏၲာ႐

Tokenize Unicode Text into Syllables

The uni_syllable function is a utility to tokenize Unicode text into syllable tokens, allowing for flexible splitting methods.

Features

Tokenizes Unicode text into syllables.
Provides two types of splitting:
- Type 1: Splits ဂန္ဓာရ into ['ဂ', 'န္ဓာ', 'ရ']. (default)
- Type 2: Splits ဂန္ဓာရ into ['ဂန္', 'ဓာ', 'ရ'].
Supports an optional transform to replace ္ with ် (applies only when type=2) d
- default = True

text = "ကန္တာရ"
tokens = tools.uni_syllable(text, type=1)
print(tokens)  # Output: ['က', 'န္တာ', 'ရ']

text = "ကန္တာရ"
tokens = tools.uni_syllable(text, type=2)
print(tokens)  # Output: ['ကန်', 'တာ', 'ရ']

text = "ကန္တာရ"
tokens = tools.uni_syllable(text, type=2, transform=False)
print(tokens)  # Output: ['ကန္', 'တာ', 'ရ']

Tokenize Zawgyi Text into Partial Syllables

text = "ကႏၲာရ"
tokens = tools.zaw_partial_syllable(text)
print(tokens)  # Output: ['က', 'ႏၲာ', 'ရ']  in unicode ['က', 'န္တာ', 'ရ']

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch for your feature/bug fix.
Make your changes and test thoroughly.
Submit a pull request.

License

This library is licensed under the MIT License. Feel free to use, modify, and distribute it.

Acknowledgments

This library was developed to simplify Burmese text processing for developers and linguists. Special thanks to Sa Phyo Thu Thet, from Simbolo for his valuable guidance, kindness, and support in teaching me. His mentorship has been instrumental in shaping my understanding and skills in NLP. Most importantly, I have referred and inspired by Sa Phyo Thu Thet's Myanmar-nlp-tool as one of the cornerstones to complete this project.

Contributions and feedback from the community are also highly appreciated.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
build/lib/burmese_tools		build/lib/burmese_tools
burmese_tools.egg-info		burmese_tools.egg-info
burmese_tools		burmese_tools
dist		dist
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Burmese-Tools

Features

Installation

Usage

Importing the Library

Convert Zawgyi to Unicode

Convert Unicode to Zawgyi

Tokenize Unicode Text into Syllables

Tokenize Zawgyi Text into Partial Syllables

Contributing

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

MinThihaTun3012/burmese_tools

Folders and files

Latest commit

History

Repository files navigation

Burmese-Tools

Features

Installation

Usage

Importing the Library

Convert Zawgyi to Unicode

Convert Unicode to Zawgyi

Tokenize Unicode Text into Syllables

Tokenize Zawgyi Text into Partial Syllables

Contributing

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages