This extractor takes texts from Aozora Bunko as input and extracts phoneme-balanced sentences. It counts the two-phoneme and three-phoneme chains contained in the texts and outputs a small set of sentences that covers the types of phoneme chains found.
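The counting and selection idea can be sketched as follows. This is a minimal illustration, not the code shipped in this repository: the function names (`phoneme_chains`, `count_chains`, `select_balanced`) are hypothetical, the greedy coverage strategy is an assumption about one reasonable way to pick sentences, and the phoneme strings are hand-written examples in the space-separated style that a grapheme-to-phoneme tool such as pyopenjtalk produces.

```python
from collections import Counter


def phoneme_chains(phonemes, n):
    """Return every contiguous chain of n phonemes as a tuple."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def count_chains(phonemes):
    """Count the two-phoneme and three-phoneme chains in one sentence."""
    return Counter(phoneme_chains(phonemes, 2) + phoneme_chains(phonemes, 3))


def select_balanced(sentences, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered chain types.

    `sentences` maps sentence text to its list of phonemes.
    Assumption: greedy coverage is used here only for illustration.
    """
    remaining = {text: set(count_chains(ph)) for text, ph in sentences.items()}
    covered, selected = set(), []
    while remaining and len(selected) < max_sentences:
        # Choose the sentence contributing the most new chain types.
        best = max(remaining, key=lambda text: len(remaining[text] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left adds coverage
        covered |= gain
        selected.append(best)
        del remaining[best]
    return selected, covered


if __name__ == "__main__":
    # Illustrative phoneme sequences (hand-written, not tool output).
    corpus = {
        "sentence A": "k o N n i ch i w a".split(),
        "sentence B": "k o N b a N w a".split(),
        "sentence C": "k o N".split(),
    }
    chosen, covered = select_balanced(corpus, max_sentences=2)
    print(chosen, len(covered))
```

A sentence whose chains are already fully covered by earlier picks (such as "sentence C" here) is never selected, which keeps the output short.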
- Linux
- Python 3
- jumanpp
- pyopenjtalk
Download the text files you want to analyze from Aozora Bunko.
Before running the extractor, store the downloaded texts in the "aosora" directory.
cd /path/to/phoneme-balance-sentence-extractor/
mkdir aosora
mkdir aosora_utf8
When you run the script, the phoneme-balanced sentences are written to "extracted_sentences.txt".
sh aosora_analysis.sh