Chinese words leftover detection. #437
AkamashiDesu
started this conversation in
Show and tell
If you use an AI like Gemini to translate an EPUB from Chinese to English, you have probably run into leftover Chinese words like these:
"I'm still a little confused. I suddenly became a駙馬(imperial consort)?"
or "Tong Xinya had already set off last night and hired dozens of guards from镖局to scout the road ahead...", etc.
Re-checking every block of text for this error is annoying, so I (asked ChatGPT to) create a Python script that detects these leftovers.
ChatGPT link: https://chatgpt.com/canvas/shared/68086c9e4fe48191a9be21cc155ac1d9
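To make the idea concrete, here is a minimal sketch of the core check on its own, run against the two example sentences above. It uses the same character range as the script (U+4E00 to U+9FFF); the names `CHINESE_RUN_RE` and `find_leftovers` are made up for this illustration and are not part of the script.

```python
import re

# Same range the script uses: CJK Unified Ideographs (U+4E00 to U+9FFF).
# The trailing "+" groups neighbouring characters into a single match.
CHINESE_RUN_RE = re.compile(r"[\u4e00-\u9fff]+")


def find_leftovers(text):
    """Return every run of Chinese characters found in text (hypothetical helper)."""
    return CHINESE_RUN_RE.findall(text)


samples = [
    "I'm still a little confused. I suddenly became a駙馬(imperial consort)?",
    "hired dozens of guards from镖局to scout the road ahead...",
    "This sentence is fully translated.",
]
for sample in samples:
    print(find_leftovers(sample), "<-", sample)
```

Note that this range only covers the common ideographs. Rarer characters from the CJK extension blocks, as well as full-width punctuation, would not be flagged.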
```python
import zipfile
import re
import sys
import csv

# Regular expression to match CJK Unified Ideographs (common Chinese characters)
CHINESE_CHAR_RE = re.compile(r"[\u4e00-\u9fff]")

# Patterns to detect <p> tags
# The start pattern matches <p> and <p attr="...">, but not <pre>
P_START_RE = re.compile(r"<p(?:\s[^>]*)?>", re.IGNORECASE)
P_END_RE = re.compile(r"</p>", re.IGNORECASE)


def find_chinese_in_epub(epub_path):
    """
    Scan an EPUB file for Chinese characters inside <p> tags and return a list of results.
    """
    # (function body not included in this post; see the ChatGPT link above)


def write_results_to_csv(results, csv_path):
    """
    Write the scan results to a CSV file for easier viewing.
    """
    # (function body not included in this post; see the ChatGPT link above)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <path_to_epub> [output_csv] (default: chinese_leftovers.csv)")
        sys.exit(1)
    # (rest of the main block not included in this post)
```
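The function bodies did not survive in the snippet above, so here is a rough sketch of how the two functions could be filled in. This is a reconstruction from the docstrings, not the code ChatGPT actually produced. Several details are assumptions: only `.xhtml`, `.html` and `.htm` files inside the EPUB are scanned, a single `P_BLOCK_RE` pattern stands in for the separate start/end patterns, the CSV columns are `file`, `chinese`, `paragraph`, and `TAG_RE` and `CHINESE_RUN_RE` are helper names introduced here.

```python
import csv
import re
import sys
import zipfile

# Runs of CJK Unified Ideographs (same range as the original script).
CHINESE_RUN_RE = re.compile(r"[\u4e00-\u9fff]+")
# Assumption: one non-greedy pattern per <p>...</p> block; DOTALL lets it span lines.
P_BLOCK_RE = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.IGNORECASE | re.DOTALL)
# Strips inline markup such as <span> or <i> from the paragraph text.
TAG_RE = re.compile(r"<[^>]+>")


def find_chinese_in_epub(epub_path):
    """Return a list of (file name, leftover characters, paragraph text) tuples."""
    results = []
    with zipfile.ZipFile(epub_path) as epub:  # an EPUB is a ZIP archive
        for name in epub.namelist():
            # Assumption: chapter text lives in (X)HTML files only.
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            content = epub.read(name).decode("utf-8", errors="ignore")
            for match in P_BLOCK_RE.finditer(content):
                text = TAG_RE.sub("", match.group(1)).strip()
                runs = CHINESE_RUN_RE.findall(text)
                if runs:
                    results.append((name, " ".join(runs), text))
    return results


def write_results_to_csv(results, csv_path):
    """Write the scan results to a CSV file with a header row."""
    # utf-8-sig writes a BOM, which helps Excel display the Chinese characters.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "chinese", "paragraph"])
        writer.writerows(results)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <path_to_epub> [output_csv]")
    else:
        output = sys.argv[2] if len(sys.argv) > 2 else "chinese_leftovers.csv"
        found = find_chinese_in_epub(sys.argv[1])
        write_results_to_csv(found, output)
        print(f"{len(found)} paragraph(s) with leftover Chinese written to {output}")
```

Regex-based tag matching is good enough for a quick check like this, but it can miss text that sits outside `<p>` tags (headings, `<div>`-only layouts), so treat an empty CSV as "probably clean" rather than proof.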
Open a terminal and run (change the paths to wherever your script and EPUB are):
python detect_chinese_epub.py D:/Translate/detect_leftover/greenmanor.epub
Or specify a custom output filename:
python detect_chinese_epub.py D:/Translate/detect_leftover/greenmanor.epub my_results.csv