In Chinese text, how should scores such as 4-3 be handled? #38


Open
ChaoII opened this issue May 21, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@ChaoII

ChaoII commented May 21, 2025

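Take this passage, for example:
"北京时间5月19日多哈世乒赛,王楚钦势如破竹4-0剃光头,零封巴西小将速胜晋级;男单10号种子邱党鏖战七局爆冷被淘汰,从0-3追到3-3,只是最终还是无功而返,下面看看各场对决的简述。王楚钦延续火热的竞技状态,比赛上来连赢七分势不可挡,强力进攻打得对手无可奈何,毫无疑问是做好战术准备,首局几乎没给任何机会11-3速胜。";
(Roughly: "May 19, Beijing time, at the World Table Tennis Championships in Doha: Wang Chuqin swept 4-0, shutting out a young Brazilian player to advance quickly; men's singles 10th seed Qiu Dang was upset and eliminated after seven hard-fought games, coming back from 0-3 to 3-3 but falling short in the end. Brief summaries of each match follow. Wang Chuqin stayed in top form, winning the first seven points of the match, and his strong attacks left his opponent with no answer. He had clearly prepared his tactics well and gave away almost nothing in the first game, taking it quickly 11-3.")
Every score in this passage is pronounced as “四减零” ("four minus zero"), “零减三” ("zero minus three"), and so on.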

@apinge
Owner

apinge commented May 21, 2025

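I think it is best to simply strip the "-" during preprocessing.
The reason is that "-" is too ambiguous: it could be read as “负” ("negative"), “减” ("minus"), or “到” ("to"), and that is hard to capture with simple regex rules.
So just removing it is probably the better option.

I'll try to reproduce the problem and then remove "-" in the Chinese preprocessing.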
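
A minimal sketch of what such a preprocessing step might look like, for illustration only. This is not the project's actual code: the helper name `strip_score_hyphens`, its `replacement` parameter, and the set of hyphen-like characters it matches are all assumptions. It only touches a hyphen that sits directly between two digits, so hyphens elsewhere in the text are left alone.

```python
import re

# Hypothetical helper, not taken from the project's source.
# Matches an ASCII hyphen, en dash (U+2013), or fullwidth hyphen (U+FF0D)
# only when it sits directly between two digits, e.g. the "-" in "4-3".
_DIGIT_HYPHEN = re.compile(r"(?<=\d)[-\u2013\uFF0D](?=\d)")


def strip_score_hyphens(text: str, replacement: str = "") -> str:
    """Remove (or replace) hyphens between digits before the text reaches TTS.

    With the default empty replacement the hyphen is dropped outright,
    which joins the neighbouring digits together.
    """
    return _DIGIT_HYPHEN.sub(replacement, text)


if __name__ == "__main__":
    sample = "从0-3追到3-3"
    print(strip_score_hyphens(sample))        # hyphen dropped
    print(strip_score_hyphens(sample, " "))   # hyphen replaced by a space
```

One thing to watch: dropping the hyphen outright turns "4-0" into "40", which a later number normalizer may read as "forty". Passing a separator such as a space through `replacement` keeps the two numbers apart; whether that reads better depends on how the rest of the pipeline treats whitespace, which this sketch does not assume.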

@apinge apinge added the bug Something isn't working label May 21, 2025