-
Notifications
You must be signed in to change notification settings - Fork 450
update: rename bcp47
to lang
#1164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
NB this added 1-3 dependencies for iirc ~7MB. Wanted to just query the IANA reference data file and do the checks manually but this was slightly too much work. I will do penance in removing other dependencies later to offset this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
One item I think needs a little documentation is that detectors
changed bcp47
-> lang_spec
while probes
moved to lang
. I believe I understand the reasoning for this however it would be helpful to have some documentation to reference for reasoning.
and lang != "*" | ||
and self.bcp47 != lang | ||
and self.lang != lang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note in current usage lang
passed here is now the detector lang_spec
. I don't think we need to change anything for this PR however in a future iteration the callers will need to ensure to only pass a single lang
from the spec list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is the phenomenon described in bullet 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasoning present in:
expected behavior expressed in:
reasoning added to:
|
bcp47
is a type; it describes format not functionrenamed to
lang
orlang_spec
for more descriptive, precise, intuitive codelearnings & positions taken here:
bcp47
->lang
.bcp47
->lang_spec
Attempt
language semantics are unclear.Attempt
s should be in one language, especially after unanimous decisions made during implementation of multilingual, but attempt bcp47 is occasionally populated from detector or probe bcp47. This PR takes the position that attempts are monolingual. This will be unravelled precisely when Turn+Conversation lands Migrate string output/input toTurn
objects #1089 , but it's something to watch for. there are a few assignments left, e.g. indetectors.base.Detector.detect
, that violate this.langcodes
provides some normalisation functions that might be useful in the language code format mappingResolves #1139