Description
This is a GSoC Project idea.
Difficulty/Size: Higher
The goal is to create a way for the MessageFormatter in ICU4C to leverage the Unicode Inflection project to produce grammatically correct sentences. The two are currently independent projects, and Unicode Inflection offers only C and C++ APIs at this time. Unicode Inflection supports all of the use cases in the expected outcomes, but MessageFormatter cannot yet use that functionality.
The background material below covers Unicode Inflection concepts using 3 different syntaxes, none of which is the MessageFormat 2 syntax.
- UTW 2023 Automatic Grammar Agreement in Message Formatting
- S12T1 Authoring Grammatically Correct Conversational Templates for Siri
- Let's Come To An Agreement About Our Words :: IMUG 2017.02.16
Here is some additional background information on the relationship between these projects.
Expected Outcomes
- A minimum viable integration can generate these messages depending on the grammatical properties of the object being inserted into the sentence:
  - English
    - The {object} is on
    - The {object} are on
  - French
    - La {object} est allumée
    - Les {object} sont allumées
    - L'{object} est allumé
    - Le {object} est allumé
    - Les {object} sont allumés
- Bonus integrations include the following topics:
  - Support quantities. E.g. 1 foot/2 feet
  - Support lists. E.g. An object, and a table
  - Support pronouns, especially in Spanish and Arabic. E.g. Here is {pronoun} location.
  - Support spoken text. E.g. one foot/two feet
  - Support inflecting words. E.g. mouse + plural → mice
  - Support SemanticConcept to allow custom inflections.
  - Support an inflection alternative for when the word being inserted into the sentence is not in the lexical dictionary. E.g. 😀 isn’t in the lexical dictionary, so you may want to use “L(a|e) 😀 est allumé(e)” in French.
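To make the French outcomes above concrete, here is a minimal, self-contained C++ sketch of the selection logic an integration would need. Everything in it is hypothetical: `Grammeme`, `lookupFrench`, `formatFrenchIsOn`, and the tiny hand-written lexicon are illustrative stand-ins, not the Unicode Inflection or ICU4C MessageFormatter API. A real integration would ask Unicode Inflection for the grammatical properties instead. The sketch also generalizes the elided "L'" pattern to feminine nouns, which goes slightly beyond the five patterns listed.

```cpp
#include <map>
#include <string>

// Hypothetical grammatical properties. A real integration would obtain these
// from Unicode Inflection rather than from a hand-written table.
enum class Gender { Masculine, Feminine };
enum class Number { Singular, Plural };

struct Grammeme {
    Gender gender;
    Number number;
    bool elides;  // true when the singular article contracts to "L'"
};

// Toy stand-in for the lexical dictionary (illustrative entries only).
// Returns nullptr when the word is unknown.
const Grammeme* lookupFrench(const std::string& word) {
    static const std::map<std::string, Grammeme> lexicon = {
        {"lampe",        {Gender::Feminine,  Number::Singular, false}},
        {"lampes",       {Gender::Feminine,  Number::Plural,   false}},
        {"ordinateur",   {Gender::Masculine, Number::Singular, true}},
        {"ventilateur",  {Gender::Masculine, Number::Singular, false}},
        {"ventilateurs", {Gender::Masculine, Number::Plural,   false}},
    };
    const auto it = lexicon.find(word);
    return it == lexicon.end() ? nullptr : &it->second;
}

// Picks the article, verb, and participle that agree with the inserted word.
std::string formatFrenchIsOn(const std::string& word) {
    const Grammeme* g = lookupFrench(word);
    if (g == nullptr) {
        // Word is not in the dictionary: fall back to the neutral alternative.
        return "L(a|e) " + word + " est allumé(e)";
    }
    const bool feminine = g->gender == Gender::Feminine;
    if (g->number == Number::Plural) {
        return "Les " + word + (feminine ? " sont allumées" : " sont allumés");
    }
    const std::string predicate = feminine ? " est allumée" : " est allumé";
    if (g->elides) {
        return "L'" + word + predicate;
    }
    return (feminine ? "La " : "Le ") + word + predicate;
}
```

Under these assumptions, calling `formatFrenchIsOn("lampe")` selects the feminine singular pattern, while a word missing from the toy lexicon takes the fallback branch. The point of the project is that a message author should not have to write this branching by hand: MessageFormatter would delegate the lookup and agreement to Unicode Inflection.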
Skills
- Required: C/C++
- Required: Ability to use cmake for building the code
- Preferred: Love of languages.