Integrate Unicode Inflection into Unicode Message Format #87

@grhoten

This is a GSoC Project idea.

Difficulty/Size: Higher

The goal is to create a way for the MessageFormatter in ICU4C to leverage the Unicode Inflection project to create grammatically correct sentences. These are currently two independent projects, and Unicode Inflection is available only through C and C++ APIs at this time. Unicode Inflection supports all of the use cases in the expected outcomes, but MessageFormatter currently has no way to use that functionality.

Below is background material on the Unicode Inflection concepts, shown in 3 different syntaxes, none of which is the MessageFormat 2 syntax.

Here is some additional background information on how these projects relate.

Expected Outcomes

  1. A minimum viable integration can generate these messages depending on the grammatical properties of the object being inserted into the sentence.
    1. English
      1. The {object} is on
      2. The {object} are on
    2. French
      1. La {object} est allumée
      2. Les {object} sont allumées
      3. L'{object} est allumé
      4. Le {object} est allumé
      5. Les {object} sont allumés
  2. Bonus integrations include the following topics
    1. Support quantities. E.g. 1 foot/2 feet
    2. Support lists. E.g. An object, and a table
    3. Support pronouns, especially in Spanish and Arabic. E.g. Here is {pronoun} location.
    4. Support spoken text. E.g. one foot/two feet.
    5. Support inflecting words. E.g. mouse + plural → mice
    6. Support SemanticConcept to allow custom inflections.
    7. Support an inflection alternative when the word being inserted into the sentence is not in the lexical dictionary. E.g. 😀 isn’t in the lexical dictionary, so you may want to use “L(a|e) 😀 est allumé(e)” in French.
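
To make the minimum viable outcome concrete, here is a rough C++ sketch of the selection logic, including the dictionary-miss alternative from the last bonus item. It does not use the Unicode Inflection or ICU4C MessageFormatter APIs: `Grammeme`, `lookupFrench`, `frenchIsOn`, `englishIsOn`, and the five-word lexicon are all hypothetical stand-ins invented for illustration. A real integration would get these grammatical properties from Unicode Inflection's lexical dictionary instead of a hard-coded map.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical grammatical properties; NOT the Unicode Inflection API.
enum class Gender { Masculine, Feminine };
enum class Number { Singular, Plural };

struct Grammeme {
    Gender gender;
    Number number;
    bool startsWithVowelSound;  // triggers elision: "le"/"la" becomes "l'"
};

// Toy stand-in for a lexical dictionary lookup (assumed data, UTF-8 source).
std::optional<Grammeme> lookupFrench(const std::string& word) {
    static const std::map<std::string, Grammeme> lexicon = {
        {"lampe",      {Gender::Feminine,  Number::Singular, false}},
        {"lampes",     {Gender::Feminine,  Number::Plural,   false}},
        {"ordinateur", {Gender::Masculine, Number::Singular, true}},
        {"téléphone",  {Gender::Masculine, Number::Singular, false}},
        {"téléphones", {Gender::Masculine, Number::Plural,   false}},
    };
    const auto it = lexicon.find(word);
    if (it == lexicon.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Builds the French "is on" sentence with article, verb, and participle
// agreeing with the inserted object.
std::string frenchIsOn(const std::string& object) {
    assert(!object.empty());  // precondition: a word must be supplied
    const std::optional<Grammeme> g = lookupFrench(object);
    if (!g) {
        // Word is not in the dictionary: spell out the alternatives.
        return "L(a|e) " + object + " est allumé(e)";
    }
    const bool plural = g->number == Number::Plural;
    const bool feminine = g->gender == Gender::Feminine;

    std::string article;
    if (plural) {
        article = "Les ";
    } else if (g->startsWithVowelSound) {
        article = "L'";
    } else {
        article = feminine ? "La " : "Le ";
    }

    std::string participle = "allumé";
    if (feminine) participle += "e";
    if (plural) participle += "s";

    return article + object + (plural ? " sont " : " est ") + participle;
}

// English only needs number agreement on the verb.
std::string englishIsOn(const std::string& object, Number number) {
    return "The " + object + (number == Number::Plural ? " are on" : " is on");
}
```

With this toy lexicon, `frenchIsOn("lampe")` yields "La lampe est allumée" and `frenchIsOn("ordinateur")` yields "L'ordinateur est allumé", while an unknown word such as 😀 falls through to the "L(a|e) … est allumé(e)" form. The point of the project is that the message author should write a single pattern and have the formatter pick these forms, rather than hand-coding the branches as done here.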

Skills

  • Required: C/C++
  • Required: Ability to use cmake for building the code
  • Preferred: Love of languages.
