MatrixChoicesRfc
Work in Progress!
This document outlines an overhaul of the choices system for the Grammar Matrix. We propose three primary changes: (1) moving to a standard serialization format, specifically TOML, an open, non-proprietary, widely available data storage format; (2) introducing a schema for choices files; and (3) decoupling choices serialization and deserialization from the questionnaire display (matrixdef) and matrix.cgi. These changes should make development easier and help connect the Grammar Matrix to other projects, both as a component and as a target.
The Grammar Matrix is composed of several components, including but not limited to the following:
- matrix.tdl
- the customization system
- the questionnaire
- the regression tests
A choices file is a serialized data file containing both 1) the answers to questions in the questionnaire (as well as annotated data such as lexical entries) and 2) information that maps those choices to the matrixdef file used to generate them. (2) includes, but is probably not limited to:
- the sections and section names of the matrixdef file
- the choice names and (optionally) valid choices of the matrixdef file
- the ordering of the above
Currently, choices are written in a proprietary, project-specific storage format (a domain-specific language, or DSL).
The code implementing choices is defined in three locations:
- choices.py (the main data structure definitions, API)
- deffile.py (choices deserialization)
- matrix.cgi (choices serialization)
The goals for changing this setup include the following:
- Ease of human reading: the current choices file format is generally human readable, and any new format must preserve this. However, indexing and non-obvious nesting can make choices difficult for newcomers to understand.
- Ease of human editing: along with reading, being able to identify and edit a choice without using the customization system is a critical requirement for use with matrix.py, debugging, and development.
- Ease of creation: it is currently difficult to write a choices file from scratch. One must read the current matrixdef file to understand the available sections, choices, and options. Ideally, one would be able to write choices with very few dependencies.
- Computer serialization and deserialization: choices are currently stored in a proprietary format, so reading and writing them requires a significant amount of custom code.
- Ease of development: currently, changing anything to do with choices is brittle and probably requires edits in several locations.
- Choices as a component: a modularized choices format could be used as a data format in other projects, whether or not they target the Grammar Matrix.
- Matrix as a target: modularizing the choices format could help others develop systems that use the Grammar Matrix customization system (e.g. AGGREGATION).
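To make the reading and parsing concerns above concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a choices file is a flat list of `key=value` lines in which nesting is encoded through numeric indices inside the key. The sample keys and the `parse_flat` helper are hypothetical and are not taken from choices.py or deffile.py.

```python
# Hypothetical sample in a flat key=value style. The keys are illustrative
# assumptions, not copied from a real choices file.
SAMPLE = """\
section=lexicon
noun1_name=common
noun1_stem1_orth=cat
noun1_stem2_orth=dog
noun2_name=proper
"""


def parse_flat(text):
    """Rebuild nested dicts from keys whose segments are joined by '_'."""
    root = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        *parents, leaf = key.split("_")
        node = root
        for segment in parents:
            # Each indexed segment (e.g. "noun1") becomes a nested dict.
            node = node.setdefault(segment, {})
        node[leaf] = value
    return root


nested = parse_flat(SAMPLE)
print(nested["noun1"]["stem2"]["orth"])
```

Even this toy version has to infer the structure from key names, which is the kind of custom code a standard format would make unnecessary.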
Using a standard serialization format would allow the Grammar Matrix code, other code, and humans to read and write choices files more easily. It would also reduce the amount of code in the Grammar Matrix repository, making development easier.
Several candidates were considered for the serialization format: JSON, YAML, StrictYAML, and TOML.
- JSON: generally considered hard to read, hard to edit, and hard to create.
- YAML:
- StrictYAML:
- TOML: