
Commit 3201e86

Backout Unicode bare keys
This backs out the Unicode bare keys from #891. This does *not* mean we can't include it in a future 1.2 (or 1.3, or whatever); it's just that right now there doesn't seem to be a clear consensus regarding normalisation and which characters to include. It's already the most discussed single issue in the history of TOML.

I kind of hate doing this, as it seems a step backwards. In principle I think we *should* have this, so I'm not against the idea of the feature as such, but things seem to be at a bit of a stalemate right now, and backing it out will allow TOML to move forward on other fronts.

It hasn't come up *that* often: the issue (#687) wasn't filed until 2019 and has only 11 upvotes. Other than that, as far as I can find, it was raised only once before, in 2015 (#337). I also can't really find anyone asking for it in any of the HN threads on TOML.

Reverting this means we can go forward with releasing TOML 1.1, giving people access to the much more frequently requested relaxing of inline tables (#516, with 122 upvotes; it has also come up on HN) and some other, more minor things (e.g. `\e`, which has 12 upvotes in #715). Basically, a lot more people are waiting for those changes, and all things considered this seems a better path forward for now, unless someone comes up with a proposal that addresses all the issues (I tried and have thus far failed).

I proposed this a few months ago, and the responses didn't seem too hostile to the idea: #966 (comment)
1 parent 4cc0f97 commit 3201e86


3 files changed: +8 -32 lines changed

CHANGELOG.md

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@
 - Add new `\e` shorthand for the escape character.
 - Add \x00 notation to basic strings.
 - Seconds in Date-Time and Time values are now optional.
-- Allow non-English scripts in unquoted (bare) keys
 - Clarify newline normalization in multi-line literal strings.
 
 ## 1.0.0 / 2021-01-11

toml.abnf

Lines changed: 1 addition & 13 deletions
@@ -49,19 +49,7 @@ key = simple-key / dotted-key
 val = string / boolean / array / inline-table / date-time / float / integer
 
 simple-key = quoted-key / unquoted-key
-
-;; Unquoted key
-
-unquoted-key = 1*unquoted-key-char
-unquoted-key-char = ALPHA / DIGIT / %x2D / %x5F ; a-z A-Z 0-9 - _
-unquoted-key-char =/ %xB2 / %xB3 / %xB9 / %xBC-BE ; superscript digits, fractions
-unquoted-key-char =/ %xC0-D6 / %xD8-F6 / %xF8-37D ; non-symbol chars in Latin block
-unquoted-key-char =/ %x37F-1FFF ; exclude GREEK QUESTION MARK, which is basically a semi-colon
-unquoted-key-char =/ %x200C-200D / %x203F-2040 ; from General Punctuation Block, include the two tie symbols and ZWNJ, ZWJ
-unquoted-key-char =/ %x2070-218F / %x2460-24FF ; include super-/subscripts, letterlike/numberlike forms, enclosed alphanumerics
-unquoted-key-char =/ %x2C00-2FEF / %x3001-D7FF ; skip arrows, math, box drawing etc, skip 2FF0-3000 ideographic up/down markers and spaces
-unquoted-key-char =/ %xF900-FDCF / %xFDF0-FFFD ; skip D800-DFFF surrogate block, E000-F8FF Private Use area, FDD0-FDEF intended for process-internal use (unicode)
-unquoted-key-char =/ %x10000-EFFFF ; all chars outside BMP range, excluding Private Use planes (F0000-10FFFF)
+unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
 
 ;; Quoted and dotted key
 
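The grammar change can be made concrete by transcribing both versions of the bare-key rule into code-point ranges. The Python sketch below is only an illustration, not part of the TOML repository: `is_bare_key`, `was_bare_key_before_backout` and the range tables are hypothetical names, the ranges are copied from the removed and added ABNF lines in this hunk, and `ALPHA`/`DIGIT` are assumed to be the RFC 5234 core rules (`%x41-5A / %x61-7A` and `%x30-39`).

```python
# Sketch only: both versions of the toml.abnf bare-key rule as inclusive
# code-point ranges. All names here are hypothetical, for illustration.

# Restored rule: unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F )
ASCII_RANGES = [
    (0x41, 0x5A),  # A-Z (ABNF core rule ALPHA)
    (0x61, 0x7A),  # a-z (ABNF core rule ALPHA)
    (0x30, 0x39),  # 0-9 (ABNF core rule DIGIT)
    (0x2D, 0x2D),  # -
    (0x5F, 0x5F),  # _
]

# Backed-out rule: the ASCII set plus the removed Unicode ranges.
UNICODE_RANGES = ASCII_RANGES + [
    (0xB2, 0xB3), (0xB9, 0xB9), (0xBC, 0xBE),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x203F, 0x2040),
    (0x2070, 0x218F), (0x2460, 0x24FF),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def _matches(key: str, ranges: list[tuple[int, int]]) -> bool:
    """Non-empty (the ABNF `1*`) and every code point falls in some range."""
    return len(key) > 0 and all(
        any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in key
    )


def is_bare_key(key: str) -> bool:
    """Bare-key check after this commit: ASCII letters, digits, '-', '_'."""
    return _matches(key, ASCII_RANGES)


def was_bare_key_before_backout(key: str) -> bool:
    """Bare-key check under the removed Unicode rule."""
    return _matches(key, UNICODE_RANGES)


for key in ["bare_key", "bare-key", "1234", "Fuß", "😂", "汉语大字典", "a b", ""]:
    print(repr(key), was_bare_key_before_backout(key), is_bare_key(key))

# Both spellings of "café" pass the removed rule, yet they are different
# strings; a parser comparing keys code point by code point would treat
# them as two distinct keys.
nfc, nfd = "caf\u00e9", "cafe\u0301"
print(was_bare_key_before_backout(nfc), was_bare_key_before_backout(nfd), nfc == nfd)
# True True False
```

Checking one code point at a time mirrors how the grammar is written. Note that the removed rule does no normalisation at all, which is one concrete form of the open normalisation question mentioned in the commit message.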
toml.md

Lines changed: 7 additions & 18 deletions
@@ -103,11 +103,9 @@ first = "Tom" last = "Preston-Werner" # INVALID
 
 A key may be either bare, quoted, or dotted.
 
-**Bare keys** may contain any letter-like or number-like Unicode character from
-any Unicode script, as well as ASCII digits, dashes and underscores.
-Punctuation, spaces, arrows, box drawing and private use characters are not
-allowed. Note that bare keys are allowed to be composed of only ASCII digits,
-e.g. 1234, but are always interpreted as strings.
+**Bare keys** may only contain ASCII letters, ASCII digits, underscores, and
+dashes (`A-Za-z0-9_-`). Note that bare keys are allowed to be composed of only
+ASCII digits, e.g. `1234`, but are always interpreted as strings.
 
 ℹ️ The exact ranges of allowed code points can be found in the
 [ABNF grammar file][abnf].
@@ -117,23 +115,18 @@ key = "value"
 bare_key = "value"
 bare-key = "value"
 1234 = "value"
-Fuß = "value"
-😂 = "value"
-汉语大字典 = "value"
-辭源 = "value"
-பெண்டிரேம் = "value"
 ```
 
 **Quoted keys** follow the exact same rules as either basic strings or literal
-strings and allow you to use any Unicode character in a key name, including
-spaces. Best practice is to use bare keys except when absolutely necessary.
+strings and allow you to use a much broader set of key names. Best practice is
+to use bare keys except when absolutely necessary.
 
 ```toml
 "127.0.0.1" = "value"
 "character encoding" = "value"
+"ʎǝʞ" = "value"
+'key2' = "value"
 'quoted "value"' = "value"
-"╠═╣" = "value"
-"⋰∫∬∭⋱" = "value"
 ```
 
 A bare key must be non-empty, but an empty quoted key is allowed (though
@@ -154,7 +147,6 @@ name = "Orange"
 physical.color = "orange"
 physical.shape = "round"
 site."google.com" = true
-பெண்.டிரேம் = "we are women"
 ```
 
 In JSON land, that would give you the following structure:
@@ -168,9 +160,6 @@ In JSON land, that would give you the following structure:
   },
   "site": {
     "google.com": true
-  },
-  "பெண்": {
-    "டிரேம்": "we are women"
   }
 }
 ```
