Commit 1bcd935

Backout Unicode bare keys
This backs out the Unicode bare keys from #891.

This does *not* mean we can't include it in a future 1.2 (or 1.3, or whatever); just that right now there doesn't seem to be a clear consensus regarding normalisation and which characters to include. It's already the most discussed single issue in the history of TOML.

I kind of hate doing this as it seems a step backwards; in principle I think we *should* have this so I'm not against the idea of the feature as such, but things seem to be at a bit of a stalemate right now, and this will allow TOML to move forward on other fronts.

It hasn't come up *that* often; the issue (#687) wasn't filed until 2019, and has only 11 upvotes. Other than that, the issue was raised only once before, in 2015, as far as I can find (#337). I also can't really find anyone asking for it in any of the HN threads on TOML.

Reverting this means we can go forward with releasing TOML 1.1, giving people access to the much more frequently requested relaxing of inline tables (#516, with 122 upvotes, which has come up on HN as well) and some other more minor things (e.g. `\e` has 12 upvotes in #715). Basically, a lot more people are waiting for those changes, and all things considered this seems a better path forward for now, unless someone comes up with a proposal which addresses all issues (I tried and thus far failed).

I proposed this over here a few months ago, and the responses didn't seem too hostile to the idea: #966 (comment)
1 parent 48c3bed commit 1bcd935

2 files changed: 8 additions and 31 deletions.

toml.abnf

Lines changed: 1 addition & 13 deletions
@@ -50,19 +50,7 @@ key = simple-key / dotted-key
 val = string / boolean / array / inline-table / date-time / float / integer
 
 simple-key = quoted-key / unquoted-key
-
-;; Unquoted key
-
-unquoted-key = 1*unquoted-key-char
-unquoted-key-char = ALPHA / DIGIT / %x2D / %x5F ; a-z A-Z 0-9 - _
-unquoted-key-char =/ %xB2 / %xB3 / %xB9 / %xBC-BE ; superscript digits, fractions
-unquoted-key-char =/ %xC0-D6 / %xD8-F6 / %xF8-37D ; non-symbol chars in Latin block
-unquoted-key-char =/ %x37F-1FFF ; exclude GREEK QUESTION MARK, which is basically a semi-colon
-unquoted-key-char =/ %x200C-200D / %x203F-2040 ; from General Punctuation Block, include the two tie symbols and ZWNJ, ZWJ
-unquoted-key-char =/ %x2070-218F / %x2460-24FF ; include super-/subscripts, letterlike/numberlike forms, enclosed alphanumerics
-unquoted-key-char =/ %x2C00-2FEF / %x3001-D7FF ; skip arrows, math, box drawing etc, skip 2FF0-3000 ideographic up/down markers and spaces
-unquoted-key-char =/ %xF900-FDCF / %xFDF0-FFFD ; skip D800-DFFF surrogate block, E000-F8FF Private Use area, FDD0-FDEF intended for process-internal use (unicode)
-unquoted-key-char =/ %x10000-EFFFF ; all chars outside BMP range, excluding Private Use planes (F0000-10FFFF)
+unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
 
 ;; Quoted and dotted key
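As a rough illustration of the restored `unquoted-key` rule, the check below is a minimal Python sketch, not part of the commit. The names `BARE_KEY` and `is_bare_key` are hypothetical, and the regular expression is an assumed translation of `1*( ALPHA / DIGIT / %x2D / %x5F )`.

```python
import re

# Assumed translation of the restored ABNF rule:
#   unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F )
# The character class lists ASCII ranges explicitly, so non-ASCII
# letters such as "ß" never match.
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def is_bare_key(key: str) -> bool:
    """Return True if `key` could be written as a bare (unquoted) key."""
    # fullmatch requires the whole string to match; the `+` quantifier
    # rejects the empty string, mirroring `1*` in the ABNF.
    return BARE_KEY.fullmatch(key) is not None


# Keys taken from the toml.md examples touched by this commit.
for key in ["bare_key", "bare-key", "1234", "Fuß", "character encoding"]:
    print(repr(key), is_bare_key(key))
```

Keys that fail this check would still be usable as quoted keys.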

toml.md

Lines changed: 7 additions & 18 deletions
@@ -106,11 +106,9 @@ first = "Tom" last = "Preston-Werner" # INVALID
 
 A key may be either bare, quoted, or dotted.
 
-**Bare keys** may contain any letter-like or number-like Unicode character from
-any Unicode script, as well as ASCII digits, dashes and underscores.
-Punctuation, spaces, arrows, box drawing and private use characters are not
-allowed. Note that bare keys are allowed to be composed of only ASCII digits,
-e.g. 1234, but are always interpreted as strings.
+**Bare keys** may only contain ASCII letters, ASCII digits, underscores, and
+dashes (`A-Za-z0-9_-`). Note that bare keys are allowed to be composed of only
+ASCII digits, e.g. `1234`, but are always interpreted as strings.
 
 ℹ️ The exact ranges of allowed code points can be found in the
 [ABNF grammar file][abnf].
@@ -120,23 +118,18 @@ key = "value"
 bare_key = "value"
 bare-key = "value"
 1234 = "value"
-Fuß = "value"
-😂 = "value"
-汉语大字典 = "value"
-辭源 = "value"
-பெண்டிரேம் = "value"
 ```
 
 **Quoted keys** follow the exact same rules as either basic strings or literal
-strings and allow you to use any Unicode character in a key name, including
-spaces. Best practice is to use bare keys except when absolutely necessary.
+strings and allow you to use a much broader set of key names. Best practice is
+to use bare keys except when absolutely necessary.
 
 ```toml
 "127.0.0.1" = "value"
 "character encoding" = "value"
+"ʎǝʞ" = "value"
+'key2' = "value"
 'quoted "value"' = "value"
-"╠═╣" = "value"
-"⋰∫∬∭⋱" = "value"
 ```
 
 A bare key must be non-empty, but an empty quoted key is allowed (though
@@ -157,7 +150,6 @@ name = "Orange"
 physical.color = "orange"
 physical.shape = "round"
 site."google.com" = true
-பெண்.டிரேம் = "we are women"
 ```
 
 In JSON land, that would give you the following structure:
@@ -171,9 +163,6 @@ In JSON land, that would give you the following structure:
 },
 "site": {
 "google.com": true
-},
-"பெண்": {
-"டிரேம்": "we are women"
 }
 }
 ```
