
Commit 0f0d9d7

Merge pull request #3 from marcozac/perf-unmarshal
Improve `Unmarshal` performance
2 parents 0936c96 + 9844e1d commit 0f0d9d7

19 files changed

+2615
-143
lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -16,6 +16,10 @@ go.work*
 # GoReleaser
 dist/
 
+# Temporary files
+tmp/
+*.tmp
+
 # IDE
 .idea/
 .vscode/*

README.md

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
1+
# jsonc - JSON with comments for Go
2+
3+
[![Go Doc](https://img.shields.io/badge/godoc-reference-blue.svg)](http://godoc.org/github.com/marcozac/go-jsonc)
4+
![License](https://img.shields.io/github/license/marcozac/go-jsonc?color=blue)
5+
[![CI](https://github.com/marcozac/go-jsonc/actions/workflows/ci.yml/badge.svg)](https://github.com/marcozac/go-jsonc/actions/workflows/ci.yml)
6+
[![codecov](https://codecov.io/gh/marcozac/go-jsonc/branch/main/graph/badge.svg?token=JYj7gCZauN)](https://codecov.io/gh/marcozac/go-jsonc)
7+
[![Go Report Card](https://goreportcard.com/badge/github.com/marcozac/go-jsonc)](https://goreportcard.com/report/github.com/marcozac/go-jsonc)
8+
9+
`jsonc` is a lightweight, dependency-free package, built on top of `encoding/json`, for working with JSON data that contains comments.
10+
It removes comments to produce valid JSON-encoded data and unmarshals JSON with comments into Go values.
11+
12+
The dependencies listed in [go.mod](/go.mod) are only used for testing and benchmarking or to support [alternative libraries](#alternative-libraries).
13+
14+
## Features
15+
16+
- Full support for comment lines and block comments
17+
- Preserve the content of strings that contain comment characters
18+
- Sanitize JSON with comments data by removing comments
19+
- Unmarshal JSON with comments into Go values
20+
21+
## Installation
22+
23+
Install the `jsonc` package:
24+
25+
```bash
26+
go get github.com/marcozac/go-jsonc
27+
```
28+
29+
## Usage
30+
31+
### Sanitize - Remove comments from JSON data
32+
33+
`Sanitize` removes all comments from JSON data and returns a valid JSON-encoded byte slice that is compatible with the standard library's `json.Unmarshal`.
34+
35+
It works with comment lines and block comments anywhere in the JSONC data, preserving the content of strings that contain comment characters.
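
As an illustration of what "preserving the content of strings" involves, here is a minimal sketch of a string-aware comment stripper. It is not the package's `Sanitize`: the name `stripComments` is hypothetical, and unlike a real implementation it silently drops an unterminated block comment instead of reporting an error.

```go
package main

import "fmt"

// stripComments is a hypothetical helper (an assumption for illustration,
// not the jsonc implementation). It drops // line comments and /* */ block
// comments, and copies string literals verbatim so that comment markers
// inside strings survive.
func stripComments(src []byte) []byte {
	out := make([]byte, 0, len(src))
	inString := false
	for i := 0; i < len(src); i++ {
		c := src[i]
		switch {
		case inString:
			out = append(out, c)
			if c == '\\' && i+1 < len(src) {
				// Copy the escaped byte too, so \" does not end the string.
				i++
				out = append(out, src[i])
			} else if c == '"' {
				inString = false
			}
		case c == '"':
			inString = true
			out = append(out, c)
		case c == '/' && i+1 < len(src) && src[i+1] == '/':
			// Line comment: skip up to, but not including, the newline.
			for i+1 < len(src) && src[i+1] != '\n' {
				i++
			}
		case c == '/' && i+1 < len(src) && src[i+1] == '*':
			// Block comment: skip through the closing */ (or to the end
			// of the input if the comment is never closed).
			i += 2
			for i+1 < len(src) && !(src[i] == '*' && src[i+1] == '/') {
				i++
			}
			i++ // step onto the '/' of the terminator
		default:
			out = append(out, c)
		}
	}
	return out
}

func main() {
	in := []byte(`{"a":"x // y"/* c */,"b":1}// t`)
	fmt.Println(string(stripComments(in)))
}
```

The key design point is the `inString` flag: once a string literal opens, every byte is copied until the matching unescaped quote, so `//` and `/*` are only treated as comment openers outside strings.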
36+
37+
#### Example
38+
39+
```go
40+
package main
41+
42+
import (
43+
"encoding/json"
44+
45+
"github.com/marcozac/go-jsonc"
46+
)
47+
48+
func main() {
49+
invalidData := []byte(`{
50+
// a comment
51+
"foo": "bar" /* a comment in a weird place */,
52+
/*
53+
a block comment
54+
*/
55+
"hello": "world" // another comment
56+
}`)
57+
58+
// Remove comments from JSONC
59+
data, err := jsonc.Sanitize(invalidData)
60+
if err != nil {
61+
// handle the error
62+
}
63+
64+
var v struct{
65+
Foo string
66+
Hello string
67+
}
68+
69+
// Unmarshal using any other library
70+
if err := json.Unmarshal(data, &v); err != nil {
71+
// handle the error
72+
}
73+
}
74+
```
75+
76+
### Unmarshal - Parse JSON with comments into a Go value
77+
78+
`Unmarshal` replicates the behavior of the standard library's `json.Unmarshal` function and adds support for comments.
79+
80+
It is optimized to avoid calling [`Sanitize`](#sanitize---remove-comments-from-json-data) unless it detects comments in the data.
81+
This avoids the overhead of removing comments when they are not present, improving performance on small data sets.
82+
83+
It first checks whether the data contains comment characters such as `//` or `/*` using [`HasCommentRunes`](https://pkg.go.dev/github.com/marcozac/go-jsonc#HasCommentRunes).
84+
If no comment characters are found, it directly unmarshals the data.
85+
86+
Only if comment characters are detected does it call [`Sanitize`](#sanitize---remove-comments-from-json-data) to remove them before unmarshaling.
87+
`Unmarshal` thus skips unnecessary work when possible, but it currently cannot rule out false positives such as `//` or `/*` inside strings.
88+
89+
Because comment detection is a simple rune check, `Unmarshal` is not recommended for large data sets unless you do not know whether they contain comments.
90+
Indeed, `HasCommentRunes` must check every single byte before returning `false`, which may drastically slow down the process.
91+
92+
In this case, it is more efficient to call [`Sanitize`](#sanitize---remove-comments-from-json-data) before unmarshaling the data.
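
The detect-then-sanitize flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's code: `hasCommentMarkers` is a hypothetical stand-in for `HasCommentRunes`, `unmarshalFastPath` is a hypothetical name, and the sanitizer is passed in as a parameter so the sketch stays self-contained (real use would pass `jsonc.Sanitize`).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// hasCommentMarkers is a hypothetical stand-in for HasCommentRunes. It
// reports whether "//" or "/*" occurs anywhere in data, including inside
// string literals, so it can return false positives.
func hasCommentMarkers(data []byte) bool {
	return bytes.Contains(data, []byte("//")) || bytes.Contains(data, []byte("/*"))
}

// unmarshalFastPath (hypothetical name) decodes data directly when no
// comment markers are found and calls sanitize only otherwise.
func unmarshalFastPath(data []byte, v any, sanitize func([]byte) ([]byte, error)) error {
	if hasCommentMarkers(data) {
		clean, err := sanitize(data)
		if err != nil {
			return err
		}
		data = clean
	}
	return json.Unmarshal(data, v)
}

func main() {
	calls := 0
	// An identity sanitizer that only counts how often it is invoked.
	countingSanitize := func(b []byte) ([]byte, error) { calls++; return b, nil }

	var v struct{ URL string }
	plain := []byte(`{"url": "example.com"}`)
	falsePositive := []byte(`{"url": "https://example.com"}`)

	_ = unmarshalFastPath(plain, &v, countingSanitize)
	fmt.Println("sanitize calls after plain input:", calls)

	_ = unmarshalFastPath(falsePositive, &v, countingSanitize)
	fmt.Println("sanitize calls after URL input:", calls)
}
```

The second input contains no comment, yet the `//` in the URL triggers the sanitize step. That is the false positive the text refers to, and it is why data with comment characters in strings pays the sanitizing cost.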
93+
94+
#### Example
95+
96+
```go
97+
package main
98+
99+
import "github.com/marcozac/go-jsonc"
100+
101+
func main() {
102+
invalidData := []byte(`{
103+
// a comment
104+
"foo": "bar"
105+
}`)
106+
107+
var v struct{ Foo string }
108+
109+
err := jsonc.Unmarshal(invalidData, &v)
110+
if err != nil {
111+
// handle the error
112+
}
113+
}
114+
```
115+
116+
## Alternative libraries
117+
118+
By default, `jsonc` uses the standard library's `encoding/json` to unmarshal JSON data and has no external dependencies.
119+
120+
Build tags can select an alternative library in place of the standard library's `encoding/json`:
121+
122+
| Tag | Library |
123+
| ------------ | -------------------------------------------------------------------- |
124+
| none or both | standard library |
125+
| jsoniter | [`github.com/json-iterator/go`](https://github.com/json-iterator/go) |
126+
| go_json | [`github.com/goccy/go-json`](https://github.com/goccy/go-json) |
127+
128+
## Benchmarks
129+
130+
This library aims to have performance comparable to the standard library's `encoding/json`.
131+
Unfortunately, comment removal is not free, and its overhead cannot be avoided when comments are present.
132+
133+
Currently, on small data sets, `jsonc` is about 27% slower than the standard library's `encoding/json` on data with comment characters in strings and about 16% slower on data without comment characters.
134+
On medium data sets, the gap widens to about 30% on data with comment characters in strings and narrows to about 12% on data without comment characters.
135+
136+
However, using one of the [alternative libraries](#alternative-libraries), it is possible to achieve better performance than the standard library's `encoding/json` even considering the overhead of removing comments.
137+
138+
See [benchmarks](/benchmarks) for the full results.
139+
140+
The benchmarks are run on a MacBook Pro (16-inch, 2021), Apple M1 Max, 32 GB RAM.
141+
142+
## Contributing
143+
144+
:heart: Contributions are ~~needed~~ welcome!
145+
146+
Please open an issue or submit a pull request if you would like to contribute.
147+
148+
To submit a pull request:
149+
150+
- Fork this repository
151+
- Create a new branch
152+
- Make changes and commit
153+
- Push to your fork and submit a pull request
154+
155+
## License
156+
157+
This project is licensed under the Apache 2.0 license. See the [LICENSE](/LICENSE) file for details.

benchmark_uncommented_test.go

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
// Copyright 2023 Marco Zaccaro. All Rights Reserved.
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
//go:build uncommented_test
16+
// +build uncommented_test
17+
18+
package jsonc
19+
20+
import (
21+
"testing"
22+
23+
"github.com/marcozac/go-jsonc/internal/json"
24+
"github.com/stretchr/testify/assert"
25+
"github.com/stretchr/testify/require"
26+
)
27+
28+
// This file does not contain real benchmarks, but it is used to compare the
29+
// performance against the standard functions on uncommented JSON data.
30+
31+
// Check standard json.Unmarshal (or jsoniter / go-json / ...) performance
32+
// with uncommented JSON data.
33+
func BenchmarkUnmarshal(b *testing.B) {
34+
b.Run("Small", func(b *testing.B) {
35+
b.Run("UnCommented", func(b *testing.B) {
36+
benchmarkUnmarshal(b, _smallUncommented, Small{})
37+
})
38+
b.Run("NoCommentRunes", func(b *testing.B) {
39+
benchmarkUnmarshal(b, _smallNoCommentRunes, SmallNoCommentRunes{})
40+
})
41+
})
42+
b.Run("Medium", func(b *testing.B) {
43+
b.Run("UnCommented", func(b *testing.B) {
44+
benchmarkUnmarshal(b, _mediumUncommented, Medium{})
45+
})
46+
b.Run("NoCommentRunes", func(b *testing.B) {
47+
benchmarkUnmarshal(b, _mediumNoCommentRunes, MediumNoCommentRunes{})
48+
})
49+
})
50+
}
51+
52+
func benchmarkUnmarshal[T DataType](b *testing.B, data []byte, dt T) {
53+
b.Helper()
54+
b.RunParallel(func(p *testing.PB) {
55+
for p.Next() {
56+
UnmarshalOK(b, data, dt)
57+
}
58+
})
59+
}
60+
61+
func UnmarshalOK[T DataType](t require.TestingT, data []byte, dt T) {
62+
j := dt
63+
assert.NoError(t, json.Unmarshal(data, &j), "unmarshal failed")
64+
FieldsValue(t, j)
65+
}

benchmarks/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
# Benchmark results
2+
3+
The tables below show the performance of [`Unmarshal`](../README.md#unmarshal---parse-json-with-comments-into-a-go-value) compared to the standard library's `encoding/json` and other alternative libraries on small and medium data sets.
4+
5+
They are formatted as follows:
6+
7+
| Data set | s/op | B/op | allocs/op |
8+
| ------------- | ------------------------------------------- | ---- | --------- |
9+
| Set reference | result (Δ% on reference / reference result) | same | same |
10+
11+
See the files in this directory for the full report.
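
The Δ% figure appears to be the relative difference between the result and its reference. A minimal sketch of that arithmetic follows; `deltaPercent` is a hypothetical helper name, and recomputing from the rounded values shown in the tables may differ from the published figure in the last digit.

```go
package main

import "fmt"

// deltaPercent (hypothetical helper) returns how much slower (positive) or
// faster (negative) result is relative to reference, as a percentage of
// the reference.
func deltaPercent(result, reference float64) float64 {
	return (result - reference) / reference * 100
}

func main() {
	// Small data set, comment characters in strings, standard library:
	// 2.425µs for jsonc against a 1.907µs reference.
	fmt.Printf("%+.2f%%\n", deltaPercent(2.425, 1.907))
}
```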
12+
13+
### Standard library
14+
15+
The tables below show the performance of [`Unmarshal`](../README.md#unmarshal---parse-json-with-comments-into-a-go-value) compared to the standard library's `encoding/json` on small and medium data sets.
16+
17+
| **Small data set** | s/op | B/op | allocs/op |
18+
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ---------------------- |
19+
| [With comments](../testdata/small.json) | 2.536µ | 1.344Ki | 22.00 |
20+
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 2.425µ (+27.17% / 1.907µ) | 1.219Ki (+14.71% / 1.062Ki) | 22.00 (+4.76% / 21.00) |
21+
| [Without comment characters](../testdata/small_no_comment_runes.json) | 2.306µ (+16.11% / 1.986µ) | 1.062Ki (~% / 1.062Ki) | 21.00 (~% / 21.00) |
22+
23+
| **Medium data set** | s/op | B/op | allocs/op |
24+
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ------------------------ |
25+
| [With comments](../testdata/small.json) | 301.2µ | 324.7Ki | 1.067k |
26+
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 202.3µ (+30.86% / 154.6µ) | 148.7Ki (+60.41% / 92.70Ki) | 1.067k (+0.09% / 1.066k) |
27+
| [Without comment characters](../testdata/small_no_comment_runes.json) | 170.6µ (+11.63% / 152.8µ) | 92.70Ki (~% / 92.70Ki) | 1.066k (~% / 1.066k) |
28+
29+
### With [`github.com/json-iterator/go`](https://github.com/json-iterator/go)
30+
31+
| **Small data set** | s/op | B/op | allocs/op |
32+
| -------------------------------------------------------------------------------------- | ------------------------- | ----------------------- | ---------------------- |
33+
| [With comments](../testdata/small.json) | 1.632µ | 944.0 | 14.00 |
34+
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 1.702µ (+11.94% / 1.521µ) | 816.0 (+24.39% / 656.0) | 14.00 (+7.69% / 13.00) |
35+
| [Without comment characters](../testdata/small_no_comment_runes.json) | 1.603µ (~% / 1.598µ) | 656.0 (~% / 656.0) | 12.00 (~% / 13.00) |
36+
37+
| **Medium data set** | s/op | B/op | allocs/op |
38+
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ------------------------ |
39+
| [With comments](../testdata/small.json) | 245.0µ | 407.8Ki | 3.484k |
40+
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 142.4µ (+42.25% / 100.1µ) | 231.8Ki (+31.90% / 175.7Ki) | 3.484k (+0.06% / 3.482k) |
41+
| [Without comment characters](../testdata/small_no_comment_runes.json) | 113.1µ (+17.45% / 96.32µ) | 175.7Ki (+0.01% / 175.7Ki) | 3.482k (~% / 3.482k) |
42+
43+
### With [`github.com/goccy/go-json`](https://github.com/goccy/go-json)
44+
45+
| **Small data set** | s/op | B/op | allocs/op |
46+
| -------------------------------------------------------------------------------------- | ------------------------- | ----------------------- | ----------------------- |
47+
| [With comments](../testdata/small.json) | 1.794µ | 1.047Ki | 10.00 |
48+
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 1.797µ (+15.38% / 1.557µ) | 928.0 (+20.83% / 768.0) | 10.00 (+11.11% / 9.000) |
49+
| [Without comment characters](../testdata/small_no_comment_runes.json) | 1.705µ (+3.30% / 1.651µ) | 768.0 (~% / 768.0) | 9.00 (~% / 9.000) |
50+
51+
| **Medium data set** | s/op | B/op | allocs/op |
52+
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ---------------------- |
53+
| [With comments](../testdata/small.json) | 213.1µ | 434.9Ki | 77.00 |
54+
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 101.4µ (+83.61% / 55.24µ) | 250.4Ki (+28.94% / 194.2Ki) | 73.00 (+2.82% / 71.00) |
55+
| [Without comment characters](../testdata/small_no_comment_runes.json) | 72.60µ (+37.97% / 52.62µ) | 194.2Ki (+0.02% / 194.1Ki) | 71.00 (~% / 71.00) |
