
Commit 803c0b7

Add support for JSON Lines
1 parent f90125a commit 803c0b7


9 files changed, +154 -50 lines changed

README.md

Lines changed: 50 additions & 12 deletions
````diff
@@ -82,7 +82,7 @@ const jsonStream = stringifyJsonStream({

 Each value anywhere in the JSON document can be a synchronous value, a promise that resolves to a value, or a sync/async function returning a value. Each value can also be an object stream (created by `objectStream()`), an array stream (created by `arrayStream()`) or a string stream (created by `stringStream()`) (see [stream generators](#stream-generators) for details). String streams can also be used as property keys inside object streams. The stream creators accept an `Iterable`, an `AsyncIterable` or a `ReadableStream` data source.

-The stringified JSON stream created by `JsonStringifier` is a `ReadableStream<string>`. Since most JavaScript methods work with `ReadableStream<Uint8Array>` instead, we can convert it using [`TextEncoderStream`](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoderStream) (Node.js >= 18 required). Here is an example of streaming the result in an Express app:
+The stringified JSON stream created by `stringifyJsonStream` is a `ReadableStream<string>`. Since most JavaScript methods work with `ReadableStream<Uint8Array>` instead, we can convert it using [`TextEncoderStream`](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoderStream) (Node.js >= 18 required). Here is an example of streaming the result in an Express app:
 ```typescript
 import { stringifyJsonStream, objectStream, arrayStream, stringStream } from "json-stream-es";
 import { Writable } from "node:stream";
````
````diff
@@ -101,6 +101,25 @@ If you prefer the generated JSON to be indented, you can pass a number or string

 Please also take note of the [differences between `JSON.stringify()` and `stringifyJsonStream()`](#differences-to-jsonstringify).

+#### Emitting multiple JSON values (JSONL)
+
+The [JSON Lines](https://jsonlines.org/) standard is a variation of JSON that allows multiple JSON values on the root level, separated by newlines. To emit such a stream, use the [`stringifyMultiJsonStream`](#stringifymultijsonstream) function, which returns a `TransformStream<SerializableJsonValue, string>`:
+
+```typescript
+import { stringifyMultiJsonStream } from "json-stream-es";
+
+const values = [
+	{ test1: "object1" },
+	{ test2: "object2" }
+];
+
+iterableToStream(values)
+	.pipeThrough(stringifyMultiJsonStream())
+	.pipeThrough(new TextEncoderStream())
+	.pipeTo(Writable.toWeb(res));
+```
+
+The individual values may also contain object/array/string streams like in the examples above.
+
 ### Consume a JSON stream

 [`parseJsonStream()`](#parsejsonstream) parses a stringified JSON stream, selects specific array items or object properties from it and emits their values. It consumes a `ReadableStream<string>`. Since most JavaScript methods emit a `ReadableStream<Uint8Array>`, we can use [`TextDecoderStream`](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream) to convert that to a string stream (wide browser support since September 2022, so you might need a polyfill). Here is an example how to use it with the [Fetch API](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API):
````
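The README describes a wire format for `stringifyMultiJsonStream()`: each root value is serialized and separated from the next by `\n`. That format can be illustrated without web streams.

The following is a minimal synchronous sketch, not the library's implementation. `jsonLinesChunks` is a hypothetical name. The sketch assumes every input value is already plain JSON data. It does not resolve promises, functions or nested streams the way `JsonSerializer` does.

```typescript
// Hypothetical helper, not part of json-stream-es: yields the string chunks of a
// JSONL document one serialized root value at a time, as a stringifier might emit them.
function* jsonLinesChunks(values: Iterable<unknown>): Generator<string> {
	let first = true;
	for (const value of values) {
		const json = JSON.stringify(value);
		if (json === undefined) {
			// JSON.stringify() returns undefined for e.g. functions or undefined itself
			throw new TypeError("Value is not JSON-serializable");
		}
		if (!first) {
			// Root-level values are separated by a single newline
			yield "\n";
		}
		first = false;
		yield json;
	}
}

const values = [
	{ test1: "object1" },
	{ test2: "object2" }
];
console.log([...jsonLinesChunks(values)].join(""));
// {"test1":"object1"}
// {"test2":"object2"}
```

A generator is used here so that, like a stream, the output is produced incrementally and nothing is buffered beyond the current value.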
````diff
@@ -129,9 +148,20 @@ If you need access to not just the object property values, but also their keys,
 * `{ value: { test: "value1" }, path: ["results", 0] }`
 * `{ value: "value2", path: ["results", 1] }`

-#### Consuming multiple objects/arrays
+#### Consuming multiple JSON values (JSONL)
+
+The [JSON Lines](https://jsonlines.org/) standard is a variation of JSON that allows multiple JSON values on the root level, separated by newlines. To consume such a stream, pass a `multi: true` option, which will make `parseJsonStream` accept input streams with zero or multiple root values:
+
+```typescript
+const stream = res.body!
+	.pipeThrough(new TextDecoderStream())
+	.pipeThrough(parseJsonStream(undefined, { multi: true }));
+```
+
+Using `undefined` as the path selector selects the root values themselves rather than their properties/elements.
+
+#### Consuming multiple nested objects/arrays

-Sometimes you want to consume multiple objects/arrays in a JSON stream. This would be an example JSON document:
+Sometimes you want to consume multiple objects/arrays inside a single JSON document, like in this example:
 ```json
 {
 	"apples": {
````
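With `multi: true`, the consumer has to cope with chunk boundaries that do not line up with value boundaries. The sketch below shows the simplest approach for strict JSON Lines input: buffer the incomplete trailing line until the next chunk arrives.

`JsonLinesReader` is a hypothetical helper, not a json-stream-es export. Unlike a tokenizing parser, it assumes that each value fits on a single line.

```typescript
// Hypothetical line-buffering JSONL reader, not part of json-stream-es.
// It assumes strict JSON Lines input: exactly one JSON value per line.
class JsonLinesReader {
	private buffer = "";

	// Feeds a chunk of text and returns all values whose line is now complete.
	push(chunk: string): unknown[] {
		this.buffer += chunk;
		const lines = this.buffer.split("\n");
		// The last element is an incomplete line (or "" if the chunk ended with "\n")
		this.buffer = lines.pop() ?? "";
		// A trailing "\r" from CRLF line endings is JSON whitespace, so JSON.parse() accepts it
		return lines.filter((line) => line.trim() !== "").map((line) => JSON.parse(line));
	}

	// Parses whatever is left once the input has ended.
	flush(): unknown[] {
		const rest = this.buffer;
		this.buffer = "";
		return rest.trim() === "" ? [] : [JSON.parse(rest)];
	}
}

const reader = new JsonLinesReader();
const received = [
	...reader.push('"value1"\n2\n{"test3":'),
	...reader.push('"value3"}\n["value4"]'),
	...reader.flush()
];
console.log(received); // [ 'value1', 2, { test3: 'value3' }, [ 'value4' ] ]
```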
```diff
@@ -199,7 +229,7 @@ The main features of json-stream-es are:
 	* [`JsonPathStreamSplitter`](#jsonpathstreamsplitter) to split a stream of JSON values into a stream of sub streams for the values under different paths
 	* [`JsonChunk` creators](#jsonchunk-creators) to create a stream of `JsonChunk`s by hand
 * Provide convenience functions for common combinations of the above:
-	* [`stringifyJsonStream`](#stringifyjsonstream), combining `JsonSerializer` and `JsonStringifier`
+	* [`stringifyJsonStream`](#stringifyjsonstream) and [`stringifyMultiJsonStream`](#stringifymultijsonstream), combining `JsonSerializer` and `JsonStringifier`
 	* [`parseJsonStream`](#parsejsonstream) and [`parseJsonStreamWithPaths`](#parsejsonstreamwithpaths), combining `JsonParser`, `JsonPathDetector`, `JsonPathSelector` and `JsonDeserializer`
 	* [`parseNestedJsonStream`](#parsenestedjsonstream) and [`parseNestedJsonStreamWithPaths`](#parsenestedjsonstreamwithpaths), combining `JsonParser`, `JsonPathDetector`, `JsonPathSelector`, `JsonPathStreamSplitter` and `JsonDeserializer`.

```
```diff
@@ -302,31 +332,37 @@ stream.pipeThrough(parseJsonStream((path) => (

 A convenience function to serialize a JSON value into a stringified JSON stream. Under the hood, it creates a [`JsonSerializer`](#jsonserializer) and pipes it through a [`JsonStringifier`](#jsonstringifier). See those for details.

+#### `stringifyMultiJsonStream`
+
+`stringifyMultiJsonStream(space?: string | number): TransformStream<SerializableJsonValue, string>`
+
+Returns a transform stream that accepts zero or more serializable JSON values and emits a stringified JSON stream. Can be used to create a JSON stream that contains multiple values on the root level. Under the hood, it creates a transformer chain of a [`JsonSerializer`](#jsonserializer) and a [`JsonStringifier`](#jsonstringifier). See those for details.
+
 #### `parseJsonStream`

-`parseJsonStream(selector: JsonPathSelectorExpression): TransformStream<string, JsonValue>`
+`parseJsonStream(selector: JsonPathSelectorExpression | undefined, options?: { multi?: boolean }): TransformStream<string, JsonValue>`

-A convenience function to parse a stringified JSON stream, select certain arrays and/or objects from it and stream their values/elements. `selector` needs to be a [JSON path selector](#json-path-selector) that selects one or more objects/arrays whose values/elements should be streamed.
+A convenience function to parse a stringified JSON stream, select certain arrays and/or objects from it and stream their values/elements. `selector` needs to be a [JSON path selector](#json-path-selector) that selects one or more objects/arrays whose values/elements should be streamed. If `multi` is true, no error is thrown if the input contains zero or more than one JSON value on the root level. For multi streams, `selector` can be `undefined` to select the root values themselves.

 Under the hood, creates a transformer chain of a [`JsonParser`](#jsonparser), [`JsonPathDetector`](#jsonpathdetector), [`JsonPathSelector`](#jsonpathselector) and [`JsonDeserializer`](#jsondeserializer), see those for details.

 #### `parseJsonStreamWithPaths`

-`parseJsonStreamWithPaths(selector: JsonPathSelectorExpression): TransformStream<string, { value: JsonValue; path: Array<string | number> }>`
+`parseJsonStreamWithPaths(selector: JsonPathSelectorExpression | undefined, options?: { multi?: boolean }): TransformStream<string, { value: JsonValue; path: Array<string | number> }>`

 Like [`parseJsonStream`](#parsejsonstream), but emits a stream of `{ value: JsonValue; path: Array<string | number> }` instead, where `path` is the path of object property keys and array element indexes of each value. This allows you to access the property keys when streaming a JSON object.

 #### `parseNestedJsonStream`

-`parseNestedJsonStream(selector: JsonPathSelectorExpression): TransformStream<string, ReadableStream<JsonValue> & { path: JsonPath }>`
+`parseNestedJsonStream(selector: JsonPathSelectorExpression | undefined, options?: { multi?: boolean }): TransformStream<string, ReadableStream<JsonValue> & { path: JsonPath }>`

-A convenience function to parse a stringified JSON stream, select certain arrays and/or objects and emit a nested stream for each of them emitting their values/elements. `selector` needs to be a [JSON path selector](#json-path-selector) that selects one or more objects/arrays whose values/elements should be streamed.
+A convenience function to parse a stringified JSON stream, select certain arrays and/or objects and emit a nested stream for each of them emitting their values/elements. `selector` needs to be a [JSON path selector](#json-path-selector) that selects one or more objects/arrays whose values/elements should be streamed. If `multi` is true, no error is thrown if the input contains zero or more than one JSON value on the root level. For multi streams, `selector` can be `undefined` to select the root values themselves.

 Under the hood, creates a transformer chain of a [`JsonParser`](#jsonparser), [`JsonPathDetector`](#jsonpathdetector), [`JsonPathSelector`](#jsonpathselector) and [`JsonPathStreamSplitter`](#jsonpathstreamsplitter), and then pipes each sub stream through [`JsonDeserializer`](#jsondeserializer).

 #### `parseNestedJsonStreamWithPaths`

-`parseNestedJsonStreamWithPaths(selector: JsonPathSelectorExpression): TransformStream<string, ReadableStream<{ value: JsonValue; path: Array<string | number> }> & { path: Array<string | number> }>`
+`parseNestedJsonStreamWithPaths(selector: JsonPathSelectorExpression | undefined, options?: { multi?: boolean }): TransformStream<string, ReadableStream<{ value: JsonValue; path: Array<string | number> }> & { path: Array<string | number> }>`

 Like [`parseNestedJsonStream`](#parsenestedjsonstream), but the nested streams emit `{ value: JsonValue; path: Array<string | number> }` instead, where `path` is the path of object property keys and array element indexes of each value. In the sub streams, the paths have the path prefix of their containing streamed object/array removed.

```
```diff
@@ -336,11 +372,11 @@ Like [`parseNestedJsonStream`](#parsenestedjsonstream), but the nested streams e

 A `TransformStream<string, JsonChunk>` that parses the incoming stringified JSON stream and emits [`JsonChunk` objects](#jsonchunk-objects) for the different tokens that the JSON document is made of.

-Construct one using `new JsonParser()` and use it by calling [`.pipeThrough()`](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeThrough) on a `ReadableStream<string>`.
+Construct one using `new JsonParser(options?: { multi?: boolean })` and use it by calling [`.pipeThrough()`](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeThrough) on a `ReadableStream<string>`. If `multi` is true, it accepts zero or multiple JSON values on the root level; otherwise, an exception is thrown if the data contains zero or more than one value.

 Pass the output on to [`JsonPathDetector`](#jsonpathdetector), [`JsonPathSelector`](#jsonpathselector) and [`JsonDeserializer`](#jsondeserializer) to consume a JSON stream.

-The input stream is expected to contain one valid JSON document. If the document is invalid or the input stream contains zero or multiple documents, the stream aborts with an error. This also means that you can rely on the order of the emitted `JsonChunk` objects to be valid (for example, when a `JsonChunkType.STRING_CHUNK` object is emitted, you can be sure that it was preceded by a `JsonChunkType.STRING_START` object).
+The input stream is expected to contain valid JSON documents. If the input is not valid JSON (or if the input contains zero or more than one JSON document and `multi` is not true), the stream aborts with an error. This also means that you can rely on the order of the emitted `JsonChunk` objects to be valid (for example, when a `JsonChunkType.STRING_CHUNK` object is emitted, you can be sure that it was preceded by a `JsonChunkType.STRING_START` object).

 #### `JsonStringifier`

```
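The `multi` option described for `JsonParser` does not depend on newlines. A tokenizer knows where each root value ends, so all it has to do is count the root values. The following non-streaming sketch illustrates that idea under simplifying assumptions:

- `splitRootValues` and `findValueEnd` are hypothetical names.
- Validation of each value is delegated to `JSON.parse()`.
- The error messages are made up rather than taken from json-stream-es.

```typescript
// Hypothetical, simplified stand-in for a multi-value JSON parser (not json-stream-es code).
// It works on a complete string rather than a stream, but finds root value boundaries
// the way a tokenizer can: by tracking string state and bracket depth.
const JSON_WHITESPACE = " \t\n\r";

// Returns the index just past the root value that starts at `start`.
function findValueEnd(text: string, start: number): number {
	const first = text[start];
	if (first === '"' || first === "{" || first === "[") {
		let depth = 0;
		let inString = false;
		for (let i = start; i < text.length; i++) {
			const c = text[i];
			if (inString) {
				if (c === "\\") {
					i++; // skip the escaped character
				} else if (c === '"') {
					inString = false;
					if (depth === 0) return i + 1; // end of a root-level string
				}
			} else if (c === '"') {
				inString = true;
			} else if (c === "{" || c === "[") {
				depth++;
			} else if (c === "}" || c === "]") {
				depth--;
				if (depth === 0) return i + 1; // end of a root-level object/array
			}
		}
		return text.length; // unterminated value: JSON.parse() will reject it
	}
	// Numbers and true/false/null end at whitespace or a structural character
	let i = start;
	while (i < text.length && !(JSON_WHITESPACE + '{}[],"').includes(text[i])) {
		i++;
	}
	return i;
}

function splitRootValues(text: string, multi = false): unknown[] {
	const values: unknown[] = [];
	let i = 0;
	while (i < text.length) {
		if (JSON_WHITESPACE.includes(text[i])) {
			i++; // whitespace between root values is ignored
			continue;
		}
		const end = findValueEnd(text, i);
		values.push(JSON.parse(text.slice(i, end))); // throws a SyntaxError on invalid JSON
		i = end;
	}
	if (!multi && values.length !== 1) {
		throw new Error(`Expected exactly one root value, found ${values.length}`);
	}
	return values;
}

console.log(splitRootValues('"value1"\n2\n{"test3":"value3"}\n["value4"]', true));
// [ 'value1', 2, { test3: 'value3' }, [ 'value4' ] ]
```

Unlike a line-based reader, this approach also accepts pretty-printed values that span several lines, because boundaries come from the JSON syntax itself.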
```diff
@@ -382,6 +418,8 @@ The `SerializableJsonValue` input chunks can be any valid JSON values, that is `

 As the `space` constructor argument, you can specify a number of indentation spaces or an indentation string, equivalent to the [space](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#space) parameter of `JSON.stringify()`. This will cause `WHITESPACE` chunks to be emitted in the appropriate places.

+Multiple JSON values on the root level are always separated by a newline (`\n`). This means that when no `space` is defined, the output produced by `JsonSerializer` piped through `JsonStringifier` fulfills the [JSON Lines](https://jsonlines.org/) (JSONL) standard.
+
 ##### Differences to `JSON.stringify()`

 `JsonSerializer` aims to mimic the behaviour of [`JSON.stringify()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify), with the following differences.
```
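The output conforms to JSON Lines only when no `space` is set. This follows from the JSONL rule that every line must be one complete JSON value: indentation puts line breaks inside values.

Below is a small check of this. It uses `JSON.stringify()` as a stand-in for the serializer, which the README says `JsonSerializer` aims to mimic. The helper names are hypothetical.

```typescript
// Hypothetical helpers, not part of json-stream-es.

// Joins root values with "\n", the separator described above for multiple root values.
function joinRootValues(values: unknown[], space?: string | number): string {
	return values.map((value) => JSON.stringify(value, undefined, space)).join("\n");
}

// True if every line is a complete JSON value on its own (a single trailing newline is allowed).
function isJsonLines(text: string): boolean {
	const lines = text.split("\n");
	if (lines[lines.length - 1] === "") {
		lines.pop(); // the final line separator is optional in JSON Lines
	}
	return lines.every((line) => {
		try {
			JSON.parse(line);
			return true;
		} catch {
			return false;
		}
	});
}

const rootValues = [{ test1: "object1" }, { test2: "object2" }];
console.log(isJsonLines(joinRootValues(rootValues)));    // true
console.log(isJsonLines(joinRootValues(rootValues, 2))); // false
```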

package.json

Lines changed: 2 additions & 6 deletions
```diff
@@ -1,11 +1,7 @@
 {
 	"name": "json-stream-es",
-	"description": "A streaming JSON parser/stringifier using web streams.",
-	"tags": [
-		"json",
-		"stream",
-		"webstreams"
-	],
+	"description": "A streaming JSON/JSONL parser/stringifier using web streams.",
+	"keywords": ["json", "jsonl", "stream", "webstreams"],
 	"version": "1.0.0",
 	"author": "Candid Dauth <cdauth@cdauth.eu>",
 	"repository": {
```

src/__tests__/convenience.test.ts

Lines changed: 24 additions & 0 deletions
```diff
@@ -34,6 +34,30 @@ test("parseJsonStream", async () => {
 	expect(await streamToArray(stream)).toEqual(["apple1", "apple2", "cherry1", "cherry2"]);
 });

+test("parseJsonStream fails for multiple documents", async () => {
+	const stream = stringToStream(`"value1"\n"value2"`)
+		.pipeThrough(parseJsonStream([]));
+	await expect(async () => await streamToArray(stream)).rejects.toThrowError("Unexpected character");
+});
+
+test("parseJsonStream multi", async () => {
+	const documents = [
+		"value1",
+		2,
+		{ test3: "value3" },
+		["value4"]
+	];
+	const stream = stringToStream(documents.map((v) => JSON.stringify(v)).join("\n"))
+		.pipeThrough(parseJsonStream(undefined, { multi: true }));
+	expect(await streamToArray(stream)).toEqual(documents);
+});
+
+test("parseJsonStream multi no values", async () => {
+	const stream = stringToStream("")
+		.pipeThrough(parseJsonStream(undefined, { multi: true }));
+	expect(await streamToArray(stream)).toEqual([]);
+});
+
 test("parseJsonStreamWithPaths", async () => {
 	const stream = stringToStream(JSON.stringify(testObject))
 		.pipeThrough(parseJsonStreamWithPaths([["apples", "cherries"], "results"]));
```

src/convenience.ts

Lines changed: 35 additions & 15 deletions
```diff
@@ -1,6 +1,6 @@
 import { JsonDeserializer, type JsonValueAndPath } from "./json-deserializer";
-import { JsonParser } from "./json-parser";
-import { serializeJsonValue, type SerializableJsonValue } from "./json-serializer";
+import { JsonParser, type JsonParserOptions } from "./json-parser";
+import { JsonSerializer, serializeJsonValue, type SerializableJsonValue } from "./json-serializer";
 import { JsonStringifier } from "./json-stringifier";
 import { JsonPathDetector, type JsonPath } from "./json-path-detector";
 import { JsonPathSelector, matchesJsonPathSelector, type JsonPathSelectorExpression } from "./json-path-selector";
```
```diff
@@ -12,40 +12,59 @@ export function stringifyJsonStream(value: SerializableJsonValue, space?: string
 	return serializeJsonValue(value, space).pipeThrough(new JsonStringifier());
 }

+export function stringifyMultiJsonStream(space?: string | number): TransformStream<SerializableJsonValue, string> {
+	return new PipeableTransformStream((readable) => {
+		return readable
+			.pipeThrough(new JsonSerializer(space))
+			.pipeThrough(new JsonStringifier());
+	});
+}
+
 class ValueExtractor extends AbstractTransformStream<JsonValueAndPath, JsonValue> {
 	protected override transform(chunk: JsonValueAndPath, controller: TransformStreamDefaultController<JsonValue>) {
 		controller.enqueue(chunk.value);
 	}
 }

 export function parseJsonStreamWithPaths(
-	selector: JsonPathSelectorExpression
+	selector: JsonPathSelectorExpression | undefined,
+	options?: JsonParserOptions
 ): TransformStream<string, JsonValueAndPath> {
 	return new PipeableTransformStream((readable) => {
-		return readable
-			.pipeThrough(new JsonParser())
-			.pipeThrough(new JsonPathDetector())
-			.pipeThrough(new JsonPathSelector((path) => path.length > 0 && matchesJsonPathSelector(path.slice(0, -1), selector)))
-			.pipeThrough(new JsonDeserializer());
+		let result = readable
+			.pipeThrough(new JsonParser(options))
+			.pipeThrough(new JsonPathDetector());
+		if (selector) {
+			result = result.pipeThrough(new JsonPathSelector((path) => path.length > 0 && matchesJsonPathSelector(path.slice(0, -1), selector)));
+		}
+		return result.pipeThrough(new JsonDeserializer());
 	});
 }

 export function parseJsonStream(
-	selector: JsonPathSelectorExpression
+	selector: JsonPathSelectorExpression | undefined,
+	options?: JsonParserOptions
 ): TransformStream<string, JsonValue> {
 	return new PipeableTransformStream((readable) => {
-		return readable
-			.pipeThrough(parseJsonStreamWithPaths(selector))
+		let result = readable.pipeThrough(new JsonParser(options))
+		if (selector) {
+			result = result
+				.pipeThrough(new JsonPathDetector())
+				.pipeThrough(new JsonPathSelector((path) => path.length > 0 && matchesJsonPathSelector(path.slice(0, -1), selector)));
+		}
+		return result
+			.pipeThrough(new JsonDeserializer())
 			.pipeThrough(new ValueExtractor());
 	});
 }

 export function parseNestedJsonStreamWithPaths(
-	selector: JsonPathSelectorExpression
+	selector: JsonPathSelectorExpression,
+	options?: JsonParserOptions
 ): TransformStream<string, ReadableStream<JsonValueAndPath> & { path: JsonPath }> {
 	return new PipeableTransformStream((readable) => {
 		return readable
-			.pipeThrough(new JsonParser())
+			.pipeThrough(new JsonParser(options))
 			.pipeThrough(new JsonPathDetector())
 			.pipeThrough(new JsonPathSelector(selector))
 			.pipeThrough(new JsonPathStreamSplitter())
```
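The predicate passed to `JsonPathSelector` above matches on `path.slice(0, -1)`, so it selects the children of the matched objects/arrays. When `selector` is `undefined`, the selector stage is skipped, so whole root values reach the deserializer. The sketch below restates that logic on plain path arrays.

The selector type here is a hypothetical simplification: each path segment is matched either by an exact key or by a list of alternative keys. It is not the full `JsonPathSelectorExpression` syntax, and `matchesSelector`/`isEmitted` are illustrative names.

```typescript
// Simplified, hypothetical stand-in for a JSON path selector: each entry matches one
// path segment, either exactly or as a list of alternatives. The library's
// JsonPathSelectorExpression may support more forms; this sketch does not.
type PathSegment = string | number;
type SimpleSelector = Array<PathSegment | PathSegment[]>;

function matchesSelector(path: PathSegment[], selector: SimpleSelector): boolean {
	return path.length === selector.length && selector.every((entry, i) => (
		Array.isArray(entry) ? entry.includes(path[i]) : entry === path[i]
	));
}

// With a selector, a value is emitted if its parent path matches, so the children of the
// selected objects/arrays are streamed. Without a selector, only root values (empty path)
// are emitted, each as a whole value.
function isEmitted(path: PathSegment[], selector: SimpleSelector | undefined): boolean {
	if (selector === undefined) {
		return path.length === 0;
	}
	return path.length > 0 && matchesSelector(path.slice(0, -1), selector);
}

const paths: PathSegment[][] = [[], ["apples"], ["apples", 0], ["cherries", 1], ["other", 0]];
console.log(paths.filter((p) => isEmitted(p, [["apples", "cherries"]])));
// [ [ 'apples', 0 ], [ 'cherries', 1 ] ]
console.log(paths.filter((p) => isEmitted(p, undefined)));
// [ [] ]
```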
```diff
@@ -63,11 +82,12 @@ export function parseNestedJsonStreamWithPaths(
 }

 export function parseNestedJsonStream(
-	selector: JsonPathSelectorExpression
+	selector: JsonPathSelectorExpression,
+	options?: JsonParserOptions
 ): TransformStream<string, ReadableStream<JsonValue> & { path: JsonPath }> {
 	return new PipeableTransformStream((readable) => {
 		return readable
-			.pipeThrough(parseNestedJsonStreamWithPaths(selector))
+			.pipeThrough(parseNestedJsonStreamWithPaths(selector, options))
 			.pipeThrough(new TransformStream({
 				transform: (chunk, controller) => {
 					controller.enqueue(Object.assign(
```

src/json-deserializer.ts

Lines changed: 4 additions & 0 deletions
```diff
@@ -39,6 +39,10 @@ export type JsonValueAndPath = JsonValueAndOptionalPath<JsonChunkWithPath>;
 export class JsonDeserializer<C extends JsonChunk & { path?: JsonPath } = JsonChunkWithPath> extends AbstractTransformStream<C, JsonValueAndOptionalPath<C>> {
 	protected state: State<C> = { type: StateType.ROOT, value: undefined, path: [] };

+	constructor() {
+		super();
+	}
+
 	protected handleValueEnd(controller: TransformStreamDefaultController<JsonValueAndOptionalPath<C>>): void {
 		if (this.state.type === StateType.ROOT) {
 			if (this.state.value !== undefined) {
```
