SJP (Streaming JSON Parser) is a small, lightweight JSON parser that is designed for embedded use. It was designed to be paired with ACEWS for use with websockets. It can process chunks of data in any size, from bytes at a time to complete JSON documents or even multiple documents.
A callback system is employed -- the start, continuation, or end of a data element is detected, the callback is triggered with information about the event. A stack is also used to keep track of array and object nesting.
SJP uses a single data structure to keep track of its state. The library itself only consists of 4 functions:
sjp_init(cb) → sjp
: create sjpsjp_free(sjp) → void
: free sjpsjp_reset(sjp) → void
: reset sjp (for error recovery)sjp_parse(sjp, buf, len) → int
: parse a chunk, -1 on error
The sjp
structure:
user
: user-defined pointercb
: callback functiond
: stack depthstk[]
: stacki
: indext
: token
str
: string valuestr_bytes
: string byte lengthstr_pos
: string bytes so farnum
: number value
Tokens are split up into two parts, the data element type, and flags. Data types, values 0 thru 7:
SJP_NONE_T
: Not a valid typeSJP_OBJ_T
: Object typeSJP_ARRAY_T
: Array typeSJP_STR_T
: String typeSJP_NUM_T
: Number typeSJP_TRUE_T
: literaltrue
typeSJP_FALSE_T
: literalfalse
typeSJP_NULL_T
: literalnull
type
There is a mask, SJP_TYPE
to remove the flags so you can test the data
element type separately. These are the user flags:
SJP_KEY
: String is an object keySJP_START
: Begining of typeSJP_END
: End of type
There are a couple of other flags, but they are used internally for parsing
state. There are also a couple of masks, one for data element type, SJP_TYPE
and the other for the user-facing flags, SJP_FLAGS
.
The callback function signature is callback(sjp) → void
and you're expected
to check sjp->d
, sjp->stk[sjp->d].i
, and sjp->stk[sjp->d].t
to figure out
what you should do within the callback.
Take the following document for example:
{
"name": "John",
"age": 30,
"car": null
}
Here's an example of how you could process it:
static void callback(sjp_t *sjp)
{
user_t *user = sjp->user;
/* do something with sjp state */
}
static void process_document(const char *document, user_t *user)
{
sjp_t *sjp = sjp_init(callback);
sjp->user = user;
if (sjp_parse(sjp, document, strlen(document)) < 0) {
fprintf(stderr, "error parsing document");
/* perform any necessary cleanup */
}
sjp_free(sjp);
}
If you ran process_document
with the above document, 8 calls to callback with
the following values would occur:
sjp->d == 0
sjp->stk[0].i == 0
sjp->stk[0].t & SJP_TYPE == SJP_OBJ_T
sjp->stk[0].t & SJP_FLAGS == SJP_START
sjp->d == 1
sjp->stk[1].i == 0
(sjp->stk[1].t & SJP_TYPE) == SJP_STR_T
(sjp->stk[1].t & SJP_FLAGS) == SJP_KEY | SJP_START | SJP_END
sjp->str_len == 4
sjp->str == "name"
sjp->d == 1
sjp->stk[1].i == 0
(sjp->stk[1].t & SJP_TYPE) == SJP_STR_T
(sjp->stk[1].t & SJP_FLAGS) == SJP_START | SJP_END
sjp->str_len == 4
sjp->str == "John"
sjp->d == 1
sjp->stk[1].i == 1
(sjp->stk[1].t & SJP_TYPE) == SJP_STR_T
(sjp->stk[1].t & SJP_FLAGS) == SJP_KEY | SJP_START | SJP_END
sjp->str_len == 3
sjp->str == "age"
sjp->d == 1
sjp->stk[1].i == 1
(sjp->stk[1].t & SJP_TYPE) == SJP_NUM_T
(sjp->stk[1].t & SJP_FLAGS) == SJP_START | SJP_END
sjp->num == 30
sjp->d == 1
sjp->stk[1].i == 2
(sjp->stk[1].t & SJP_TYPE) == SJP_STR_T
(sjp->stk[1].t & SJP_FLAGS) == SJP_KEY | SJP_START | SJP_END
sjp->str_len == 3
sjp->str == "car"
sjp->d == 1
sjp->stk[1].i == 2
(sjp->stk[1].t & SJP_TYPE) == SJP_NULL_T
(sjp->stk[1].t & SJP_FLAGS) == SJP_START | SJP_END
sjp->d == 0
sjp->stk[0].i == 0
sjp->stk[0].t & SJP_TYPE == SJP_OBJ_T
sjp->stk[0].t & SJP_FLAGS == SJP_END
Note that strings can and will be split on a UTF-8 character boundary if a
document is split up in chunks when being parsed. You can use sjp->str_pos
to
get the current byte position in the string.