JSON Parsing strategy
To parse JSON data from a string into a Kotlin object, the string content is first split into tokens by the tokenizer (lexer) and then parsed by a slightly modified LL(1) parser.
---
title: Parser Flow
---
graph LR;
S("Source (String)")-->L;
O("Source (Other)")-.- Convert -.->S;
O-.->LT;
subgraph LI[Lexer Interface]
L(String Lexer);
LT("(Source) Lexer");
end
LI-->P(Parser);
P-->D(Data in Memory)
style Convert fill:#e1e1e1,stroke:none
style LT stroke:#808080,stroke-dasharray: 5 5
style O stroke:#808080,stroke-dasharray: 5 5
Note
Everything drawn with dashed lines is not yet fully implemented and needs some extra work.
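A minimal sketch of the lexing side of this flow, assuming a hypothetical `Token` hierarchy, a pull-based `Lexer` interface with a single `nextToken()` method, and a `StringLexer` implementation; none of these names are taken from the project. The hypothetical `lexerFor` function illustrates the dashed "Convert" path by reading another source (a `java.io.Reader`) fully into a String first. String escapes and number formats are deliberately simplified.

```kotlin
import java.io.Reader

// Hypothetical token model; the project's real token classes may differ.
sealed class Token {
    object OpenCurly : Token()
    object CloseCurly : Token()
    object OpenSquare : Token()
    object CloseSquare : Token()
    object Colon : Token()
    object Comma : Token()
    object Null : Token()
    object Eof : Token()
    data class Bool(val value: Boolean) : Token()
    data class Num(val value: Double) : Token()
    data class Str(val value: String) : Token()
}

// "Lexer Interface": the parser only ever pulls the next token.
interface Lexer {
    fun nextToken(): Token
}

// "String Lexer": tokenizes an in-memory String on demand.
class StringLexer(private val src: String) : Lexer {
    private var pos = 0

    override fun nextToken(): Token {
        while (pos < src.length && src[pos].isWhitespace()) pos++
        if (pos >= src.length) return Token.Eof // EOF is returned on every further call
        val c = src[pos]
        return when {
            c == '{' -> { pos++; Token.OpenCurly }
            c == '}' -> { pos++; Token.CloseCurly }
            c == '[' -> { pos++; Token.OpenSquare }
            c == ']' -> { pos++; Token.CloseSquare }
            c == ':' -> { pos++; Token.Colon }
            c == ',' -> { pos++; Token.Comma }
            c == '"' -> readString()
            c == '-' || c.isDigit() -> readNumber()
            else -> readKeyword()
        }
    }

    private fun readString(): Token {
        pos++ // skip opening quote
        val sb = StringBuilder()
        while (pos < src.length && src[pos] != '"') {
            if (src[pos] == '\\' && pos + 1 < src.length) {
                pos++
                val escaped = when (val e = src[pos]) {
                    'n' -> '\n'
                    't' -> '\t'
                    'r' -> '\r'
                    else -> e // covers \" \\ \/ ; \b, \f and \uXXXX are not handled here
                }
                sb.append(escaped)
            } else {
                sb.append(src[pos])
            }
            pos++
        }
        require(pos < src.length) { "Unterminated string" }
        pos++ // skip closing quote
        return Token.Str(sb.toString())
    }

    // Simplified: accepts any run of number characters and lets toDouble() validate it.
    private fun readNumber(): Token {
        val start = pos
        while (pos < src.length && (src[pos].isDigit() || src[pos] in "+-.eE")) pos++
        return Token.Num(src.substring(start, pos).toDouble())
    }

    private fun readKeyword(): Token = when {
        src.startsWith("true", pos) -> { pos += 4; Token.Bool(true) }
        src.startsWith("false", pos) -> { pos += 5; Token.Bool(false) }
        src.startsWith("null", pos) -> { pos += 4; Token.Null }
        else -> throw IllegalArgumentException("Unexpected character '${src[pos]}' at $pos")
    }
}

// Dashed "Convert" path: turn another source into a String, then reuse the String lexer.
fun lexerFor(reader: Reader): Lexer = StringLexer(reader.use { it.readText() })
```

For instance, `StringLexer("[true, null]")` yields OpenSquare, Bool(true), Comma, Null and CloseSquare, followed by Eof on every further call. A pull-based interface like this lets the parser stay unaware of where the characters come from.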
The parser can be in different states, depending on which element it is currently working on.
At the beginning of parsing, the parser is always in the START state.
In this state, if the first token is a primitive that can be converted directly into a value (boolean, number, string, null), the parser stops and emits this element, since nothing else can follow a primitive root value. If any tokens other than the EOF token remain, it throws an error because the JSON is malformed.
If it encounters a '{', it switches into the OBJECT state; if it encounters a '[', it switches into the ARRAY state. Any other token results in an error.
---
title: START state
---
stateDiagram-v2
direction LR
[*] --> Token
state Start {
direction LR
state "Read Token" as Token
state Choice <<choice>>
Token --> Choice
state "Switch to OBJECT state" as Curly
Choice --> Curly : Open Curly <br> Brace ('{')
state "Switch to ARRAY state" as Square
Choice --> Square : Open Square <br> Bracket ('[')
Curly --> Continue
Square --> Continue
state "Emit element directly" as Direct
Choice --> Direct : boolean, number, <br> string, null
state "Throw error" as Error
Direct --> Error
Choice --> Error : Anything else
classDef next fill:white
classDef error fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:#ff5555
class Error error
class Continue next
}
Direct --> [*]
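The decision made in the START state can be sketched as a single function over an already tokenized input. This assumes a hypothetical, reduced `Token` type in which all primitives (boolean, number, string, null) share one `Primitive` class; the function returns what the parser would do next instead of performing the switch itself.

```kotlin
// Hypothetical, reduced token model for this sketch only.
sealed class Token {
    object OpenCurly : Token()
    object OpenSquare : Token()
    object CloseCurly : Token()
    object Eof : Token()
    data class Primitive(val value: Any?) : Token() // boolean, number, string or null
}

// What the START state decides to do.
sealed class StartResult {
    data class Emit(val value: Any?) : StartResult()
    object SwitchToObject : StartResult()
    object SwitchToArray : StartResult()
}

fun startState(tokens: List<Token>): StartResult {
    val first = tokens.firstOrNull() ?: error("Empty input")
    return when (first) {
        is Token.OpenCurly -> StartResult.SwitchToObject
        is Token.OpenSquare -> StartResult.SwitchToArray
        is Token.Primitive -> {
            // A primitive root may only be followed by the EOF token.
            val rest = tokens.drop(1).filter { it != Token.Eof }
            if (rest.isNotEmpty()) error("Malformed JSON: unexpected tokens after root value")
            StartResult.Emit(first.value)
        }
        else -> error("Malformed JSON: unexpected token $first")
    }
}
```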
Note
If the parser is switching to the OBJECT state, a Map is always pushed onto the stack so that it can be filled.
If the Map on top of the stack is empty, the parser is currently reading the object's first member. If it already contains at least one member, a previous key/value pair has been parsed, so the next member must be preceded by a comma that separates it from the previous one.
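The comma rule can be sketched as a small helper, assuming for brevity that tokens are plain strings (`","`, `"}"`, ...) and that the caller passes the Map currently on top of the stack; `nextMember`, `Next`, `Member` and `Close` are hypothetical names.

```kotlin
// Hypothetical result of inspecting the token after the previous member.
sealed class Next
object Close : Next()                        // '}' : the object is finished
data class Member(val start: Int) : Next()  // a member starts at token index `start`

// Hypothetical helper; tokens are plain strings only to keep the sketch short.
fun nextMember(current: Map<String, Any?>, tokens: List<String>, pos: Int): Next {
    val tok = tokens.getOrNull(pos) ?: throw IllegalArgumentException("Unexpected end of input")
    if (tok == "}") return Close
    // Empty map: this is the first member, so no separating comma is expected.
    if (current.isEmpty()) return Member(pos)
    // Otherwise a previous member exists and a ',' must separate the two.
    require(tok == ",") { "Expected ',' or '}' at token $pos but found $tok" }
    return Member(pos + 1)
}
```

A '}' is accepted in both cases, so an empty object needs no special handling, while a trailing comma such as `{"a": 1,}` is rejected later, when a string key is expected but '}' is found.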
A JSON object consists of a '{', zero or more comma-separated key/value pairs, and a '}' that closes it. The parser therefore reads each key/value pair by checking for a STRING token (the key), then a ':', and then a value. If the value is a primitive token, it is added directly to the Map at the top of the stack. If the value token is a '{' or a '[', the parser pushes the key onto the stack (so that it is remembered), switches to the OBJECT or ARRAY state accordingly, and then pushes a new, empty Map or List onto the stack.
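Handling the value of a single member can be sketched as follows, assuming a stack (`ArrayDeque<Any>`) that holds `MutableMap`s, `MutableList`s and pending `String` keys, plus a hypothetical `Tok` type. The hypothetical `handleMemberValue` function returns the state the parser continues in.

```kotlin
enum class State { OBJECT, ARRAY }

// Hypothetical token model covering only the possible member values.
sealed class Tok {
    object OpenCurly : Tok()
    object OpenSquare : Tok()
    data class Primitive(val value: Any?) : Tok() // boolean, number, string or null
}

// Called after "key" and ':' have been read while in OBJECT state.
fun handleMemberValue(stack: ArrayDeque<Any>, key: String, value: Tok): State = when (value) {
    is Tok.OpenCurly -> {
        stack.addLast(key)                          // remember the key until the child closes
        stack.addLast(mutableMapOf<String, Any?>()) // new, empty object to fill
        State.OBJECT
    }
    is Tok.OpenSquare -> {
        stack.addLast(key)
        stack.addLast(mutableListOf<Any?>())        // new, empty array to fill
        State.ARRAY
    }
    is Tok.Primitive -> {
        @Suppress("UNCHECKED_CAST")
        val obj = stack.last() as MutableMap<String, Any?>
        obj[key] = value.value                      // primitives go straight into the object on top
        State.OBJECT
    }
}
```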
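If the token is a '}', the parser pops the Map from the stack. If the element now on top of the stack is a String (a remembered key), it pops that String as well and adds the key and the Map as a key/value pair to the object below it; otherwise the element on top is a List and the Map is appended to that array. If the stack is empty after popping, the Map is the root element and is emitted.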
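Closing a container can be sketched with the same stack layout (containers plus pending String keys). The hypothetical `closeContainer` function returns the finished root value when the stack becomes empty and `null` while parsing should continue; because only a Map or List is ever popped here, `null` cannot be mistaken for a real root value. The same logic applies to ']'.

```kotlin
// Hypothetical sketch: called when the parser reads '}' (or ']').
// Stack layout: MutableMap / MutableList containers and pending String keys.
@Suppress("UNCHECKED_CAST")
fun closeContainer(stack: ArrayDeque<Any>): Any? {
    val finished = stack.removeLast()       // the container that was just closed
    if (stack.isEmpty()) return finished     // nothing below it: this is the root, emit it
    val below = stack.last()
    if (below is String) {
        stack.removeLast()                   // pop the remembered key ...
        val parent = stack.last() as MutableMap<String, Any?>
        parent[below] = finished             // ... and store key/value in the object below
    } else {
        val parent = below as MutableList<Any?>
        parent.add(finished)                 // the parent is an array: append
    }
    return null                              // keep parsing
}
```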
---
title: OBJECT state
---
stateDiagram-v2
direction LR
[*] --> Token
state Object {
direction LR
state "Read Token" as Token
state Choice <<choice>>
Token --> Choice
state "Keep OBJECT state" as Curly
Choice --> Curly : Open Curly <br> Brace ('{')
state "Switch to ARRAY state" as Square
Choice --> Square : Open Square <br> Bracket ('[')
Choice --> String : string
String --> Colon : '#58;' token
state "Pop the object <br> from stack" as CloseSquare
Choice --> CloseSquare : Close Curly <br> Brace ('}')
state ArrayChoice <<choice>>
CloseSquare --> ArrayChoice
state "Throw error" as Error
Choice --> Error : Anything else
}
state "Add to array element <br> on stack" as Direct
state "Add key/value to object" as Add
Colon --> Add : boolean, number, <br> string, null
Add --> Continue
Curly --> Continue
Square --> Continue
Direct --> Continue
state AddChoice <<choice>>
ArrayChoice --> AddChoice : Some element on stack
AddChoice --> Direct : List object on stack
state "Pop string and add <br> as key/value to <br> object on stack" as ObjectAdd
AddChoice --> ObjectAdd : String object on stack
ObjectAdd --> Continue
ArrayChoice --> Emit : No element on the stack
Emit --> [*]
classDef next fill:white
classDef error fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:#ff5555
class Error error
class Continue next
Note
If the parser is switching to the ARRAY state, a List is always pushed onto the stack so that it can be filled.
An array works similarly to an object, but instead of key/value pairs it contains single values, which are also comma-separated. Its delimiters are '[' and ']'. Everything else works the same way.
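Putting the three states together, a compact, hypothetical version of the whole stack-based parser might look like this. It takes an already tokenized input (a simplified `Tok` type in which string tokens are `Prim` values holding a `String`), decides every step from the current token alone in LL(1) style, and uses whether the container on top of the stack is empty to decide whether a comma is required. It is a sketch of the approach described on this page, not the project's actual parser.

```kotlin
// Hypothetical token model; string tokens are Prim values whose v is a String.
sealed class Tok {
    object LCurly : Tok()
    object RCurly : Tok()
    object LSquare : Tok()
    object RSquare : Tok()
    object Colon : Tok()
    object Comma : Tok()
    object Eof : Tok()
    data class Prim(val v: Any?) : Tok() // boolean, number, string or null
}

class ParseError(message: String) : Exception(message)

@Suppress("UNCHECKED_CAST")
fun parse(tokens: List<Tok>): Any? {
    var pos = 0
    fun next(): Tok = tokens.getOrElse(pos++) { Tok.Eof }
    fun peek(): Tok = tokens.getOrElse(pos) { Tok.Eof }
    // Open containers (MutableMap / MutableList) and pending object keys (String).
    val stack = ArrayDeque<Any>()

    // START state: emit a primitive directly, or switch to OBJECT / ARRAY.
    when (val t = next()) {
        is Tok.Prim -> {
            if (peek() != Tok.Eof) throw ParseError("Unexpected token after root value")
            return t.v
        }
        is Tok.LCurly -> stack.addLast(mutableMapOf<String, Any?>())
        is Tok.LSquare -> stack.addLast(mutableListOf<Any?>())
        else -> throw ParseError("Unexpected token $t")
    }

    while (true) {
        val top = stack.last()
        val inObject = top is MutableMap<*, *> // OBJECT state, otherwise ARRAY state
        val empty = if (inObject) (top as Map<*, *>).isEmpty() else (top as List<*>).isEmpty()
        var t = next()

        if (t == (if (inObject) Tok.RCurly else Tok.RSquare)) {
            stack.removeLast()
            if (stack.isEmpty()) { // root container closed: emit it
                if (peek() != Tok.Eof) throw ParseError("Unexpected token after root value")
                return top
            }
            val below = stack.last()
            if (below is String) { // pending key: store key/value in the parent object
                stack.removeLast()
                val parent = stack.last() as MutableMap<String, Any?>
                parent[below] = top
            } else { // the parent is an array
                val parent = below as MutableList<Any?>
                parent.add(top)
            }
            continue
        }

        // Only the first element of a container comes without a leading comma.
        if (!empty) {
            if (t != Tok.Comma) throw ParseError("Expected ',' but found $t")
            t = next()
        }

        var key: String? = null
        if (inObject) {
            key = (t as? Tok.Prim)?.v as? String ?: throw ParseError("Expected string key but found $t")
            if (next() != Tok.Colon) throw ParseError("Expected ':' after key \"$key\"")
            t = next()
        }

        when (t) {
            is Tok.Prim -> {
                if (key != null) {
                    val obj = top as MutableMap<String, Any?>
                    obj[key] = t.v
                } else {
                    val arr = top as MutableList<Any?>
                    arr.add(t.v)
                }
            }
            is Tok.LCurly, is Tok.LSquare -> {
                if (key != null) stack.addLast(key) // remember the key until the child closes
                stack.addLast(if (t is Tok.LCurly) mutableMapOf<String, Any?>() else mutableListOf<Any?>())
            }
            else -> throw ParseError("Unexpected token $t")
        }
    }
}
```

For example, the token list for {"a": [1.0]} (LCurly, Prim("a"), Colon, LSquare, Prim(1.0), RSquare, RCurly) yields a map equal to `mapOf("a" to listOf(1.0))`. Every loop iteration consumes at least one token or throws, so truncated input ends in a ParseError rather than an endless loop.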
---
title: ARRAY state
---
stateDiagram-v2
direction LR
[*] --> Token
state Array {
direction LR
state "Read Token" as Token
state Choice <<choice>>
Token --> Choice
state "Switch to OBJECT state" as Curly
Choice --> Curly : Open Curly <br> Brace ('{')
state "Switch to ARRAY state" as Square
Choice --> Square : Open Square <br> Bracket ('[')
state "Pop the array <br> from stack" as CloseSquare
Choice --> CloseSquare : Close Square <br> Bracket (']')
state ArrayChoice <<choice>>
CloseSquare --> ArrayChoice
state "Throw error" as Error
Choice --> Error : Anything else
}
state "Add to array element <br> on stack" as Direct
Choice --> Direct : boolean, number, <br> string, null
Curly --> Continue
Square --> Continue
Direct --> Continue
state AddChoice <<choice>>
ArrayChoice --> AddChoice : Some element on stack
AddChoice --> Direct : List object on stack
state "Pop string and add <br> as key/value to <br> object on stack" as ObjectAdd
AddChoice --> ObjectAdd : String object on stack
ObjectAdd --> Continue
ArrayChoice --> Emit : No element on the stack
Emit --> [*]
classDef next fill:white
classDef error fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:#ff5555
class Error error
class Continue next
KotlinJsonParser is a toy project created for educational purposes. It is not intended for production use, and while efforts have been made to ensure the code is functional, it may not cover all edge cases or be optimized for performance. Use at your own risk.
But most importantly: Have fun! :D
The content of this wiki is licensed under the CC-BY-NC-SA 4.0 License.
