
JSON Parsing strategy

QuickWrite edited this page Sep 3, 2024 · 2 revisions

To parse JSON data from a string into a Kotlin object, the string content is first tokenized by the lexer (tokenizer), and the resulting tokens are then parsed by a slightly modified LL(1) parser.

---
title: Parser Flow
---

graph LR;
    S("Source (String)")-->L;
    O("Source (Other)")-.- Convert -.->S;
    O-.->LT;

    subgraph LI[Lexer Interface]
    L(String Lexer);
    LT("(Source) Lexer");
    end

    LI-->P(Parser);
    P-->D(Data in Memory)
    
    style Convert fill:#e1e1e1,stroke:none
    style LT stroke:#808080,stroke-dasharray: 5 5
    style O stroke:#808080,stroke-dasharray: 5 5


Note

Everything drawn with dashed lines is not yet fully implemented and still needs some extra work.
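The String Lexer box in the diagram above can be illustrated with a small sketch. This is not the project's implementation: the `Token` hierarchy, the `Lexer` interface with a single `next()` method, the `StringLexer` class and the `tokenize` helper are all hypothetical names. To stay short, it ignores escape sequences inside strings and scans numbers loosely, leaving validation to `toDouble()`.

```kotlin
// Hypothetical token model; the project's real token types may differ.
sealed class Token {
    object OpenCurly : Token()
    object CloseCurly : Token()
    object OpenSquare : Token()
    object CloseSquare : Token()
    object Colon : Token()
    object Comma : Token()
    object Null : Token()
    object Eof : Token()
    data class Str(val value: String) : Token()
    data class Num(val value: Double) : Token()
    data class Bool(val value: Boolean) : Token()
}

// Hypothetical lexer interface: the parser only needs "give me the next token".
interface Lexer {
    fun next(): Token
}

// Simplified lexer over a String: no escape sequences, loose number scanning.
class StringLexer(private val source: String) : Lexer {
    private var pos = 0

    override fun next(): Token {
        while (pos < source.length && source[pos].isWhitespace()) pos++
        if (pos >= source.length) return Token.Eof
        val c = source[pos]
        return when (c) {
            '{' -> { pos++; Token.OpenCurly }
            '}' -> { pos++; Token.CloseCurly }
            '[' -> { pos++; Token.OpenSquare }
            ']' -> { pos++; Token.CloseSquare }
            ':' -> { pos++; Token.Colon }
            ',' -> { pos++; Token.Comma }
            '"' -> readString()
            else -> when {
                c == '-' || c.isDigit() -> readNumber()
                source.startsWith("true", pos) -> { pos += 4; Token.Bool(true) }
                source.startsWith("false", pos) -> { pos += 5; Token.Bool(false) }
                source.startsWith("null", pos) -> { pos += 4; Token.Null }
                else -> throw IllegalArgumentException("Unexpected character '$c' at $pos")
            }
        }
    }

    private fun readString(): Token {
        val end = source.indexOf('"', pos + 1)
        require(end >= 0) { "Unterminated string at $pos" }
        val value = source.substring(pos + 1, end)
        pos = end + 1
        return Token.Str(value)
    }

    private fun readNumber(): Token {
        val start = pos
        pos++
        while (pos < source.length && (source[pos].isDigit() || source[pos] in ".eE+-")) pos++
        return Token.Num(source.substring(start, pos).toDouble())
    }
}

// Drains a lexer into a list, including the trailing EOF token.
fun tokenize(lexer: Lexer): List<Token> {
    val tokens = mutableListOf<Token>()
    do {
        val token = lexer.next()
        tokens += token
    } while (token != Token.Eof)
    return tokens
}
```

Putting the lexer behind an interface matches the "Lexer Interface" box: another source (the dashed "(Source) Lexer") could supply tokens without changing the parser.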

How the parser works

The parser is always in one of several states, depending on which element it is currently working on. Partially built objects and arrays, as well as pending object keys, are kept on a stack.

START state

The parser always begins in the START state. If the first token is a primitive that can be converted directly into a value (boolean, number, string, or null), the parser stops and emits that value, since a top-level primitive cannot be followed by anything else. If any tokens other than the EOF token remain after it, the parser throws an error because the JSON is malformed.

If the first token is a {, the parser switches into the OBJECT state; if it is a [, it switches into the ARRAY state. Any other token causes an error.
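The START state decision can be sketched as a single function. This assumes a hypothetical `Token` type (as a lexer might produce) and a hypothetical `StartResult` type describing what the parser does next; for brevity it inspects a complete token list instead of pulling tokens one at a time.

```kotlin
// Hypothetical token model; the project's real token types may differ.
sealed class Token {
    object OpenCurly : Token()
    object CloseCurly : Token()
    object OpenSquare : Token()
    object CloseSquare : Token()
    object Colon : Token()
    object Comma : Token()
    object Null : Token()
    object Eof : Token()
    data class Str(val value: String) : Token()
    data class Num(val value: Double) : Token()
    data class Bool(val value: Boolean) : Token()
}

// Hypothetical result of the START state.
sealed class StartResult {
    data class Emit(val value: Any?) : StartResult()   // top-level primitive, parsing is done
    object EnterObject : StartResult()                  // switch to the OBJECT state
    object EnterArray : StartResult()                   // switch to the ARRAY state
}

fun isPrimitive(t: Token) = t is Token.Str || t is Token.Num || t is Token.Bool || t == Token.Null

// Converts a primitive token into its Kotlin value; Token.Null becomes null.
fun primitiveValue(t: Token): Any? = when (t) {
    is Token.Str -> t.value
    is Token.Num -> t.value
    is Token.Bool -> t.value
    else -> null
}

fun startState(tokens: List<Token>): StartResult {
    val first = tokens.firstOrNull() ?: throw IllegalArgumentException("No tokens")
    return when {
        first == Token.OpenCurly -> StartResult.EnterObject
        first == Token.OpenSquare -> StartResult.EnterArray
        isPrimitive(first) -> {
            // A top-level primitive may only be followed by the EOF token.
            require(tokens.drop(1).all { it == Token.Eof }) { "Malformed JSON: tokens after the top-level value" }
            StartResult.Emit(primitiveValue(first))
        }
        else -> throw IllegalArgumentException("Unexpected token at start: $first")
    }
}
```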

---
title: START state
---

stateDiagram-v2
    direction LR
    [*] --> Token

    state Start {
        direction LR

        state "Read Token" as Token
        state Choice <<choice>>
        Token --> Choice

        state "Switch to OBJECT state" as Curly
        Choice --> Curly : Open Curly <br> Brace ('{')

        state "Switch to ARRAY state" as Square
        Choice --> Square : Open Square <br> Bracket ('[')

        Curly --> Continue
        Square --> Continue

        state "Emit element directly" as Direct
        Choice --> Direct : boolean, number, <br> string, null

        state "Throw error" as Error
        Direct --> Error : Tokens left <br> (besides EOF)
        Choice --> Error : Anything else

        classDef next fill:white
        classDef error fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:#ff5555
        
        class Error error
        class Continue next
    }

    Direct --> [*]

OBJECT state

Note

Whenever the parser switches to the OBJECT state, an empty Map is pushed onto the stack to be filled.

If the map on top of the stack is empty, the parser is reading the object's first entry. If the map already contains at least one entry, a previous entry has been parsed, so a comma must separate it from the next one.
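The comma rule can be expressed as a small check. In this sketch the reduced `Token` type and the `nextEntryStart` function are hypothetical: it reads the token that begins the next entry and, when the map already has entries, requires a comma first. It also rejects a trailing comma by requiring a key after it.

```kotlin
// Hypothetical, reduced token set for this step only.
sealed class Token {
    object Comma : Token()
    object CloseCurly : Token()
    data class Str(val value: String) : Token()
}

// Returns the token that starts the next entry of the object being filled.
// Any token other than a key or '}' is left for the caller to reject.
fun nextEntryStart(current: Map<String, Any?>, tokens: Iterator<Token>): Token {
    val token = tokens.next()
    // First entry (empty map) or closing brace: no comma is involved.
    if (current.isEmpty() || token == Token.CloseCurly) return token
    require(token == Token.Comma) { "Expected ',' between object entries, got $token" }
    val key = tokens.next()
    require(key is Token.Str) { "Expected a key after ',', got $key" }
    return key
}
```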

An object consists of a {, zero or more comma-separated key/value pairs, and a closing }. The parser reads each pair by expecting a STRING token (the key), then a : token, then a value. If the value is a primitive token, it is added directly to the Map at the top of the stack. If the value token is a { or a [, the parser pushes the key onto the stack (so that it is remembered until the nested value is complete), switches to the corresponding state, and pushes a new empty container (a Map for {, a List for [) onto the stack.
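One key/value step of the OBJECT state might look like the following. The sketch assumes the stack is a Kotlin `ArrayDeque<Any>` holding maps, lists and pending `String` keys; `readEntry`, `State` and the `Token` type are hypothetical names. The key token has already been read when the function is called.

```kotlin
// Hypothetical token model, reduced to what this step needs.
sealed class Token {
    object OpenCurly : Token()
    object OpenSquare : Token()
    object Colon : Token()
    object Null : Token()
    data class Str(val value: String) : Token()
    data class Num(val value: Double) : Token()
    data class Bool(val value: Boolean) : Token()
}

enum class State { OBJECT, ARRAY }

// Reads ':' and a value for the given key; returns the state to continue in.
@Suppress("UNCHECKED_CAST")
fun readEntry(key: Token.Str, tokens: Iterator<Token>, stack: ArrayDeque<Any>): State {
    require(tokens.next() == Token.Colon) { "Expected ':' after key \"${key.value}\"" }
    val current = stack.last() as MutableMap<String, Any?>   // the object being filled
    when (val value = tokens.next()) {
        is Token.Str -> current[key.value] = value.value
        is Token.Num -> current[key.value] = value.value
        is Token.Bool -> current[key.value] = value.value
        Token.Null -> current[key.value] = null
        Token.OpenCurly, Token.OpenSquare -> {
            stack.addLast(key.value)   // remember the key until the nested value is finished
            if (value == Token.OpenCurly) {
                stack.addLast(mutableMapOf<String, Any?>())
                return State.OBJECT
            }
            stack.addLast(mutableListOf<Any?>())
            return State.ARRAY
        }
        else -> throw IllegalArgumentException("Unexpected value token $value")
    }
    return State.OBJECT   // a primitive was stored; stay in the OBJECT state
}
```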

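When the parser reads a } token, it pops the finished Map from the stack. If the new top of the stack is a String (a pending key), the parser pops that key as well and inserts the key and the finished Map as a pair into the enclosing Map. Otherwise the top of the stack is a List, and the finished Map is appended to it. If the stack is empty after popping, the Map is the top-level value, and the parser returns and emits it.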
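The closing step does not depend on the token type, so it can be sketched on its own. `closeContainer` is a hypothetical helper that assumes the same stack layout as above (containers plus pending `String` keys on an `ArrayDeque<Any>`). It returns the finished container when the stack becomes empty and `null` otherwise; since containers are never `null`, a `null` result simply means "keep parsing". The same logic applies when a ] closes a List.

```kotlin
// Pops the container that was just closed and attaches it to its parent.
@Suppress("UNCHECKED_CAST")
fun closeContainer(stack: ArrayDeque<Any>): Any? {
    val finished = stack.removeLast()          // the Map (or List) that was just closed
    if (stack.isEmpty()) return finished       // top-level value: emit it
    val parent = stack.last()
    if (parent is String) {
        // A pending key: pop it and store the finished value in the enclosing Map.
        stack.removeLast()
        val obj = stack.last() as MutableMap<String, Any?>
        obj[parent] = finished
    } else {
        // Otherwise the enclosing container is a List: append to it.
        val list = parent as MutableList<Any?>
        list.add(finished)
    }
    return null                                // still inside an enclosing container
}
```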

---
title: OBJECT state
---

stateDiagram-v2
    direction LR
    [*] --> Token

    state Object {
        direction LR

        state "Read Token" as Token
        state Choice <<choice>>
        Token --> Choice

        state "Keep OBJECT state" as Curly
        Choice --> Curly : Open Curly <br> Brace ('{')

        state "Switch to ARRAY state" as Square
        Choice --> Square : Open Square <br> Bracket ('[')

        Choice --> String : string
        String --> Colon : '#58;' token

        state "Pop the object <br> from stack" as CloseSquare
        Choice --> CloseSquare : Close Curly <br> Brace ('}')

        state ArrayChoice <<choice>>
        CloseSquare --> ArrayChoice

        state "Throw error" as Error
    
        Choice --> Error : Anything else
    }

    state "Add to array element <br> on stack" as Direct

    state "Add key/value to object" as Add
    Colon --> Add : boolean, number, <br> string, null
    Add --> Continue
    Curly --> Continue
    Square --> Continue
    Direct --> Continue

    state AddChoice <<choice>>
    ArrayChoice --> AddChoice : Some element on stack

    AddChoice --> Direct : List object on stack

    state "Pop string and add <br> as key/value to <br> object on stack" as ObjectAdd
    AddChoice --> ObjectAdd : String object on stack
    ObjectAdd --> Continue

    ArrayChoice --> Emit : No element on the stack
    Emit --> [*]

    classDef next fill:white
    classDef error fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:#ff5555
        
    class Error error
    class Continue next

ARRAY state

Note

Whenever the parser switches to the ARRAY state, an empty List is pushed onto the stack to be filled.

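The array works like the object, except that it holds single comma-separated values instead of key/value pairs, and it is delimited by [ and ] instead of { and }. Everything else works the same way: primitive values are appended to the List on top of the stack, a { or [ pushes a new container, and a ] pops the List and attaches it to the enclosing container (or emits it if the stack is empty).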
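Putting the three states together, here is a compact sketch of a whole parser over a token list. The `Token` type, `parse` and the helper names are hypothetical, and this is not the project's implementation; it does follow the stack discipline described above: one shared stack, a `String` pushed for a pending key, and a new `MutableMap` or `MutableList` for every { or [.

```kotlin
// Hypothetical token model; names are illustrative, not the project's real types.
sealed class Token {
    object OpenCurly : Token()
    object CloseCurly : Token()
    object OpenSquare : Token()
    object CloseSquare : Token()
    object Colon : Token()
    object Comma : Token()
    object Null : Token()
    object Eof : Token()
    data class Str(val value: String) : Token()
    data class Num(val value: Double) : Token()
    data class Bool(val value: Boolean) : Token()
}

fun isPrimitive(t: Token) = t is Token.Str || t is Token.Num || t is Token.Bool || t == Token.Null

// Converts a primitive token into its Kotlin value; Token.Null becomes null.
fun primitiveValue(t: Token): Any? = when (t) {
    is Token.Str -> t.value
    is Token.Num -> t.value
    is Token.Bool -> t.value
    else -> null
}

// Stack holds open containers (MutableMap / MutableList) and pending object keys (String).
@Suppress("UNCHECKED_CAST")
fun parse(tokens: List<Token>): Any? {
    val cursor = tokens.iterator()
    val stack = ArrayDeque<Any>()

    // Anything after the top-level value must be EOF.
    fun expectEnd() {
        while (cursor.hasNext()) require(cursor.next() == Token.Eof) { "Tokens after top-level value" }
    }

    fun open(token: Token) {
        stack.addLast(if (token == Token.OpenCurly) mutableMapOf<String, Any?>() else mutableListOf<Any?>())
    }

    // START state: a primitive is emitted directly, '{' or '[' opens a container.
    val first = cursor.next()
    when {
        isPrimitive(first) -> { expectEnd(); return primitiveValue(first) }
        first == Token.OpenCurly || first == Token.OpenSquare -> open(first)
        else -> throw IllegalArgumentException("Unexpected token $first")
    }

    while (true) {
        val top = stack.last()                       // always a Map or a List here
        val isObject = top is MutableMap<*, *>
        val close = if (isObject) Token.CloseCurly else Token.CloseSquare
        val isEmpty = if (isObject) (top as Map<*, *>).isEmpty() else (top as List<*>).isEmpty()
        var token = cursor.next()

        // A non-empty container needs a ',' before its next element (no trailing comma).
        if (token != close && !isEmpty) {
            require(token == Token.Comma) { "Expected ',' but got $token" }
            token = cursor.next()
            require(token != close) { "Trailing comma" }
        }

        if (token == close) {
            val finished = stack.removeLast()
            if (stack.isEmpty()) { expectEnd(); return finished }
            val parent = stack.last()
            if (parent is String) {                  // pending key: value belongs to an object
                stack.removeLast()
                val obj = stack.last() as MutableMap<String, Any?>
                obj[parent] = finished
            } else {
                (parent as MutableList<Any?>).add(finished)
            }
            continue
        }

        if (isObject) {
            // OBJECT state: STRING key, ':' and then a value.
            val key = token as? Token.Str ?: throw IllegalArgumentException("Expected a string key, got $token")
            require(cursor.next() == Token.Colon) { "Expected ':' after key ${key.value}" }
            val value = cursor.next()
            when {
                isPrimitive(value) -> { (top as MutableMap<String, Any?>)[key.value] = primitiveValue(value) }
                value == Token.OpenCurly || value == Token.OpenSquare -> { stack.addLast(key.value); open(value) }
                else -> throw IllegalArgumentException("Unexpected value $value")
            }
        } else {
            // ARRAY state: single values, no keys.
            when {
                isPrimitive(token) -> { (top as MutableList<Any?>).add(primitiveValue(token)) }
                token == Token.OpenCurly || token == Token.OpenSquare -> open(token)
                else -> throw IllegalArgumentException("Unexpected value $token")
            }
        }
    }
}
```

Keeping the pending key on the same stack as the containers means that closing a container needs only one look at the new top of the stack: a `String` means "store under this key", a list means "append".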

---
title: ARRAY state
---

stateDiagram-v2
    direction LR
    [*] --> Token

    state Array {
        direction LR

        state "Read Token" as Token
        state Choice <<choice>>
        Token --> Choice

        state "Switch to OBJECT state" as Curly
        Choice --> Curly : Open Curly <br> Brace ('{')

        state "Switch to ARRAY state" as Square
        Choice --> Square : Open Square <br> Bracket ('[')

        state "Pop the array <br> from stack" as CloseSquare
        Choice --> CloseSquare : Close Square <br> Bracket (']')

        state ArrayChoice <<choice>>
        CloseSquare --> ArrayChoice

        state "Throw error" as Error
    
        Choice --> Error : Anything else
    }

    state "Add to array element <br> on stack" as Direct
    Choice --> Direct : boolean, number, <br> string, null
    Curly --> Continue
    Square --> Continue
    Direct --> Continue

    state AddChoice <<choice>>
    ArrayChoice --> AddChoice : Some element on stack

    AddChoice --> Direct : List object on stack

    state "Pop string and add <br> as key/value to <br> object on stack" as ObjectAdd
    AddChoice --> ObjectAdd : String object on stack
    ObjectAdd --> Continue

    ArrayChoice --> Emit : No element on the stack
    Emit --> [*]

    classDef next fill:white
    classDef error fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:#ff5555
        
    class Error error
    class Continue next