EgineDVFileJson_InterchangeFormat

Johann Petrak edited this page Sep 1, 2018 · 3 revisions

EngineDVFileJson Interchange Format

This is the format of the JSON data exchanged at application time, when a trained model is applied to new instances. The Learning Framework (LF) sends instance data in JSON format to the backend and reads back the result of applying the model to that data.

Format for sending instance data, LF to backend

  • The instance representation sent is what is returned by internal2json(instance,true)
  • for sequence tagging:
    • A list of lists (list of sequence elements, where each sequence element is a list of features)
  • for classification:
    • A list of features
  • only a single instance is sent per line for processing
  • (Not actually done at the moment: to indicate the end of processing, a line containing STOP is sent instead of an instance)
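
As a rough illustration of the line-based format described above, the following sketch serializes one classification instance and one sequence tagging instance, each as a single JSON line. The feature values and the helper name to_line are hypothetical; in the LF the instance representation comes from internal2json(instance,true), and how the line is transported to the backend is not shown here.

```python
import json

# Hypothetical instances for illustration only; real instances are whatever
# internal2json(instance, true) returns in the Learning Framework.
classification_instance = ["Finally", 7, 0.5]              # a list of features
sequence_instance = [["Finally"], [","], ["a"], ["boy"]]   # a list of elements, each a list of features


def to_line(instance):
    """Serialize a single instance as one JSON line (no embedded newlines)."""
    return json.dumps(instance)


lines = [to_line(classification_instance), to_line(sequence_instance)]

# The receiving side can recover each instance by parsing one line at a time.
decoded = [json.loads(line) for line in lines]
```

Because json.dumps emits no newlines by default, each instance stays on its own line, which matches the one-instance-per-line convention.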

Conversion of the instance data in the backend

  • The single instance is converted into a batch with one element (list that contains a single instance)
  • Features are converted to "converted representation" (e.g. strings to vocab indices)
  • The batch is reshaped:
    • instead of a list of instances, the batch contains a list of features
    • for each feature, there is a list that contains the feature value/s for each example
    • since we only have one example, each list for a feature contains one element. That element can be a single value (classification with a simple feature) or a list (sequence tagging)
  • Eventually this is converted into
    • classification: tensor of shape 1, nfeatures
    • sequence tagging: tensor of shape 1, nfeatures, seqlen

Example for sequence tagging:

  • original instance: [["Finally"],[","],["a"],["boy"],["in"],["the"],["back"],["raises"],["his"],["hand"],["."]]
  • converted batch of 1 instance: [[[1827], [4], [7], [3241], [10], [3], [99], [1], [66], [479], [2]]]
  • reshaped batch of 1 instance: [[[1827, 4, 7, 3241, 10, 3, 99, 1, 66, 479, 2]]]
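
The conversion steps can be sketched in plain Python as follows. This is only an illustration under assumptions: the vocab mapping is a hypothetical dictionary built from the indices in the example above, every feature is assumed to be a string looked up in that one vocabulary, and the function names convert_instance and reshape_batch are not taken from the backend code. The final conversion to a tensor is omitted.

```python
# Hypothetical vocabulary, using the indices from the example above.
vocab = {
    "Finally": 1827, ",": 4, "a": 7, "boy": 3241, "in": 10, "the": 3,
    "back": 99, "raises": 1, "his": 66, "hand": 479, ".": 2,
}


def convert_instance(instance, vocab):
    """Map every feature value of every sequence element to its vocab index."""
    return [[vocab[value] for value in element] for element in instance]


def reshape_batch(batch):
    """Turn [instance][element][feature] into [feature][instance][element]."""
    n_features = len(batch[0][0])
    return [
        [[element[f] for element in instance] for instance in batch]
        for f in range(n_features)
    ]


instance = [["Finally"], [","], ["a"], ["boy"], ["in"], ["the"],
            ["back"], ["raises"], ["his"], ["hand"], ["."]]

batch = [convert_instance(instance, vocab)]   # batch with one element
reshaped = reshape_batch(batch)               # one list per feature
```

With the single-feature instance from the example, batch and reshaped reproduce the "converted" and "reshaped" lists shown above.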

Format for sending application results, backend to LF

Sequence tagging:

  • the prediction is a tensor of shape 1, seqlen, nclasses
  • this gets converted into an index list of seqlen indices
  • this gets converted into a list of seqlen labels
  • In addition, the scores are returned:
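
The two conversion steps listed for sequence tagging (prediction to indices, indices to labels) might look like the sketch below. It assumes that each index is chosen as the position of the highest score for that sequence element (argmax), which the page does not state explicitly; the label list, the score values, and the function names are hypothetical, and a nested list stands in for the tensor.

```python
# Hypothetical label inventory; the real labels come from the trained model.
labels = ["O", "B", "I"]


def scores_to_indices(prediction):
    """Pick the highest-scoring class index for each sequence element.

    prediction is a nested list of shape (1, seqlen, nclasses);
    taking the argmax is an assumption made for this sketch.
    """
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in prediction[0]
    ]


def indices_to_labels(indices, labels):
    """Map class indices back to label strings."""
    return [labels[i] for i in indices]


# Made-up scores for a sequence of three elements and three classes.
prediction = [[[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]]

indices = scores_to_indices(prediction)
predicted_labels = indices_to_labels(indices, labels)
```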