
Class_EngineDVFileJson

Johann Petrak edited this page Apr 16, 2018 · 8 revisions

Class EngineDVFileJson

This page describes the inner workings of the class EngineDVFileJson, which implements the engine for external algorithms that use dense vectors represented in JSON format (and handled out-of-memory).

Protocol of use

Currently (as of 2018-04-16), the invocation protocol for engines is a bit complex. The required protocol depends on the situation the engine gets used in (training versus application).

When training:

  • The engine class gets selected in the PR based on the trainingAlgorithm runtime parameter
  • Engine.createEngine(trainingAlgorithm, algorithmParameters, featureInfo, TargetType, dataDirectory) is called
    • this executes the non-static initializeAlgorithm(algorithm, parms) method (overridden but empty for EngineDVFileJson)
    • then runs method initWhenCreating(directory, algorithm, parms, featureInfo, targetType): for EngineDVFileJson, this essentially creates the instance of the appropriate corpus representation and sets the mode to "adding".
    • creates and initializes the Info instance
    • returns the Engine instance
  • document processing uses the corpus representation retrieved from the engine to add new instances
  • After all documents have been processed, the engine's info gets updated
  • Then engine.trainModel(dataDir, instanceAnnotationType, algoParms) gets called:
    • turns off adding for the corpus representation
    • updates the info
    • copies the whole wrapper software unless already there (based on WRAPPER_NAME)
    • creates the command to invoke the training script, also using the settings in the config file WRAPPER_NAME.yaml which is treated as a key/value map
    • this optionally uses settings shellcmd and shellparms for running the shell script
    • TODO: this should also allow configuring the Python path and Python location
    • before running the command, sets the environment variable WRAPPER_HOME, which points to a subdirectory of the data directory
    • runs the command
    • updates the info and saves it
    • saves the featureInfo (NOTE: this is currently done again later in the saveEngine method)
  • Finally engine.saveEngine(dataDir) gets called (from base class Engine) which:
    • saves the feature info using featureInfo.save(dir)
    • invokes the engine-specific saveModel(dir) method, which in this case does nothing, since the model gets saved by the scripts we call
    • invokes the engine-specific saveCorpusRepresentation(dir) method, which in this case does nothing, since the corpus representation is already out-of-memory and stored in a file
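
The command-building step of trainModel can be illustrated with a minimal sketch. It assumes the WRAPPER_NAME.yaml file has already been parsed into a key/value map. Only the settings shellcmd and shellparms and the environment variable WRAPPER_HOME come from the description above; the class name, the method names, the script name and the two script arguments are hypothetical and are not taken from the actual implementation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the actual Engine classes.
final class TrainCommandSketch {

  // Builds the argument list for invoking a training script.
  // If "shellcmd" is configured, the script is run through that shell,
  // with the whitespace-separated "shellparms" placed before the script path.
  static List<String> buildCommand(Map<String, String> config, File wrapperHome,
      String scriptName, String dataPrefix, String modelPrefix) {
    List<String> cmd = new ArrayList<>();
    String shellCmd = config.get("shellcmd");
    if (shellCmd != null && !shellCmd.trim().isEmpty()) {
      cmd.add(shellCmd.trim());
      String shellParms = config.get("shellparms");
      if (shellParms != null && !shellParms.trim().isEmpty()) {
        for (String parm : shellParms.trim().split("\\s+")) {
          cmd.add(parm);
        }
      }
    }
    // Assumed layout: the script lives directly in the wrapper directory.
    cmd.add(new File(wrapperHome, scriptName).getPath());
    // Assumed arguments: where to read the data and where to write the model.
    cmd.add(dataPrefix);
    cmd.add(modelPrefix);
    return cmd;
  }

  // Prepares the process and sets WRAPPER_HOME before the command is run.
  static ProcessBuilder prepare(List<String> cmd, File wrapperHome) {
    ProcessBuilder builder = new ProcessBuilder(cmd);
    builder.environment().put("WRAPPER_HOME", wrapperHome.getAbsolutePath());
    // Merge stderr into stdout so the script's messages arrive in one stream.
    builder.redirectErrorStream(true);
    return builder;
  }
}
```

Calling prepare(...).start() would then launch the script. Keeping the command as a list of arguments, rather than a single string, avoids quoting problems when the data directory path contains spaces.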

When applying a model:

  • call engine.loadEngine(datadir, parms) -- this static method in turn runs:
    • load the Info
    • load the FeatureInfo
    • create a new instance of the Engine class (which is stored in the Info)
    • Set the info in the new instance
    • call the engine's initWhenLoading(dir, parms) method. This is NOT overridden by the EngineDVFileJson class and calls:
      • the engine-specific loadModel(dir, parms) method, which is overridden:
        • runs loadAndSetCorpusRepresentation(dir) (NOTE: this duplicates the call made below, but the duplication is harmless)
        • if not already there, copies the wrapper software
        • builds the command for running the application script (similar to train script, just different name)
        • starts the script to communicate with
      • runs the engine-specific loadAndSetCorpusRepresentation(dir) method
      • creates the algorithm instance
      • calls the engine's initializeAlgorithm(algorithm, parms) method -- overridden but does nothing
  • processes all the documents, calling engine.applyModel(...). This is overridden to do the following for each instance in the document:
    • convert the annotation to json
    • send the json to the process
    • get back the json from the process
    • convert what we get back to model application instances and collect them
    • return all the model application instances
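
The send/receive loop in applyModel can be sketched as follows. This is only an illustration under stated assumptions: it assumes each instance is exchanged as one line of JSON in each direction, which the description above does not specify, and it treats the JSON as plain strings, leaving out the conversion from annotations and back to model application instances. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the actual Engine classes.
final class JsonLineExchangeSketch {
  private final BufferedWriter toProcess;
  private final BufferedReader fromProcess;

  // toProcess wraps the stdin of the script, fromProcess wraps its stdout.
  JsonLineExchangeSketch(Writer toProcess, Reader fromProcess) {
    this.toProcess = new BufferedWriter(toProcess);
    this.fromProcess = new BufferedReader(fromProcess);
  }

  // Sends one JSON document as a single line and waits for a one-line reply.
  String exchange(String json) {
    try {
      toProcess.write(json);
      toProcess.write("\n");
      // Flush so the script actually sees the request before we block on the reply.
      toProcess.flush();
      String response = fromProcess.readLine();
      if (response == null) {
        throw new IOException("process closed its output stream");
      }
      return response;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Sends every instance of a document in turn and collects the replies in order.
  List<String> applyAll(List<String> instancesAsJson) {
    List<String> responses = new ArrayList<>();
    for (String json : instancesAsJson) {
      responses.add(exchange(json));
    }
    return responses;
  }
}
```

With a running java.lang.Process, the two streams could be obtained by wrapping process.getOutputStream() in an OutputStreamWriter and process.getInputStream() in an InputStreamReader. The flush after each request matters: without it the request may sit in the buffer while both sides wait for each other.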