
Summary

The objective is to integrate an analysis mechanism that will run different tests and report a status for each of them with a meaningful comment. The mechanism provided by Opal will be analysis-independent; analysis routines will be provided by plugins. A generic analysis plugin will be based on GitHub repositories (routines will be downloaded from GitHub, and the repositories are configurable).

Rationale

Improve the data quality by reusing and sharing data quality inspection routines.

References

User Stories

Describe simply who is doing what and how to obtain the result.

# Who What How Result
1
2
...

Scope

  • Plugin
  • GWT UI
  • R server

Design and Implementation Plan

Although Opal only supports R as a data processing backend, it would be good to have an analysis plugin definition that is independent of the data processing engine. An R-specific SPI will then be designed on top of it.

The data analysis flow is as follows:

  • on startup, Opal discovers the opal-analysis plugins and makes them available in the UI (list of plugins with their associated schema forms)
  • user goes to a table page where he/she can find a new tab called "Analysis"
  • the "Analysis" tab offers:
    • to prepare an analysis by selecting a routine, setting its parameters and optionally specifying the subset of the table on which the routine is to be executed
    • to view the previous analyses with their associated reports: status, message, timestamps and a report in Markdown or PDF
    • an analysis batch (set of routines with parameters) can be saved for reuse.
  • when an analysis batch is sent:
    • a project task is created (/project/{prj}/_analyse)
    • an R session is created, and the table (or its subset, represented by a view wrapper) is then pushed to the R session as a tibble
    • the R analysis service is then triggered with the tibble symbol and the R session provided (a sketch of such an entry point follows this list)
    • each R analysis routine must return a result including: status, message, timestamps and a report in Markdown or PDF; these results are persisted (data files or OrientDB?)
    • the user can follow the progress of the analysis batch in the "Tasks" tab
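
To make this flow concrete, here is a minimal Java sketch of what the R-specific analysis entry point could look like; every name in it (RAnalysisService, RSession, the tibble symbol parameter) is an assumption for illustration, not the actual Opal API. Analysis and AnalysisResult refer to the SPI contracts sketched in the Domain section below.

```java
/**
 * Hypothetical R-specific analysis entry point (illustrative names, not Opal's API).
 * The caller is expected to have already assigned the table (or its view-wrapped subset)
 * as a tibble in the R session under the given symbol.
 */
public interface RAnalysisService {

  /**
   * Runs one analysis against the tibble assigned under tibbleSymbol in the provided
   * R session and reports its outcome (status, message, timestamps, report).
   */
  AnalysisResult analyse(RSession session, String tibbleSymbol, Analysis analysis);

  /** Handle on an R session managed by Opal; a placeholder for this sketch. */
  interface RSession {
    String getId();
  }
}
```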

Domain

Analysis SPI

The analysis SPI (and its R-specific version) will define the following (a Java sketch follows the list):

  • AnalysisTemplate describes the analysis routine (name, title, description in Markdown, JSON schema form)
  • Analysis represents an instance of an analysis request, with a name, associated template name, and parameters
  • AnalysisResult reports the results of an Analysis (timestamps, status, output message, report location). Sometimes a single analysis has multiple parameters that lead to multiple sub-results. In this case there is a global result (if one sub-result is Failed, then the global result is Failed as well) and a list of sub-results.
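
A minimal Java sketch of these contracts, assuming accessor names that are purely illustrative (the real SPI may differ):

```java
import java.util.Date;
import java.util.List;

// Illustrative SPI contracts; names and accessors are assumptions, not the final API.
public interface AnalysisTemplate {
  String getName();
  String getTitle();
  String getDescription();     // Markdown
  String getSchemaForm();      // JSON schema form used to build the parameters UI
}

interface Analysis {
  String getName();
  String getTemplateName();    // which AnalysisTemplate this request instantiates
  String getParameters();      // parameter values, e.g. a JSON document matching the schema form
}

interface AnalysisResult {
  String getAnalysisName();
  String getStatus();          // e.g. PASSED or FAILED
  String getMessage();
  Date getStartDate();
  Date getEndDate();
  String getReportPath();      // location of the Markdown or PDF report
  List<AnalysisResult> getSubResults(); // empty when the analysis produced a single result
}
```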

Analysis plugin implementations

The data analysis plugin implementations will have access to convenience classes that allow performing R operations from Opal (R scripts, data assignment, file transfers).

A generic R-based analysis plugin implementation could be provided. This implementation could download R routines from one or more GitHub repositories (the file layout would follow some naming conventions). This way the same plugin (configured in the Opal plugin management page) could propose an extensible set of analyses.
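
As an illustration of that generic plugin, the sketch below assumes a conventional repository layout (for example, one directory per routine holding an R script and a schema form); that layout and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: a plugin configured with GitHub repositories that exposes
// one AnalysisTemplate per routine directory found in them.
public class GithubAnalysisPlugin {

  private final List<String> repositories; // e.g. "owner/repo" entries from the plugin configuration

  public GithubAnalysisPlugin(List<String> repositories) {
    this.repositories = repositories;
  }

  /**
   * Lists the analysis templates offered by this plugin. A real implementation would
   * download each repository and map every routine directory to an AnalysisTemplate.
   */
  public List<AnalysisTemplate> getTemplates() {
    List<AnalysisTemplate> templates = new ArrayList<>();
    for (String repository : repositories) {
      templates.addAll(downloadTemplates(repository));
    }
    return templates;
  }

  private List<AnalysisTemplate> downloadTemplates(String repository) {
    // Placeholder: fetching and parsing the repository content is out of scope for this sketch.
    return Collections.emptyList();
  }
}
```

Adding a repository (or a new routine directory in an existing one) would then make a new analysis template appear without releasing a new plugin version.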

Web Services

REST Description
GET /analysis-plugins List available analysis plugins and for each of them the analysis templates
GET /analysis-plugin/{plg} Get a specific analysis plugin DTO
POST /project/{prj}/commands/_analyse Launch an analysis task (can be multiple sub-analyses, but only on one table)
GET /project/{prj}/analyses List all the recorded analyses for the project (filtered by permission)
GET /project/{prj}/table/{tbl}/analyses List all the recorded analyses for the table
GET /project/{prj}/table/{tbl}/analysis/{id} Get a specific table recorded analysis
GET /project/{prj}/table/{tbl}/analysis/{id}/results Get a specific table recorded analysis results
GET /project/{prj}/table/{tbl}/analysis/{id}/result/{rid} Get a specific table recorded analysis result
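
For illustration, launching an analysis task over REST could look like the sketch below, using Java's built-in HTTP client; the base URL, credentials and JSON body are hypothetical placeholders, not the documented Opal payload.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AnalyseCommandExample {

  public static void main(String[] args) throws Exception {
    // Hypothetical payload: the real analyse command DTO is defined elsewhere.
    String body = "{\"table\": \"my_table\", \"analyses\": [" +
        "{\"name\": \"quality-check\", \"templateName\": \"histogram\", \"parameters\": {}}]}";

    String credentials = Base64.getEncoder().encodeToString("user:password".getBytes());

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://opal.example.org/project/my_project/commands/_analyse"))
        .header("Authorization", "Basic " + credentials)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());

    // A task should now appear in the project's Tasks tab.
    System.out.println(response.statusCode());
  }
}
```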

Permissions

A user needs to have the permission to see table values in order to run analyses (much like for reports).

Analyse Task

The analysis is executed asynchronously by the task system; see how this is done for the Report feature.

Analysis Results Persistence

The analysis results are to be saved in Opal's internal database (OrientDB). Result files, if any, are to be stored separately in the folder OPAL_HOME/data/analyses/{id}/results/{rid}.
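
For illustration, resolving a result file location under that layout could be as simple as the following sketch (the class, method and variable names are assumptions):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AnalysisResultFiles {

  /** Resolves OPAL_HOME/data/analyses/{id}/results/{rid} for a given analysis and result. */
  public static Path resultDirectory(String opalHome, String analysisId, String resultId) {
    return Paths.get(opalHome, "data", "analyses", analysisId, "results", resultId);
  }

  public static void main(String[] args) {
    // e.g. /var/lib/opal/data/analyses/12/results/3
    System.out.println(resultDirectory("/var/lib/opal", "12", "3"));
  }
}
```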

Analysis Life Cycle

  • On Analysis deletion, all associated results shall be deleted
  • On Table deletion, all associated analyses with their results shall be deleted

UI Mockups

Analyses Tab

The list is a list of AnalysisResults.

  • on Run, the analyse task is submitted and the user is informed by a notification growl,
  • on View, the analysis type and parameters are displayed (same as the New Analysis dialog but read-only) with the details of the result,
  • on Duplicate, a New Analysis dialog is opened with the same field values,
  • on Delete, the analysis result is removed (and associated documents and permissions).

New Analysis Dialog

On analysis type selection, the Parameters section is rebuilt from the corresponding schema form.
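
As an example of the kind of document driving this rebuild, a template's JSON schema form could look like the snippet below (held in a Java string for consistency with the other sketches); the field names and structure are purely hypothetical.

```java
public class ExampleSchemaForm {

  // Hypothetical JSON schema form; the New Analysis dialog's Parameters section
  // would be rebuilt from such a document when the analysis type is selected.
  static final String FORM =
      "{\n" +
      "  \"type\": \"object\",\n" +
      "  \"properties\": {\n" +
      "    \"variable\": { \"type\": \"string\", \"title\": \"Variable to check\" },\n" +
      "    \"threshold\": { \"type\": \"number\", \"title\": \"Missingness threshold\", \"default\": 0.1 }\n" +
      "  },\n" +
      "  \"required\": [\"variable\"]\n" +
      "}";

  public static void main(String[] args) {
    System.out.println(FORM);
  }
}
```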

Analysis Result dialog

Viewing an analysis result shows the analysis type and parameters along with the last execution results. When there are sub-results, a Details section shows the status of each of these tests.

Some results can have sub-result items:

Test/Demo Plan

How can the feature be tested or demonstrated? It is important to describe this in fairly great detail so anyone can perform the demo or test.

Unresolved Issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself, since any specification with problems cannot be approved.