PET Development and Software Architecture
TODO: add big architecture diagram, and abstract diagram
TODO: remove references to other deliverable sections
-
The user can pass basic configuration commands as arguments at start. To apply the same commands automatically at every tool start, they can be written into a configuration file. The tool provides a command line interface (CLI) and a graphical user interface (GUI) for user interaction; the GUI is optional, so the tool can be executed without a graphical desktop. The user has full control over the extraction process and can decide which extraction modules to use, which files to consider, and which (subsets of the) extracted information to keep.
-
The Extraction Controller Builder is responsible for building the Extraction Controller, the “heart” of the application, based on the given user commands. It is designed following the builder design pattern and is only executed once for building a single instance at tool start.
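A minimal Java sketch of this builder; all class, field, and method names here are illustrative placeholders, not PET's actual API:

```java
// Hypothetical sketch of the builder pattern used for the Extraction
// Controller; configuration options and names are illustrative only.
class ExtractionController {
    final boolean guiEnabled;
    final String profileDir;

    ExtractionController(boolean guiEnabled, String profileDir) {
        this.guiEnabled = guiEnabled;
        this.profileDir = profileDir;
    }
}

class ExtractionControllerBuilder {
    // Defaults, which user commands given at tool start would override.
    private boolean guiEnabled = true;
    private String profileDir = "profiles";

    ExtractionControllerBuilder gui(boolean enabled) {
        this.guiEnabled = enabled;
        return this;
    }

    ExtractionControllerBuilder profileDir(String dir) {
        this.profileDir = dir;
        return this;
    }

    // Executed once at tool start to create the single controller instance.
    ExtractionController create() {
        return new ExtractionController(guiEnabled, profileDir);
    }
}
```

A caller would typically chain the configuration steps, e.g. `new ExtractionControllerBuilder().gui(false).create()` for a headless start.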
-
The Extraction Controller is the main controlling class of the application. It has access to all other controllers and is responsible for the application flow. At tool start it initializes all other controllers, and it shuts them down at the end of the tool's execution. All specialized controller components communicate with each other exclusively via the Extraction Controller, so this main controller is responsible for updating the states of the other components and for serving as an intermediary.
-
Profiles are created by the user to organize the extraction components, e.g. based on intended use. They contain a list of configured Extraction Modules and a list of Extraction Result Collections, which collect the information extracted by the modules and keep references to the files this information relates to. Both components are described in the following two points.
-
Extraction Modules implement the techniques that designate how and when information is extracted. They provide different implementations of algorithms to be executed in the system environments of different operating systems. There are three different kinds of Extraction Modules: file-dependent, file-independent and daemons. File-dependent modules take as argument a path to a file and extract information that is valid only for this file, whereas file-independent modules extract environment information that is valid for all files within the environment. Daemon modules, on the other hand, don't extract information, but instead monitor the environment for the occurrence of designated events. It is also possible to develop customized modules for extracting specialized information or for monitoring specific events, and to easily plug them into the application. A class template is provided to support developers for this purpose.
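The three module kinds could be sketched as a small Java interface hierarchy; the interface and method names below are illustrative assumptions, not PET's actual types:

```java
import java.nio.file.Path;
import java.util.Map;

// Illustrative sketch of the three Extraction Module kinds.
interface ExtractionModule {
    String moduleName();
}

// File-dependent: extracts information valid only for the given file.
interface FileDependentModule extends ExtractionModule {
    Map<String, String> extract(Path file);
}

// File-independent: extracts environment information valid for all files.
interface FileIndependentModule extends ExtractionModule {
    Map<String, String> extract();
}

// Daemon: does not extract, but monitors the environment for events.
interface DaemonModule extends ExtractionModule {
    void start();
    void stop();
}

// Example file-independent module: captures the operating system name.
class OsNameModule implements FileIndependentModule {
    public String moduleName() { return "os-name"; }
    public Map<String, String> extract() {
        return Map.of("os.name", System.getProperty("os.name"));
    }
}
```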
-
Extraction Result Collections are the data structures that keep the extracted information. Each collection belongs to one of two sub-classes: Environment or Part. An Environment collects all extracted file-independent information that belongs to a Profile, and each Profile has exactly one Environment instance. Parts keep the extracted information that is valid only for a specific file, together with a path to this file. They can be seen as a file-part of a Digital Object, but, in order to increase flexibility, we intentionally didn't implement a Digital Object as a data structure.
-
The Profile Controller manages all Profiles and Profile Templates, which can be used for the fast creation of a preconfigured Profile. It is possible to export existing Profiles as Profile Templates, to be able to pass them to other PET users.
-
The Module Controller searches (with the help of Java reflection) for all available module classes and creates a list of generic extraction modules, which is used to create Extraction Module instances for Profiles. After their creation, most Extraction Modules have to be configured before they can be executed.
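The reflective filtering step could look roughly like the following Java sketch. Here the class-path scan is simplified to filtering a given candidate list; the `GenericModule` interface and all class names are made up for illustration:

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for PET's module base type.
interface GenericModule {}

class ChecksumModule implements GenericModule {}
class SystemInfoModule implements GenericModule {}
class NotAModule {}

class ModuleScanner {
    // Keeps only concrete classes that implement the module interface,
    // using reflection to inspect each candidate.
    static List<Class<?>> discover(List<Class<?>> candidates) {
        List<Class<?>> modules = new ArrayList<>();
        for (Class<?> cls : candidates) {
            if (GenericModule.class.isAssignableFrom(cls)
                    && !cls.isInterface()
                    && !Modifier.isAbstract(cls.getModifiers())) {
                modules.add(cls);
            }
        }
        return modules;
    }
}
```

A real class-path scan would additionally enumerate the candidate classes itself, which the sketch leaves out.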
-
The Extractor is responsible for executing the Extraction Modules and for saving the Extraction Results into the right Extraction Result Collections. It supports two extraction modes: (a) a snapshot extraction that executes each Extraction Module of each Profile to capture the current information state, and (b) a continuous extraction mode in which the Event Controller initiates a new extraction whenever an event is detected by the environment monitoring daemons (the File Monitor and the Daemon Modules).
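The snapshot mode can be reduced to a short Java sketch; each "module" is simplified to a `Supplier` producing one result string, and all names are assumptions rather than PET's real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative Profile: a list of modules plus a flat result list that
// stands in for its Extraction Result Collections.
class Profile {
    final String name;
    final List<Supplier<String>> modules = new ArrayList<>();
    final List<String> results = new ArrayList<>();
    Profile(String name) { this.name = name; }
}

class Extractor {
    // Snapshot mode: execute every module of every profile exactly once
    // and store each result in that profile's collection.
    static void snapshot(List<Profile> profiles) {
        for (Profile profile : profiles) {
            for (Supplier<String> module : profile.modules) {
                profile.results.add(module.get());
            }
        }
    }
}
```

In continuous mode, the same per-module execution would instead be triggered by the Event Controller whenever a monitoring daemon reports an event.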
-
The Event Controller receives all Events detected by the monitoring daemons and controls the event handling. It uses a queue to handle the events in the order in which they occur.
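A minimal sketch of this FIFO handling in Java, with events reduced to strings and the handler reduced to recording them (the real controller would initiate extractions instead):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative Event Controller: events are queued as they arrive and
// handled strictly in first-in-first-out order.
class EventController {
    private final Queue<String> queue = new ArrayDeque<>();
    final List<String> handled = new ArrayList<>();

    // Called by a monitoring daemon when it detects an event.
    void submit(String event) { queue.add(event); }

    // Handles all queued events in arrival order.
    void drain() {
        String event;
        while ((event = queue.poll()) != null) {
            handled.add(event);  // stands in for initiating a new extraction
        }
    }
}
```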
-
Monitoring daemons are the File Monitor (see 12) and the Daemon Modules (see 5).
-
The File Monitor is responsible for observing the files added to the Profiles for changes. If a modification to one of the files is detected, a new extraction is initiated for all modules related to this file. In case of a file deletion, all Profiles that include this file as a Part are informed and will remove the file from their list. Unlike the exchangeable Daemon Modules, the File Monitor is an inherent component of the application.
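The monitor's two reactions (modification, deletion) can be sketched with a simple poll-based check in Java. This is a deterministic simplification, assuming a `watch`/`poll` shape that is not PET's actual interface; the real monitor runs as a daemon:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative File Monitor: remembers each watched file's last-modified
// time and reports changed or deleted files when polled.
class FileMonitor {
    private final Map<File, Long> lastModified = new HashMap<>();

    void watch(File file) { lastModified.put(file, file.lastModified()); }

    // Returns the files modified or deleted since the previous poll.
    List<String> poll() {
        List<String> events = new ArrayList<>();
        for (Map.Entry<File, Long> entry : new ArrayList<>(lastModified.entrySet())) {
            File file = entry.getKey();
            if (!file.exists()) {
                events.add("DELETED " + file.getName());
                lastModified.remove(file);   // Profiles would drop this Part
            } else if (file.lastModified() != entry.getValue()) {
                events.add("MODIFIED " + file.getName());
                lastModified.put(file, file.lastModified());  // new extraction starts here
            }
        }
        return events;
    }
}
```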
-
The Configuration Saver saves the state of the application to configuration files at the end of the tool's execution, and loads the state at the next start of the tool. The Profiles are saved with all added files and their modules, together with the module configurations. Furthermore, the current extraction mode and general usage options are saved.
-
The Storage Controller allows generic access to the exchangeable Storage. It provides methods for saving and loading extracted information to and from the Storage.
-
The Storage saves and loads metadata via a modular storage layer. Three storage implementations are currently provided: a default simple flat file-system storage with JSON mapping, one using elasticsearch [75], and a third using mapdb [76].
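The exchangeable backend idea can be sketched as a small Java interface; the `Storage` interface, its methods, and the in-memory implementation below are illustrative assumptions (the in-memory map stands in for the flat file-system backend, and the elasticsearch and mapdb backends would implement the same interface):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative exchangeable storage backend behind the Storage Controller.
interface Storage {
    void save(String key, String metadata);
    Optional<String> load(String key);
}

// Stand-in for the default flat file-system backend with JSON mapping.
class InMemoryStorage implements Storage {
    private final Map<String, String> records = new HashMap<>();
    public void save(String key, String metadata) { records.put(key, metadata); }
    public Optional<String> load(String key) { return Optional.ofNullable(records.get(key)); }
}

// The Storage Controller offers generic access and delegates to whichever
// backend was plugged in.
class StorageController {
    private final Storage storage;
    StorageController(Storage storage) { this.storage = storage; }
    void saveResult(String key, String json) { storage.save(key, json); }
    Optional<String> loadResult(String key) { return storage.load(key); }
}
```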
-
PET works together with an information encapsulation tool, also developed during the PERICLES project, to be able to encapsulate the extracted information together with its related files in a sheer curation scenario.
-
The weighted graphs described in chapter 6.3 could be implemented for suggesting information to be extracted based on the use cases.
This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no FP7-601138 PERICLES.
<img src="https://github.com/pericles-project/pet/blob/master/wiki-images/PERICLES%20logo_black.jpg" width="200" align="right"/>