Skip to content

PromG-dev/promg-inference

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Using Graph Databases to Infer Missing Case Identifiers in Event Data

‼️ This project does not use PromG yet. This update will be released as soon as possible.

Description

This repository collects queries for modeling and importing incomplete event data as Event Knowledge Graphs using the Labeled Property Graph data model of graph databases. Missing case identifiers are inferred using context knowledge of activities and its locations. All scripts and queries are licensed under LGPL v3.0, see LICENSE. Copyright information is provided within each Project.

Clone the repo and make sure to grab the submodules

Clone repository

To clone git clone https://github.com/PromG-dev/promg-inference

Submodule: EKG Construction

The EKG construction is specified in ekg-creator and is a submodule. So you should run git submodule update --init --recursive.

Requirements

Neo4j

Install the neo4j-python-driver

pip install neo4j OR conda install -c conda-forge neo4j-python-driver

Install Neo4j:

Other packages

  • numpy
  • pandas
  • tabulate
  • tqdm

Hint: you can install all packages at once by running pip install neo4j numpy pandas tabulate tqdm.

Get started

Create a new graph database

  • The scripts in this release assume password "12345678".
  • The scripts assume the server to be available at the default URL bolt://localhost:7687
    • You can modify this also in the script.
  • ensure to allocate enough memory to your database, advised: dbms.memory.heap.max_size=5G, this can be changed by adjusting the settings. See Modifying settings for the DBMS
  • the script expects the Neo4j APOC library to be installed as a plugin, see https://neo4j.com/labs/apoc/

How to use

For data import & inference

  1. start the Neo4j server
  2. run main.py

Projects

The following projects are part of this repository

Missing Case Identifiers Inference

Method to infer missing case identifiers in event data by exploiting knowledge about the activities and their locations.

semantic header (json files)

First version for semantic header for system/event knowledge graphs: https://multiprocessmining.org/2022/10/26/data-storage-vs-data-semantics-for-object-centric-event-data/

event knowledge graphs

Data model and generic query templates for translating and integrating a set of related CSV event logs into single event graph over multiple behavioral dimensions, stored as labeled property graph in Neo4J. See csv_to_eventgraph_neo4j/README.txt

Publications:

Data sets

We provide data and scripts for the running box process example

Datasets

  • event_data.csv - contains the data of the running example for all four different variants
  • activity_records.csv - contains context knowledge about the activities of the box process
  • location_records - contains context knowledge about locations and information about the activities that happen at these locations to create the [:AT] relation.

There is no separate file to create the [:AT] relationship between Activities and Locations. Instead, we add the activities directly to the location records and for each unique primary key (name, partOf) we collect the activity names in a list and store this list as attribute. Then we use the activity names as foreign keys to create the [:AT] relation.

JSON files

  • BoxProcess.json - a description of which Entities (Box), Relations (:PART_OF, :CORR, :AT, :IS), Classes need to be created: The main script uses this information to construct the EKG.
  • BoxProccess_DS.json - a description of the different data sets. It describes which labels the records should receive (e.g. :Event, :Location, :Activity) and what properties records have

Scripts

Main script

There is one script that creates the Event/System knowledge graph: ekg_creator/main.py and several other scripts that support this main script:

Data_managers

  • data_managers/datastructures.py --> transforms the JSON file describing the different datasets into a class + additional methods
  • data_managers/semantic_header.py --> transforms the JSON file describing the semantic header into a class + additional methods
  • data_managers/interpreters.py --> Class that contains information about in what query language the semantic header and data structures should be interpreter

Database_managers

  • database_managers/authentication.py --> class containing the credentials to create connection to database. Local credentials are includes. In case you want to create a remote connection, add the following piece of code to a (gitignored) file.
remote = Credentials(
    uri="[your_uri]",
    user="neo4j",
    password="[your_password]"
)
  • database_managers/db_connection.py --> class responsible for making the connection to the database and to communicate with the database
  • database_managers/EventKnowledgeGraph.py --> class responsible for making (changes to) the EKG and to request data from the EKG. Makes use of several modules.

EKG_Modules

  • ekg_modules/db_management.py --> general module to manage the database
  • ekg_modules/data_importer.py --> imports the data stored in the records into the EKG
  • ekg_modules/ekg_builder_semantic_header.py --> creates the required nodes and relations as specified in the semantic header
  • ekg_modules/inference_engine.py --> module responsible for inferring missing information

CypherQueries

Contains repeatable pieces of Cypher Queries for all necessary parts.

  • cypher_queries/query_translators --> translate semantic header and data structures into Cypher
  • cypher_queries/query_library --> contains all cypher queries for the EKG modules

About

Inference mechanism for missing case identifiers

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages