isa-group/llms-ocel

OCEL Log Filtering Application

A powerful web application for filtering Object-Centric Event Logs using natural language processing. Apply complex filters with simple natural language commands and visualize process mining data interactively.

Overview

The OCEL Log Filtering application bridges the gap between complex process mining data and natural language interaction. It leverages Large Language Models (LLMs) to interpret user queries and automatically apply appropriate filters to extract insights from Object-Centric Event Logs (OCEL).

OCEL is a standard format for storing event logs that capture process-oriented data with complex object interactions. Unlike traditional event logs that focus on a single case notion, OCEL allows for multiple object types and their relationships, enabling more comprehensive process mining analysis.
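To make the multi-object idea concrete, here is a minimal sketch of an object-centric log as plain Python data, together with the kind of event, object, and relationship counts the application reports. The field names (`id`, `activity`, `objects`) and the `log_statistics` helper are illustrative assumptions, not the OCEL 2.0 schema or this project's code.

```python
from collections import Counter

# Toy object-centric log. Each event references several objects, possibly of
# different types, instead of belonging to a single case.
# NOTE: this layout is a simplified illustration, not the OCEL 2.0 schema.
objects = {
    "o1": "order",
    "i1": "item",
    "i2": "item",
    "p1": "package",
}
events = [
    {"id": "e1", "activity": "create order", "objects": ["o1", "i1", "i2"]},
    {"id": "e2", "activity": "create package", "objects": ["p1", "i1", "i2"]},
    {"id": "e3", "activity": "pay order", "objects": ["o1"]},
]


def log_statistics(events, objects):
    """Count events, objects, event-object relations, and objects per type."""
    relations = sum(len(event["objects"]) for event in events)
    return {
        "events": len(events),
        "objects": len(objects),
        "relations": relations,
        "objects_per_type": dict(Counter(objects.values())),
    }


stats = log_statistics(events, objects)
print(stats)
```

Note that items `i1` and `i2` take part in both the order and the package events, which a single-case log could not express without duplicating events.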

Key Features

  • Natural Language Filtering: Apply complex filters by describing them in plain English
  • Interactive Visualization: Explore event-object relationships through dynamic, interactive graphs
  • Multiple Log Formats: Support for OCEL logs in JSON (OCEL 2.0) and XML (OCEL 2.0) formats
  • Session-Based Storage: Persistent user sessions with automatic resource management
  • Filter Evaluation: Collection of user feedback to improve filtering effectiveness
  • Hierarchical Log Organization: View relationships between original and filtered logs
  • Real-time Statistics: Detailed metrics on events, objects, and relationships

Technology Stack

Backend

  • Python (^3.9): Core programming language
  • FastAPI (^0.109.0): Modern, high-performance web framework
  • PM4Py (^2.7.0): Process Mining library with OCEL support
  • Pydantic (^2.6.0): Data validation and settings management
  • OpenAI API (^1.65.5): For natural language processing of filter requests
  • Uvicorn: ASGI server for serving FastAPI applications
  • Asyncio: For asynchronous operations and performance

Frontend

  • Astro (^5.5.2): Web framework with view transitions support
  • React (^19.0.0): For interactive UI components
  • Tailwind CSS (^3.4.17): Utility-first CSS framework
  • ReactFlow (^11.11.4): For interactive graph visualizations
  • Axios (^1.8.3): Promise-based HTTP client
  • SweetAlert2 (^11.6.13): For beautiful, responsive alert dialogs

Deployment

  • Docker and Docker Compose: For containerization and orchestration
  • Nginx: As a reverse proxy for routing requests
  • Environment Variables: For configuration management

🚦 Getting Started

Prerequisites

  • Docker and Docker Compose installed
  • OpenAI API key for LLM functionality
  • 250MB+ of available storage space

Installation

  1. Clone the repository:

       git clone https://github.com/isa-group/llms-ocel.git
       cd llms-ocel

  2. Configure the environment variables:

       # Copy the example environment file and customize it
       cp .env.example .env

       # Edit the .env file with your settings (especially the OpenAI API key)
       nano .env

  3. Start the application with Docker Compose:

       docker-compose up -d

  4. Access the application:

Configuration

The application is configured through environment variables in the .env file. The key options are listed below.

Server Settings

| Variable | Description | Default |
| --- | --- | --- |
| HOST | Host IP to bind the server | 0.0.0.0 |
| PORT | Port for the backend server | 8000 |
| DEBUG | Enable debug mode | False |
| CORS_ORIGINS | Comma-separated list of allowed origins | http://localhost:80,... |

LLM Configuration

| Variable | Description | Default |
| --- | --- | --- |
| LLM_API_KEY | OpenAI API key | (required) |
| LLM_MODEL | LLM model to use | gpt-3.5-turbo |
| LLM_TEMPERATURE | Temperature for LLM responses (0.0-1.0) | 0.1 |
| LLM_MAX_TOKENS | Maximum tokens for LLM responses | 1000 |

Cache and Session Settings

| Variable | Description | Default |
| --- | --- | --- |
| CACHE_TTL | Time-to-live of each cached log, in seconds | 900 (15 minutes) |
| CACHE_MAXSIZE | Maximum number of logs in cache | 200 |
| SESSION_TTL_SECONDS | Backend session duration, in seconds | 86400 (24 hours) |
| SESSION_SIZE_LIMIT | Maximum storage per session, in bytes | 262144000 (250 MB) |
| SESSION_LOG_LIMIT | Maximum logs per session | 20 |

Export Settings

| Variable | Description | Default |
| --- | --- | --- |
| EXPORT_TIMEOUT | Timeout for export operations, in seconds | 30 |

Frontend Settings

| Variable | Description | Default |
| --- | --- | --- |
| API_URL | URL for the backend API (used by frontend) | http://backend:8000 |

To apply configuration changes, update the .env file and restart the application using docker-compose down followed by docker-compose up -d.
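As an illustration of how these variables and their documented defaults could be read in Python, here is a minimal sketch using only the standard library. The `Settings` dataclass and `load_settings` function are hypothetical names; the project itself lists Pydantic for settings management, and its actual loader may differ. Only the first documented CORS origin is used as a default because the rest of the list is elided above.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    host: str
    port: int
    debug: bool
    cors_origins: List[str]
    llm_api_key: str
    llm_model: str
    llm_temperature: float
    llm_max_tokens: int
    cache_ttl: int
    session_ttl_seconds: int
    session_size_limit: int
    session_log_limit: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    api_key = env.get("LLM_API_KEY", "").strip()
    if not api_key:
        raise ValueError("LLM_API_KEY is required")

    temperature = float(env.get("LLM_TEMPERATURE", "0.1"))
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("LLM_TEMPERATURE must be between 0.0 and 1.0")

    # CORS_ORIGINS is a comma-separated list; drop blanks and whitespace.
    raw_origins = env.get("CORS_ORIGINS", "http://localhost:80")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        cors_origins=origins,
        llm_api_key=api_key,
        llm_model=env.get("LLM_MODEL", "gpt-3.5-turbo"),
        llm_temperature=temperature,
        llm_max_tokens=int(env.get("LLM_MAX_TOKENS", "1000")),
        cache_ttl=int(env.get("CACHE_TTL", "900")),
        session_ttl_seconds=int(env.get("SESSION_TTL_SECONDS", "86400")),
        session_size_limit=int(env.get("SESSION_SIZE_LIMIT", "262144000")),
        session_log_limit=int(env.get("SESSION_LOG_LIMIT", "20")),
    )


# Example with an explicit mapping; the key and second origin are placeholders.
example = load_settings({
    "LLM_API_KEY": "sk-example",
    "CORS_ORIGINS": "http://localhost:80, http://localhost:4321",
})
print(example.port, example.cors_origins)
```

Passing the environment as a mapping keeps the loader easy to test and makes a missing API key fail at startup rather than on the first filter request.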

Usage Guide

Uploading Logs

  1. From the dashboard, click "Upload New Log" or navigate to the upload section
  2. Click "Upload" to process the file
  3. Review the log statistics displayed after successful upload

Applying Filters

  1. Select a log from the list or continue from your upload

  2. Navigate to the "Filter" tab

  3. Enter a natural language description of the filter you want to apply

    Examples: "Show me orders created in January 2023" or "Filter events with activity equals pay order"

  4. Click "Apply Filter" to process your request

  5. Review the filter results and provide feedback on filter accuracy
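One way to picture the step between a natural language request and an applied filter is a structured filter description that is validated before use. The sketch below assumes, purely for illustration, that the model replies with a small JSON object; the schema, the `ALLOWED_FILTERS` table, and `parse_filter_spec` are hypothetical and not taken from this project.

```python
import json

# Hypothetical filter kinds and the parameters each one requires.
ALLOWED_FILTERS = {
    "activity": {"values"},
    "timestamp": {"start", "end"},
    "object_type": {"types"},
}


def parse_filter_spec(raw_reply: str) -> dict:
    """Validate a JSON filter description before any filter is applied."""
    try:
        spec = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(spec, dict):
        raise ValueError("reply must be a JSON object")

    kind = spec.get("filter")
    if kind not in ALLOWED_FILTERS:
        raise ValueError(f"unsupported filter: {kind!r}")

    params = spec.get("params")
    if not isinstance(params, dict):
        raise ValueError("'params' must be a JSON object")

    missing = ALLOWED_FILTERS[kind] - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")

    return {"filter": kind, "params": params}


# Example reply for "Filter events with activity equals pay order".
reply = '{"filter": "activity", "params": {"values": ["pay order"]}}'
spec = parse_filter_spec(reply)
print(spec)
```

Rejecting unknown filter kinds and incomplete parameters up front means an ambiguous request surfaces as a clear error instead of a silently wrong result.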

Visualizing Results

  1. Select any log from your list
  2. Navigate to the "Visualization" tab
  3. Interact with the graph representation:
    • Zoom and pan to explore the data
    • Toggle visibility of different object types
    • View relationships between events and objects
    • Adjust the visualization settings for clarity

Exporting Results

  1. From the log details view, click the "Download" button
  2. The file will be downloaded to your device

Filter Types

The application supports various filter strategies that can be applied through natural language:

| Filter Type | Description | Example |
| --- | --- | --- |
| Activity Filters | Filter events based on their activity values | "Show only events with activity 'create order'" |
| Timestamp Filters | Filter events based on date ranges | "Show events between January 1st and March 31st, 2024" |
| Object Type Filters | Filter based on object types | "Only include objects of type 'orders'" |
| Activity-Type Matching | Filter by specific activity and object type combinations | "Show 'create package' activities only for 'packages' objects" |
| Object Count Filters | Filter events based on number of related objects | "Show events related to at least 3 items" |
| Lifecycle Start Events | Show only first events in object lifecycles | "Show only the first events for each order object" |
| Lifecycle End Events | Show only last events in object lifecycles | "Show only the last events for each order object" |
| Chain Filters | Combine multiple filters for complex queries | "Show orders from January 2024 with more than 2 items" |
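The semantics of several of these filter types can be sketched on a toy in-memory log using plain predicates. This is an illustration only: the event layout and every function name below are assumptions, and the application itself relies on PM4Py rather than this code.

```python
from datetime import datetime

# Toy events; "objects" maps an object type to the related object ids.
events = [
    {"id": "e1", "activity": "create order", "time": datetime(2024, 1, 5),
     "objects": {"order": ["o1"], "item": ["i1", "i2", "i3"]}},
    {"id": "e2", "activity": "pay order", "time": datetime(2024, 2, 10),
     "objects": {"order": ["o1"]}},
    {"id": "e3", "activity": "create order", "time": datetime(2024, 4, 2),
     "objects": {"order": ["o2"], "item": ["i4"]}},
]


def by_activity(activities):
    """Activity filter: keep events whose activity is in the given set."""
    return lambda e: e["activity"] in activities


def by_time(start, end):
    """Timestamp filter: keep events inside the inclusive range."""
    return lambda e: start <= e["time"] <= end


def by_object_count(obj_type, minimum):
    """Object count filter: keep events with at least `minimum` objects."""
    return lambda e: len(e["objects"].get(obj_type, [])) >= minimum


def chain(*predicates):
    """Chain filter: an event must satisfy every predicate."""
    return lambda e: all(p(e) for p in predicates)


def apply_filter(events, predicate):
    return [e for e in events if predicate(e)]


def first_events_per_object(events, obj_type):
    """Lifecycle start: keep events that are the first for some object."""
    seen, result = set(), []
    for e in sorted(events, key=lambda e: e["time"]):
        ids = e["objects"].get(obj_type, [])
        if any(obj_id not in seen for obj_id in ids):
            result.append(e)
        seen.update(ids)
    return result


# "Show orders from January 2024 with more than 2 items"
january = by_time(datetime(2024, 1, 1), datetime(2024, 1, 31, 23, 59, 59))
big_january = apply_filter(events, chain(january, by_object_count("item", 3)))
starts = first_events_per_object(events, "order")
print([e["id"] for e in big_january], [e["id"] for e in starts])
```

Expressing each filter as a predicate lets a chain evaluate in a single pass, without materializing an intermediate log per step.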

API Documentation

The API documentation is automatically generated and available at:

Session Management

The application uses a multi-layer session-based approach to manage user resources:

Backend Session Storage:

  • Sessions persist in the backend cache for 24 hours by default
  • Each session has a storage limit (default: 250MB)
  • Logs automatically expire after 15 minutes of inactivity
  • The application provides notifications when logs are about to expire

Frontend Session Management:

  • The browser session cookie remains valid for 2 hours
  • After this period, you may need to refresh the page to restore the session
  • The session ID is maintained across page refreshes within this timeframe

These time limits are designed to balance resource usage with user convenience:

  • The 15-minute log expiration ensures efficient resource allocation
  • The 2-hour frontend session matches typical working sessions
  • The 24-hour backend persistence allows returning to your work the next day

You'll receive notifications before logs expire, giving you time to save important work by downloading results or refreshing log access.
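The inactivity-based expiration described here behaves like a sliding time-to-live: each access pushes the deadline back. A minimal sketch, assuming a 900-second TTL and a hypothetical 60-second warning window (the `ExpiringLogStore` class and `warn_seconds` parameter are not from this project):

```python
import time


class ExpiringLogStore:
    """Sliding-TTL store: reading a log resets its expiration timer."""

    def __init__(self, ttl_seconds=900, warn_seconds=60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._warn = warn_seconds      # hypothetical notification window
        self._clock = clock            # injectable for testing
        self._items = {}               # log_id -> (log, expires_at)

    def put(self, log_id, log):
        self._items[log_id] = (log, self._clock() + self._ttl)

    def get(self, log_id):
        """Return the log and reset its timer, or None if missing/expired."""
        entry = self._items.get(log_id)
        if entry is None:
            return None
        log, expires_at = entry
        now = self._clock()
        if now >= expires_at:
            del self._items[log_id]
            return None
        self._items[log_id] = (log, now + self._ttl)
        return log

    def expiring_soon(self):
        """Ids of logs that will expire within the warning window."""
        now = self._clock()
        return [log_id for log_id, (_, expires_at) in self._items.items()
                if 0 < expires_at - now <= self._warn]


# Example with a fake clock so the behaviour is visible without waiting.
now = [0.0]
store = ExpiringLogStore(clock=lambda: now[0])
store.put("orders", {"events": 3})
now[0] = 850.0
print(store.expiring_soon())   # the log is inside the warning window
print(store.get("orders"))     # access resets the 15-minute timer
```

Injecting the clock is a design choice for testability; a production cache would more likely use a library cache keyed by session.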

Resource Management

To ensure optimal performance and fair resource allocation:

  • Monitor your session storage usage in the top-right corner
  • Delete unused logs to free up session storage space
  • Download important results to your local system for long-term storage
  • Use filter chains to create efficient workflows without intermediate logs
  • Pay attention to expiration notifications for logs you're actively working with

Key resource limits (configurable in .env):

  • Maximum logs per session: 20 by default
  • Storage limit per session: 250MB by default
  • Maximum number of logs in cache: 200 by default
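The per-session limits above amount to two checks before a new log is accepted: a log count check and a byte budget check. A minimal sketch using the documented defaults; `SessionQuotaError` and `check_upload` are hypothetical names for illustration.

```python
class SessionQuotaError(Exception):
    """Raised when a new log would exceed a session limit."""


def check_upload(current_sizes, new_size,
                 size_limit=262_144_000, log_limit=20):
    """Return the bytes left after the upload, or raise SessionQuotaError.

    current_sizes: sizes in bytes of the logs already in the session.
    new_size: size in bytes of the log about to be stored.
    """
    if len(current_sizes) >= log_limit:
        raise SessionQuotaError(
            f"session already holds {len(current_sizes)} logs (limit {log_limit})")

    used = sum(current_sizes)
    if used + new_size > size_limit:
        raise SessionQuotaError(
            f"upload needs {new_size} bytes but only {size_limit - used} remain")

    return size_limit - used - new_size


# A session with two 100 MB logs can still take a 40 MB upload.
remaining = check_upload([100_000_000, 100_000_000], 40_000_000)
print(remaining)
```

Filtered logs count toward the same limits as uploads, which is why deleting unused logs frees room for further filtering.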

Evaluation Feedback

Your feedback helps improve the filtering mechanism:

  1. After applying a filter, you'll be prompted to evaluate its effectiveness
  2. Rate how accurately the filter interpreted your natural language description
  3. Provide additional comments if needed
  4. Your feedback is stored and used to enhance future filtering accuracy

File Format Support

The application supports the following OCEL formats:

  • JSON (OCEL 2.0): Default format with full support for all features
  • XML (OCEL 2.0): Alternative format with equivalent functionality
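Since the format is chosen from the file extension, an upload check can be sketched in a few lines. The `detect_ocel_format` helper is a hypothetical name; in a PM4Py-based backend the returned label would then select the matching OCEL 2.0 reader, though how this project dispatches is not shown here.

```python
from pathlib import Path

# Extensions accepted for upload, mapped to a format label.
SUPPORTED_FORMATS = {".json": "json", ".xml": "xml"}


def detect_ocel_format(filename: str) -> str:
    """Return "json" or "xml" for a supported file name, else raise."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported extension {suffix!r}; expected .json or .xml")
    return SUPPORTED_FORMATS[suffix]


print(detect_ocel_format("order-management.json"))
print(detect_ocel_format("ORDERS.XML"))
```

The extension only selects a parser; a file with a valid extension can still fail to load if its content does not follow the OCEL 2.0 structure.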

❓ Troubleshooting

Session Issues

  • If you encounter session errors, refresh the page to generate a new session
  • Clear your browser cache if persistent session problems occur
  • Note that frontend sessions expire after 2 hours, while backend data persists for 24 hours

Upload Problems

  • Ensure your OCEL file follows the proper standard format
  • Large files may take longer to process; please be patient
  • Check that your file has a valid .json or .xml extension
  • Verify that your session hasn't reached the maximum log limit (default: 20)

Filter Application Errors

  • Try to be more specific in your filter description
  • Avoid complex or ambiguous language
  • Review the available filter types and examples for guidance
  • Check that your LLM_API_KEY is valid in the .env file

Visualization Performance

  • Large logs will be automatically sampled for visualization
  • Adjust the maximum relations setting for better performance
  • Toggle visibility of specific object types to reduce complexity
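To show what sampling under a maximum relations setting can look like, here is a minimal sketch that caps the number of event-object pairs sent to the graph. The `sample_relations` function, its `seed` parameter, and the uniform sampling strategy are assumptions for illustration; the application's actual sampling method is not documented here.

```python
import random


def sample_relations(relations, max_relations, seed=0):
    """Return at most max_relations pairs, preserving their original order.

    relations: list of (event_id, object_id) pairs.
    seed: fixed so that the same log always yields the same sample.
    """
    if max_relations < 0:
        raise ValueError("max_relations must be non-negative")
    if len(relations) <= max_relations:
        return list(relations)

    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(relations)), max_relations))
    return [relations[i] for i in keep]


# 100 toy relations reduced to 10 for display.
relations = [(f"e{i}", f"o{i % 7}") for i in range(100)]
sampled = sample_relations(relations, 10)
print(len(sampled))
```

A fixed seed keeps the graph stable across page reloads, at the cost of always hiding the same relations for a given log.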

Log Expiration

  • Logs automatically expire after 15 minutes of inactivity
  • You'll receive notifications before logs expire
  • Access a log to reset its expiration timer

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

About

Leveraging Large Language Models for Object-Centric Event Log Filtering
