Large Language Model-based Framework for Open Information Extraction, Triplet Matching, and Text Comparison


Overview

oie-llm-framework is a Python framework for deep semantic text comparison. It leverages Large Language Models (LLMs) to perform Open Information Extraction (OIE), converting unstructured text into structured subject-predicate-object triplets. The framework then matches these triplets against a set of key criteria, providing a granular, automated way to evaluate whether a text contains specific information. This project was developed and evaluated on a German legal dataset to assess student answers against expert annotations.

Key Features

  • 🤖 LLM-Powered OIE: Extract semantic triplets from text using various models (OpenAI, Together, local vLLM).
  • 🎯 Triplet Matching: Intelligently match extracted triplets against predefined target information.
  • 📊 Comprehensive Evaluation: Built-in tools to calculate performance metrics (Precision, Recall, F1, MCC) against a ground truth.
  • ⚖️ Multiple Methods: Compare results against alternative approaches like end-to-end LLM evaluation, string matching, and rule-based extraction.
  • 🇩🇪 Use Case: Includes a German legal dataset for evaluating student answers.

Main Pipeline

The main pipeline automatically determines whether a given input text contains key information, represented by a predefined set of target triplets. The process is a two-step, LLM-driven approach: first, OIE generates candidate triplets from the text; second, an LLM matches these candidates against the target triplets. The final output is a decision for each target criterion, indicating whether the corresponding key content is present, enabling a structured, granular comparison of the text's contents.
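A condensed sketch of the two steps (import paths are assumed from the project structure; full, runnable examples follow in the "Usage and Code Examples" section):

from src.llm_model import LLM_Model                  # assumed module locations
from src.triplet_extraction import TripletExtractor
from src.triplet_matching import TripletMatcher

text = "..."          # an input text
target_triplets = []  # predefined target triplets representing the key criteria

llm = LLM_Model.initialize_model(model_id="gpt-4o-mini-2024-07-18")
# Step 1: OIE - extract candidate triplets from the text
candidates = TripletExtractor(extraction_method=llm).extract_triplets_from_text(text)
# Step 2: LLM matching of candidates against the target triplets
matches = TripletMatcher(matching_method=llm).match_triplets(
    target_triplets, candidates.get_list_of_triplets(), text
)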

German Legal Dataset

The dataset used in this project consists of a German legal case and student answers to it. The student answers serve as input texts, and the target triplets encode the key information each answer should contain. The system first checks whether a student answer contains this key information; its decisions are then evaluated against expert annotations. The results thus show the system's potential for automatically evaluating student answers against expert annotations.

main_overview_flowchart.png

This framework provides components for:

  • Loading an LLM from the OpenAI API, the Together API, or as a local vLLM model
  • Extracting subject-predicate-object triplets from input texts using the chosen LLM
  • Matching these triplets against target triplets (representing key contents of the text) using the chosen LLM
  • Evaluating the performance of the triplet matching against a (ground truth) reference matching
  • Comparing texts with selected alternative methods:
    • LLM end-to-end text matching (direct criterion matching without triplet extraction)
    • String matching (replacing the LLM-based triplet matching with a simple string matching approach)
    • Rule-based triplet extraction (replacing the LLM-based triplet extraction with a rule-based approach)

Installation

Create a conda environment and install dependencies:

conda create -n oie-llm-framework python=3.10 -y
conda activate oie-llm-framework
pip install -r requirements.txt
python -m spacy download de_core_news_sm
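
If you plan to use the OpenAI or Together APIs, make the API keys available to the process. Since python-dotenv is among the dependencies, a .env file in the project root is a natural place for them; the variable names below are the standard ones read by the respective client libraries and are an assumption about this framework's configuration:

OPENAI_API_KEY=sk-...
TOGETHER_API_KEY=your-together-key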

Quickstart

To run the existing example with the default settings:

conda activate oie-llm-framework
python -m src.main

This will:

  1. Load the configured dataset from the data directory
  2. Run triplet extraction and matching with several LLMs (gpt-4o, gpt-4o-mini, gpt-4.1-mini, Llama-3.3-70B, Llama-4-Maverick, DeepSeek-V3)
  3. Compare with alternative methods (end-to-end LLM, string matching, rule-based extraction)
  4. Generate evaluation metrics and save all outputs to the output directory

Project Structure

├── data/                       # Input data
│   ├── text_1.txt              # Text 1
│   ├── text_2.txt              # Text 2
│   ├── target_triplets.txt     # Reference triplets to match against
│   ├── target_criteria.txt     # Reference criteria to match against (used only for LLM end-to-end method)
│   └── reference_matching_matrix.csv  # Reference matching to evaluate matching against
├── output/                     # Final outputs
│   ├── triplet_extraction/     # Main method: triplet extraction results
│   ├── triplet_matching/       # Main method: triplet matching results
│   ├── matching_evaluation/    # Main method: matching evaluation results
│   ├── plots/                  # Plots
│   └── other_methods/          # Other methods results
│       ├── llm_end_to_end_method/ # LLM end-to-end method results
│       ├── string_matching_method/ # String matching method results
│       └── rule_based_triplet_extraction_method/ # Rule-based triplet extraction method results
├── src/                        # Source code
│   ├── __init__.py             # Package initialization
│   ├── config.py               # Configuration settings
│   ├── data_models.py          # Core data structures
│   ├── matching_evaluation.py  # Evaluation metrics and utilities
│   ├── llm_model.py            # LLM integration (OpenAI API, Together API, local vLLM)
│   ├── main.py                 # Main script
│   ├── other_methods/          # Other methods for comparison
│   │   ├── end_to_end_llm.py # LLM end-to-end method
│   │   ├── string_matching.py # String matching method
│   │   └── rule_based_extraction.py # Rule-based triplet extraction method
│   ├── triplet_extraction.py   # Triplet extraction functionality
│   ├── german_legal_case_utils.py    # Utilities for the German legal case evaluation dataset
│   └── triplet_matching.py     # Triplet matching functionality
├── requirements.txt            # Python dependencies
├── LICENSE                     # License
└── README.md                   # This file

Requirements

  • Python 3.10+
  • Miniconda or Anaconda for managing dependencies
  • An OpenAI API key, a Together API key, or the ability to run local vLLM models
  • Dependencies listed in requirements.txt including:
  torch==2.5.1
  transformers==4.48.3
  vllm==0.7.2
  nltk==3.8.1
  pandas==2.2.3
  fuzzywuzzy==0.18.0
  dataclasses-json==0.6.7
  python-dotenv
  spacy==3.8.5
  matplotlib==3.10.1

Usage and Code Examples

Note: In the following steps, only a single input text is used for simplicity. The main script processes multiple texts as input.

Main Method

The main pipeline involves three steps: Triplet Extraction, Triplet Matching, and Evaluation.

1. Open Information Extraction (Triplet Extraction)
  1. Load text from the data directory as a string
  2. Process the text using TripletExtractor.extract_triplets_from_text() to extract triplets from sentences
  3. Receive a SentencesWithTripletsFromText object containing the text and extracted triplets per sentence
  4. Save the result as a txt file in the triplet_extraction directory
# Assumed import locations, based on the project structure above
from src.llm_model import LLM_Model
from src.triplet_extraction import TripletExtractor

text_1 = "This is a test sentence. This is another test sentence. This is a third test sentence."
dataset_name = "example_dataset"  # used only to build output file names here
llm_model = LLM_Model.initialize_model(model_id="gpt-4o-mini-2024-07-18")  # or any supported model
extractor = TripletExtractor(extraction_method=llm_model)
sentences_with_triplets_from_text = extractor.extract_triplets_from_text(text_1)

# Save triplet extraction results
with open(f"output/triplet_extraction/{dataset_name}_{llm_model.model_id.replace('/', '__')}_example_text_sentences_with_extracted_triplets.txt", 'w', encoding='utf-8') as f:
    f.write("Showing a single example text with the sentences and their extracted triplets.\n")
    f.write(f"Text extracted with {llm_model.model_id}\n")
    f.write(sentences_with_triplets_from_text.to_string())
2. Triplet Matching
  1. Extract all triplets from the SentencesWithTripletsFromText object using get_list_of_triplets()
  2. Process candidate triplets through TripletMatcher.match_triplets() to match against target triplets (note: target triplets can be manually defined, extracted from another text, or taken from a reference text)
  3. Receive a TripletMatchesFromText object containing TripletMatch objects for each target triplet
# Assumed import locations, based on the project structure above
from src.data_models import Triplet
from src.triplet_matching import TripletMatcher
from src.matching_evaluation import MatchingDataframe  # assumed to live here

# Define target triplets (these would typically be loaded from your dataset)
target_triplets = [
    Triplet(subject="example subject", predicate="example predicate", object="example object"),
    # Add more target triplets as needed
]

# Extract candidate triplets from the text
list_of_candidate_triplets_from_text = sentences_with_triplets_from_text.get_list_of_triplets()

# Initialize matcher with the LLM model
matcher = TripletMatcher(matching_method=llm_model)

# Match triplets (note: the parameter order is target_triplets, candidate_triplets, text)
triplet_matches_for_text = matcher.match_triplets(target_triplets, list_of_candidate_triplets_from_text, text_1)

# Create matching dataframe
matching_dataframe = MatchingDataframe(matches_list=[triplet_matches_for_text])
  4. Save matching results in three formats:
    • JSON: Raw matching data for programmatic use
    • CSV: Tabular format showing which texts match which criteria
    • TXT: Human-readable formatted reports
import json
import os

# NumpyEncoder and format_matching_list_and_criteria are assumed to be provided
# by src/matching_evaluation.py; adjust the import to where they actually live
from src.matching_evaluation import NumpyEncoder, format_matching_list_and_criteria

# Save matching results as JSON
with open(f"output/triplet_matching/{dataset_name}_{llm_model.model_id.replace('/', '__')}_matched_triplets_for_each_attempt.json", 'w', encoding='utf-8') as f:
    json.dump([triplet_matches_for_text], f, cls=NumpyEncoder)

# Save matching dataframe as CSV
matching_dataframe.matching_df.to_csv(f"output/triplet_matching/{dataset_name}_{llm_model.model_id.replace('/', '__')}_matching_dataframe.csv")

# Save formatted matching report as TXT
list_of_matching_criteria = ["Example criterion 1", "Example criterion 2"]  # These would come from your dataset
matching_report = format_matching_list_and_criteria(triplet_matches_for_text, list_of_matching_criteria)
os.makedirs("output/triplet_matching/text_0", exist_ok=True)  # ensure the per-text directory exists
with open(f"output/triplet_matching/text_0/{dataset_name}_{llm_model.model_id.replace('/', '__')}_text_0_matching_list_and_criteria.txt", 'w', encoding='utf-8') as f:
    f.write(f"Matching list and criteria for text 0 with {llm_model.model_id}:\n")
    f.write(matching_report)
3. Matching Evaluation
  1. Compare matching results against a reference matching matrix (ground truth)
  2. Use the reference matrix to determine correct text-criteria mappings from human annotation
  3. Calculate performance metrics (precision, recall, F1-score, accuracy, MCC) from the comparison
import pandas as pd

# Assumed import location, based on the project structure above
from src.matching_evaluation import (
    EvaluationOfMatchingDataframe,
    calculate_metrics_from_evaluated_matching_results_df,
)

# Load or create reference matching matrix (ground truth)
# This would typically be loaded from your dataset; its rows and columns
# must line up with those of the predicted matching dataframe
reference_matching_df = pd.DataFrame(
    [[True, False], [False, True]],  # Example: 2 texts, 2 criteria
    columns=[0, 1],  # Column indices for criteria
    index=[0, 1]     # Row indices for texts
)

# Evaluate matching results against reference
evaluation_of_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(matching_dataframe, reference_matching_df)

# Calculate performance metrics
metrics = calculate_metrics_from_evaluated_matching_results_df(evaluation_of_matching_dataframe.evaluation_of_matching_df)

print(f"Performance metrics: {metrics}")

full_overview_flowchart.png

Other Method: LLM End-to-End Text Matching

  1. Initialize the LLM end-to-end method with your chosen LLM model
  2. Define your list of texts and evaluation criteria
  3. Create an empty matching dataframe to store results
  4. For each text-criterion pair, use the LLM to directly evaluate if they match
  5. Populate the dataframe with boolean values (1 for match, 0 for no match)
  6. Evaluate results against reference matching matrix and calculate metrics
# Assumed import location, based on the project structure above
from src.other_methods.end_to_end_llm import LLMEndToEndTextMatchingMethod

# Initialize end-to-end method
llm_end_to_end_method = LLMEndToEndTextMatchingMethod(llm_model)

# Define texts and criteria
list_of_texts = [text_1]  # Your list of texts to evaluate
list_of_matching_criteria = ["Example criterion 1", "Example criterion 2"]  # Your evaluation criteria

# Create empty dataframe to populate (its index and columns must line up
# with the reference matching matrix used for evaluation)
llm_end_to_end_matching_df = MatchingDataframe(
    matching_df=pd.DataFrame(columns=list_of_matching_criteria, index=list_of_texts)
)

# Populate the dataframe by matching each text against each criterion
for text in list_of_texts:
    for criterion in list_of_matching_criteria:
        if llm_end_to_end_method.match_text_with_criterion(text, criterion):
            llm_end_to_end_matching_df.matching_df.loc[text, criterion] = 1
        else:
            llm_end_to_end_matching_df.matching_df.loc[text, criterion] = 0

# Evaluate results
evaluation_of_llm_end_to_end_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(
    llm_end_to_end_matching_df, reference_matching_df
)
metrics = calculate_metrics_from_evaluated_matching_results_df(
    evaluation_of_llm_end_to_end_matching_dataframe.evaluation_of_matching_df
)

Other Method: String Matching

  1. Use LLM-based triplet extraction to extract triplets from your texts (same as main method)
  2. Initialize the string matching method instead of LLM-based triplet matching
  3. For each text, extract candidate triplets from the SentencesWithTripletsFromText object
  4. Apply string similarity (fuzzy matching) to match candidate triplets against target triplets
  5. Create matching dataframe from the string-based matching results
  6. Evaluate results against reference matching matrix and calculate metrics
# Assumed import location, based on the project structure above
from src.other_methods.string_matching import StringMatchingMethod

# Use triplets extracted with the LLM-based method (or extract new ones)
list_of_sentences_with_triplets_from_texts = [sentences_with_triplets_from_text]  # From previous example

# Initialize string matching method
string_matching_method = StringMatchingMethod()

# Process each text
list_of_TripletMatchesForText = []
for sentences_with_triplets_from_text in list_of_sentences_with_triplets_from_texts:
    candidate_triplets_for_text = sentences_with_triplets_from_text.get_list_of_triplets()
    triplet_matches_from_text = string_matching_method.match_triplets(
        target_triplets,  # Note: target_triplets comes first
        candidate_triplets_for_text,
        sentences_with_triplets_from_text.text
    )
    list_of_TripletMatchesForText.append(triplet_matches_from_text)

# Create dataframe and evaluate
matching_dataframe = MatchingDataframe(matches_list=list_of_TripletMatchesForText)
evaluation_of_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(
    matching_dataframe, reference_matching_df
)
metrics = calculate_metrics_from_evaluated_matching_results_df(
    evaluation_of_matching_dataframe.evaluation_of_matching_df
)

Other Method: Rule-Based Triplet Extraction

  1. Initialize the rule-based extraction method using spaCy dependency parsing patterns
  2. Create a TripletExtractor with the rule-based method instead of LLM-based extraction
  3. Extract triplets from each text using predefined linguistic patterns and dependency trees
  4. Use LLM-based TripletMatcher for the triplet matching step (same as main method)
  5. Match extracted triplets against target triplets using the LLM matcher
  6. Create matching dataframe and evaluate against reference matrix to calculate metrics
# Assumed import location, based on the project structure above
from src.other_methods.rule_based_extraction import RuleBasedTripletExtractionMethod, RuleSets

# Initialize rule-based extraction method
rule_based_extraction_method = RuleBasedTripletExtractionMethod(RuleSets.patterns_general)
extractor = TripletExtractor(extraction_method=rule_based_extraction_method)

# Extract triplets from texts using the rule-based method
list_of_sentences_with_triplets_from_texts = []
for text in list_of_texts:
    sentences_with_triplets_from_text = extractor.extract_triplets_from_text(text)
    list_of_sentences_with_triplets_from_texts.append(sentences_with_triplets_from_text)

# Use the LLM-based matcher for triplet matching
triplet_matching_method = TripletMatcher(matching_method=llm_model)
list_of_TripletMatchesForText = []
for sentences_with_triplets_from_text in list_of_sentences_with_triplets_from_texts:
    triplet_matches_from_text = triplet_matching_method.match_triplets(
        target_triplets,
        sentences_with_triplets_from_text.get_list_of_triplets(),
        sentences_with_triplets_from_text.text
    )
    list_of_TripletMatchesForText.append(triplet_matches_from_text)

# Evaluate results
matching_dataframe = MatchingDataframe(matches_list=list_of_TripletMatchesForText)
evaluation_of_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(
    matching_dataframe, reference_matching_df
)
metrics = calculate_metrics_from_evaluated_matching_results_df(
    evaluation_of_matching_dataframe.evaluation_of_matching_df
)

Terminology

  • input text: the text to be processed
  • extracted triplets: the triplets extracted from the input text, and used as candidate triplets for matching
  • candidate triplets: the triplets extracted from the input text, to be matched against the target triplets
  • target criterion: a criterion, i.e., a key content of the text that the input text is compared against
  • target triplets: manually defined triplets that each aim to represent a target criterion
  • matched triplets: a combination of a target triplet and a subset of the candidate triplets that match that target triplet
  • predicted matching matrix: a boolean matrix with input texts as rows and target triplets as columns; an entry is True if the matching algorithm matched that target triplet to that input text, and False otherwise
  • reference matching matrix: a matrix of boolean values, representing a human-annotated mapping of input texts to target triplets, used to evaluate the performance of the matching algorithm
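
To make the two matrices concrete: the evaluation reduces to an element-wise comparison of the predicted matrix against the reference matrix. The sketch below is a minimal, framework-independent illustration of how the metrics follow from that comparison (the framework's own implementation lives in src/matching_evaluation.py):

import numpy as np
import pandas as pd

# Hypothetical 2-texts x 2-criteria matrices
predicted = pd.DataFrame([[True, False], [True, True]])   # algorithm output
reference = pd.DataFrame([[True, False], [False, True]])  # human annotation

pred, ref = predicted.to_numpy(), reference.to_numpy()
tp = int(np.sum(pred & ref))     # predicted match, annotated match
fp = int(np.sum(pred & ~ref))    # predicted match, annotated non-match
fn = int(np.sum(~pred & ref))    # missed match
tn = int(np.sum(~pred & ~ref))   # correctly rejected

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)
mcc = (tp * tn - fp * fn) / ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
print(precision, recall, f1, accuracy, mcc)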

Core Components

Data Models

  • Triplet: Represents a subject-predicate-object triplet
    • subject (str): The subject of the triplet
    • predicate (str): The predicate of the triplet
    • object (str): The object of the triplet
  • SentenceWithTriplets: Represents a sentence with its extracted triplets
    • sentence (str): The sentence
    • triplets (list[Triplet]): The list of triplets extracted from the sentence
  • SentencesWithTripletsFromText: Represents a text and its sentences with their extracted triplets
    • text (str): The text
    • sentences (list[SentenceWithTriplets]): The sentences with their extracted triplets
  • TripletMatch: Contains a target triplet and its matching candidate triplets
    • target_triplet (Triplet): The target triplet
    • candidate_triplets (list[Triplet]): The list of candidate triplets that match the target triplet
  • TripletMatchesFromText: Collection of triplet matches for a given text
    • text (str): The text
    • triplet_matches (list[TripletMatch]): The list of triplet matches for the given text
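
A small sketch of how these models nest, assuming the classes are importable from src/data_models.py (per the project structure) and take the attributes above as keyword arguments; since dataclasses-json is a dependency, the real classes likely offer JSON (de)serialization as well:

from src.data_models import (  # assumed module, per the project structure
    Triplet, SentenceWithTriplets, SentencesWithTripletsFromText,
)

triplet = Triplet(subject="Der Richter", predicate="verkündet", object="das Urteil")
sentence = SentenceWithTriplets(
    sentence="Der Richter verkündet das Urteil.", triplets=[triplet]
)
doc = SentencesWithTripletsFromText(
    text="Der Richter verkündet das Urteil.", sentences=[sentence]
)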

Main Method Components

  • TripletExtractor: Extracts triplets from text using LLMs.
    • Important attributes:
      • extraction_method: The method used to extract triplets from text; it must implement extract_triplets(sentence: str) -> str and can be an LLM_Model or a RuleBasedTripletExtractionMethod
    • Important methods:
      • extract_triplets_from_text(text: str) -> SentencesWithTripletsFromText: Extracts triplets from a given text
      • extract_triplets_from_sentences(sentences: list[str]) -> list[SentenceWithTriplets]: Extracts triplets from a list of sentences
      • extract_triplets_from_sentence(sentence: str) -> SentenceWithTriplets: Extracts triplets from a given sentence
  • TripletMatcher: Matches extracted triplets against target triplets.
    • Important attributes:
      • matching_method: The method used to match triplets; it must implement match_triplets(target_triplet: Triplet, candidate_triplets: list[Triplet]) -> str and can be an LLM_Model or a StringMatchingMethod
    • Important methods:
      • match_triplets(target_triplets: list[Triplet], candidate_triplets: list[Triplet], text: str) -> TripletMatchesFromText: Matches a list of target triplets against a list of candidate triplets; for each target triplet, the whole list of candidate triplets is checked against it
      • match_target_triplet_with_candidate_triplets(target_triplet: Triplet, candidate_triplets: list[Triplet]) -> TripletMatch: Matches a target triplet against a list of candidate triplets
  • LLM_Model: Interface for local vLLM models, OpenAI API, and Together API models.
    • Important attributes:
      • model_id: The ID of the model to use
      • temperature, max_tokens: hyperparameters defined in src/config.py
      • extract_triplets_prompt: The prompt to use for triplet extraction
      • match_triplets_prompt: The prompt to use for triplet matching
    • Important methods:
      • _generate_response(prompt: str) -> str: Generates a response from the LLM model
      • extract_triplets(sentence: str) -> str: Extracts triplets from a given sentence; returns the raw LLM response to the extract_triplets_prompt defined in src/config.py
      • match_triplets(target_triplet: Triplet, candidate_triplets: list[Triplet]) -> str: Matches a target triplet against a list of candidate triplets; returns the raw LLM response to the match_triplets_prompt defined in src/config.py
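
Because TripletExtractor only expects its extraction_method to expose extract_triplets(sentence: str) -> str, a custom method can be plugged in without touching the rest of the pipeline. A hedged sketch (the string format the framework's parser expects is defined by its prompts, so the stub body below is illustrative only):

from src.triplet_extraction import TripletExtractor  # assumed module path

class NoOpExtractionMethod:
    """Toy method satisfying the duck-typed extraction interface."""

    def extract_triplets(self, sentence: str) -> str:
        # A real implementation would return triplets in the string format
        # the framework's parser expects; this stub extracts nothing.
        return ""

extractor = TripletExtractor(extraction_method=NoOpExtractionMethod())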

Other Methods

  • LLMEndToEndTextMatchingMethod: Direct criterion matching without triplet extraction. Important methods:
    • match_text_with_criterion(text: str, criterion: str) -> bool: Matches a given text against a given criterion
  • StringMatchingMethod: Simple string similarity-based matching for triplet comparison. Important methods:
    • match_triplets(target_triplets: List[Triplet], candidate_triplets: List[Triplet], text: str) -> TripletMatchesFromText: Matches target triplets against candidate triplets
  • RuleBasedTripletExtractionMethod: Uses spaCy rules to extract triplets. Important methods:
    • extract_triplets(sentence: str) -> str: Extracts triplets from a given sentence

Outputs

Detailed list of outputs

Triplet extraction:

  • output/triplet_extraction/{dataset_name}_{model_id}_texts_with_sentences_and_extracted_triplets.txt: (one per triplet extraction method) a txt file containing the triplets extracted from each sentence of each input text with the given method

Triplet matching:

  • output/triplet_matching/{dataset_name}_{model_id}_matched_triplets_for_each_attempt.json: (one per triplet matching method) a json file containing, for each text, the list of target triplets and, for each target triplet, the candidate triplets matched to it
  • output/triplet_matching/{dataset_name}_{model_id}_matching_dataframe.csv: (one per triplet matching method) a csv file containing the matching results for each text against each target triplet
  • output/triplet_matching/text_{text_id}/{dataset_name}_{model_id}_text_{text_id}_matching_list_and_criteria.txt: (one txt file per text per method) a formatted report listing, for each criterion, the target triplet that represents it, whether it was fulfilled, and, if fulfilled, the matching candidate triplets found for it

Matching evaluation:

  • output/matching_evaluation/{dataset_name}_{model_id}_evaluation_of_matching.csv: (one per triplet matching method) a csv file comparing the matching results to the reference matching, marking each cell as a true positive, false positive, true negative, or false negative, in a table with texts as rows and target triplets as columns
  • output/matching_evaluation/{dataset_name}_{model_id}_metrics.json: (one per triplet matching method) a json file containing the performance metrics calculated from the matching evaluation, including precision, recall, F1-score, accuracy, and MCC
  • output/matching_evaluation/detailed_reports/{dataset_name}_{model_id}_detailed_report.txt: (one for each triplet matching method) a comprehensive detailed report showing the matching process and evaluation results
  • output/matching_evaluation/{dataset_name}_all_metrics_from_all_methods.csv: a csv file containing all metrics for all methods

Plots:

  • output/plots/{dataset_name}_mcc_scores.png and output/plots/{dataset_name}_mcc_scores.pdf: horizontal bar charts comparing MCC scores across all methods

Other Methods Outputs:

  • output/other_methods/llm_end_to_end_method/{dataset_name}_{model_id}_llm_end_to_end_matching.csv: direct text-criterion matching results
  • output/other_methods/llm_end_to_end_method/{dataset_name}_{model_id}_evaluation_of_llm_end_to_end_matching.csv: evaluation of end-to-end matching
  • output/other_methods/llm_end_to_end_method/{dataset_name}_{model_id}_llm_end_to_end_metrics.json: metrics for end-to-end method
  • output/other_methods/string_matching_method/{dataset_name}_string_matching_output.txt: detailed string matching logs and scores
  • output/other_methods/string_matching_method/{dataset_name}_metrics_with_{model_id}.json: metrics for string matching method
  • output/other_methods/rule_based_triplet_extraction_method/{dataset_name}_rule_based_triplet_extraction_texts_with_sentences_and_extracted_triplets.txt: rule-based extraction results
  • output/other_methods/rule_based_triplet_extraction_method/{dataset_name}_rule_based_triplet_extraction_metrics_with_{model_id}.json: metrics for rule-based extraction method
  • output/other_methods/rule_based_triplet_extraction_method/dependency_trees/text_{text_id}/: dependency parsing trees and graphs for each text

Configuration

The project uses a configuration system defined in src/config.py with the following components:

ModelConfig

  • model_id: ID of the model to use
  • max_tokens: Maximum tokens to generate (default: 512)
  • temperature: Sampling temperature (default: 1.0)
  • extract_triplets_prompt: Prompt template for triplet extraction
  • match_triplets_prompt: Prompt template for triplet matching

ProjectConfig

  • data_dir: Directory for case files (default: "data")
  • output_dir: Directory for final outputs (default: "output")
  • default_language: Default language (default: "de")
  • dataset_name: Name of the dataset (default: "case_2")
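
Assuming the two configs are simple dataclass-style objects (the exact definitions live in src/config.py, and the constructor signatures here are an assumption), overriding the defaults might look like:

from src.config import ModelConfig, ProjectConfig  # names per this section

model_config = ModelConfig(
    model_id="gpt-4o-mini-2024-07-18",
    max_tokens=512,    # default per this README
    temperature=1.0,   # default per this README
)
project_config = ProjectConfig(
    data_dir="data",
    output_dir="output",
    default_language="de",
    dataset_name="case_2",
)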

Model Support

The project supports multiple model providers:

  1. OpenAI API: Used when an OpenAI API key is set and the model is available in OpenAI's lineup
  2. Together API: Used when a Together API key is set and the model is available in Together's lineup
  3. Local vLLM: Used as a fallback when the API options aren't available

The system automatically selects the appropriate provider based on model and API key availability.
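
The exact fallback logic lives in src/llm_model.py; the sketch below only mirrors the selection order described above, with a hypothetical helper name and a deliberately simplified model check:

import os

def select_provider(model_id: str) -> str:
    """Illustrative only: mirrors the provider order described above."""
    if os.getenv("OPENAI_API_KEY") and model_id.startswith("gpt-"):
        return "openai"    # 1. OpenAI API: key set and model in OpenAI's lineup
    if os.getenv("TOGETHER_API_KEY"):
        return "together"  # 2. Together API: key set
    return "vllm"          # 3. fall back to a local vLLM model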

License

This project is licensed under the MIT License. See the LICENSE file for details.

Chart with detailed example for the main method

detailed_flowchart.png
