Large Language Model-based Framework for Open Information Extraction, Triplet Matching, and Text Comparison
oie-llm-framework is a comprehensive Python framework designed for deep semantic text comparison. It leverages Large Language Models (LLMs) to perform Open Information Extraction (OIE), converting unstructured text into structured subject-predicate-object triplets. The framework can then match these triplets against a set of key criteria, providing a granular and automated way to evaluate if a text contains specific information. This project was developed and evaluated on a German legal dataset to assess student answers against expert annotations.
- 🤖 LLM-Powered OIE: Extract semantic triplets from text using various models (OpenAI, Together, local vLLM).
- 🎯 Triplet Matching: Intelligently match extracted triplets against predefined target information.
- 📊 Comprehensive Evaluation: Built-in tools to calculate performance metrics (Precision, Recall, F1, MCC) against a ground truth.
- ⚖️ Multiple Methods: Compare results against alternative approaches like end-to-end LLM evaluation, string matching, and rule-based extraction.
- 🇩🇪 Use Case: Includes a German legal dataset for evaluating student answers.
The main pipeline is designed to automatically determine if a given input text contains key information, which is represented by a predefined set of target triplets. This process involves a two-step, LLM-driven approach: first, it performs OIE to generate candidate triplets from the text, and second, it uses an LLM to match these candidates against the target triplets. The final output is a decision for each target criterion, indicating whether the corresponding key content is present, thereby facilitating a structured and granular comparison of the text's contents.
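The two-step process described above can be sketched as a small control-flow example. This is a minimal illustration, not the framework's implementation: `contains_key_information`, `toy_extract`, and `toy_is_match` are hypothetical names, and the two toy functions stand in for the LLM calls that perform extraction and matching.

```python
from typing import Callable

# A triplet is modelled here as a plain (subject, predicate, object) tuple.
Triplet = tuple[str, str, str]


def contains_key_information(
    text: str,
    target_triplets: list[Triplet],
    extract: Callable[[str], list[Triplet]],
    is_match: Callable[[Triplet, Triplet], bool],
) -> dict[Triplet, bool]:
    """Return one decision per target triplet: is its content present in the text?"""
    # Step 1: open information extraction produces the candidate triplets.
    candidates = extract(text)
    # Step 2: a target counts as present if at least one candidate matches it.
    return {
        target: any(is_match(target, candidate) for candidate in candidates)
        for target in target_triplets
    }


# Toy stand-ins for the two LLM-driven steps (hypothetical, illustration only).
def toy_extract(text: str) -> list[Triplet]:
    return [("buyer", "pays", "price")] if "pays" in text.lower() else []


def toy_is_match(target: Triplet, candidate: Triplet) -> bool:
    return target == candidate


decisions = contains_key_information(
    "The buyer pays the price.",
    [("buyer", "pays", "price"), ("seller", "delivers", "goods")],
    toy_extract,
    toy_is_match,
)
for target, present in decisions.items():
    print(target, "->", present)
```

Passing the extraction and matching steps as callables mirrors the idea that either step can be swapped for an alternative method without changing the surrounding flow.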
The data used in this project is a German legal dataset consisting of a case and student answers to it. The student answers serve as input texts, and the target triplets represent the key information that a student should have included in their answer. First, the system checks whether each student answer contains the key information. Then, the system's decisions are evaluated against expert annotations. The results therefore indicate how well the system can automatically assess student answers relative to expert annotation.
This framework provides components for:
- Loading an LLM from the OpenAI API, the Together API, or as a local vLLM model
- Extracting subject-predicate-object triplets from input texts using the chosen LLM
- Matching these triplets with target triplets (representing key contents of the text) with the chosen LLM
- Evaluating the performance of the triplet matching against a (ground truth) reference matching
- Comparing the results against selected alternative methods:
  - LLM end-to-end text matching (direct criterion matching without triplet extraction)
  - String matching (replacing the LLM-based triplet matching with a simple string matching approach)
  - Rule-based triplet extraction (replacing the LLM-based triplet extraction with a rule-based approach)
Create a conda environment and install dependencies:
conda create -n oie-llm-framework python=3.10 -y
conda activate oie-llm-framework
pip install -r requirements.txt
python -m spacy download de_core_news_sm
To run the existing example with the default settings:
conda activate oie-llm-framework
python -m src.main
This will:
- Load the configured dataset from the data directory
- Run triplet extraction and matching with several LLMs (gpt-4o, gpt-4o-mini, gpt-4.1-mini, Llama-3.3-70B, Llama-4-Maverick, DeepSeek-V3)
- Compare with alternative methods (end-to-end LLM, string matching, rule-based extraction)
- Generate evaluation metrics and save all outputs to the output directory
├── data/ # Input data
│   ├── text_1.txt # Text 1
│   ├── text_2.txt # Text 2
│   ├── target_triplets.txt # Reference triplets to match against
│   ├── target_criteria.txt # Reference criteria to match against (used only for LLM end-to-end method)
│   └── reference_matching_matrix.csv # Reference matching to evaluate matching against
├── output/ # Final outputs
│   ├── triplet_extraction/ # Main method: triplet extraction results
│   ├── triplet_matching/ # Main method: triplet matching results
│   ├── matching_evaluation/ # Main method: matching evaluation results
│   ├── plots/ # Plots
│   └── other_methods/ # Other methods results
│       ├── llm_end_to_end_method/ # LLM end-to-end method results
│       ├── string_matching_method/ # String matching method results
│       └── rule_based_triplet_extraction_method/ # Rule-based triplet extraction method results
├── src/ # Source code
│   ├── __init__.py # Package initialization
│   ├── config.py # Configuration settings
│   ├── data_models.py # Core data structures
│   ├── matching_evaluation.py # Evaluation metrics and utilities
│   ├── llm_model.py # LLM integration (OpenAI API, Together API, local vLLM)
│   ├── main.py # Main script
│   ├── other_methods/ # Other methods for comparison
│   │   ├── end_to_end_llm.py # LLM end-to-end method
│   │   ├── string_matching.py # String matching method
│   │   └── rule_based_extraction.py # Rule-based triplet extraction method
│   ├── triplet_extraction.py # Triplet extraction functionality
│   ├── german_legal_case_utils.py # German legal case evaluation dataset-specific utilities
│   └── triplet_matching.py # Triplet matching functionality
├── requirements.txt # Python dependencies
├── LICENSE # License
└── README.md # This file
- Python 3.10+
- miniconda or anaconda for managing dependencies
- OpenAI API key, or Together API key, or capability to use local vLLM models
- Dependencies listed in requirements.txt, including:
  - torch==2.5.1
  - transformers==4.48.3
  - vllm==0.7.2
  - nltk==3.8.1
  - pandas==2.2.3
  - fuzzywuzzy==0.18.0
  - dataclasses-json==0.6.7
  - python-dotenv
  - spacy==3.8.5
  - matplotlib==3.10.1
Note: For simplicity, the following steps use only a single input text. The main script processes multiple texts.
The main pipeline involves three steps: Triplet Extraction, Triplet Matching, and Evaluation.
1. Open Information Extraction (Triplet Extraction)
- Load text from the data directory as a string
- Process the text using TripletExtractor.extract_triplets_from_text() to extract triplets from sentences
- Receive a SentencesWithTripletsFromText object containing the text and the extracted triplets per sentence
- Save the result as a txt file in the triplet_extraction directory
# Import paths are assumed from the project layout shown above
from src.llm_model import LLM_Model
from src.triplet_extraction import TripletExtractor

text_1 = "This is a test sentence. This is another test sentence. This is a third test sentence."
dataset_name = "case_2"  # default dataset name from src/config.py
llm_model = LLM_Model.initialize_model(model_id="gpt-4o-mini-2024-07-18")  # or any supported model
extractor = TripletExtractor(extraction_method=llm_model)
sentences_with_triplets_from_text = extractor.extract_triplets_from_text(text_1)
# Save triplet extraction results
with open(f"output/triplet_extraction/{dataset_name}_{llm_model.model_id.replace('/', '__')}_example_text_sentences_with_extracted_triplets.txt", 'w', encoding='utf-8') as f:
    f.write("Showing a single example text with the sentences and their extracted triplets.\n")
    f.write(f"Text extracted with {llm_model.model_id}\n")
    f.write(sentences_with_triplets_from_text.to_string())
2. Triplet Matching
- Extract all triplets from the SentencesWithTripletsFromText object using get_list_of_triplets()
- Process candidate triplets through TripletMatcher.match_triplets() to match against target triplets
  Note: Target triplets can be manually defined, extracted from another text, or taken from a reference text
- Receive a TripletMatchesFromText object containing TripletMatch objects for each target triplet
# Define target triplets (these would typically be loaded from your dataset)
target_triplets = [
    Triplet(subject="example subject", predicate="example predicate", object="example object"),
    # Add more target triplets as needed
]
# Extract candidate triplets from the text
list_of_candidate_triplets_from_text = sentences_with_triplets_from_text.get_list_of_triplets()
# Initialize matcher with the LLM model
matcher = TripletMatcher(matching_method=llm_model)
# Match triplets (note: correct parameter order is target_triplets, candidate_triplets, text)
triplet_matches_for_text = matcher.match_triplets(target_triplets, list_of_candidate_triplets_from_text, text_1)
# Create matching dataframe
matching_dataframe = MatchingDataframe(matches_list=[triplet_matches_for_text])
- Save matching results in three formats:
- JSON: Raw matching data for programmatic use
- CSV: Tabular format showing which texts match which criteria
- TXT: Human-readable formatted reports
import json

# Save matching results as JSON
with open(f"output/triplet_matching/{dataset_name}_{llm_model.model_id.replace('/', '__')}_matched_triplets_for_each_attempt.json", 'w', encoding='utf-8') as f:
    json.dump([triplet_matches_for_text], f, cls=NumpyEncoder)
# Save matching dataframe as CSV
matching_dataframe.matching_df.to_csv(f"output/triplet_matching/{dataset_name}_{llm_model.model_id.replace('/', '__')}_matching_dataframe.csv")
# Save formatted matching report as TXT
list_of_matching_criteria = ["Example criterion 1", "Example criterion 2"]  # These would come from your dataset
matching_report = format_matching_list_and_criteria(triplet_matches_for_text, list_of_matching_criteria)
with open(f"output/triplet_matching/text_0/{dataset_name}_{llm_model.model_id.replace('/', '__')}_text_0_matching_list_and_criteria.txt", 'w', encoding='utf-8') as f:
    f.write(f"Matching list and criteria for text 0 with {llm_model.model_id}:\n")
    f.write(matching_report)
3. Matching Evaluation
- Compare matching results against a reference matching matrix (ground truth)
- Use the reference matrix to determine correct text-criteria mappings from human annotation
- Calculate performance metrics (precision, recall, F1-score, accuracy, MCC) from the comparison
import pandas as pd

# Load or create reference matching matrix (ground truth)
# This would typically be loaded from your dataset
reference_matching_df = pd.DataFrame(
    [[True, False], [False, True]],  # Example: 2 texts, 2 criteria
    columns=[0, 1],  # Column indices for criteria
    index=[0, 1],  # Row indices for texts
)
# Evaluate matching results against reference
evaluation_of_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(matching_dataframe, reference_matching_df)
# Calculate performance metrics
metrics = calculate_metrics_from_evaluated_matching_results_df(evaluation_of_matching_dataframe.evaluation_of_matching_df)
print(f"Performance metrics: {metrics}")
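For reference, the metrics named in this step can all be derived from the four confusion counts (true/false positives and negatives). The following is a minimal standalone sketch using the standard formulas; `metrics_from_counts` is a hypothetical helper, not the framework's `calculate_metrics_from_evaluated_matching_results_df`, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, F1, accuracy, and MCC from confusion counts."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    # Matthews correlation coefficient: uses all four cells of the confusion matrix.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


example_metrics = metrics_from_counts(tp=3, fp=1, tn=4, fn=2)
print(example_metrics)
```

MCC is a useful summary when matched and unmatched criteria are imbalanced, because it takes true negatives into account, which precision, recall, and F1 do not.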
LLM End-to-End Text Matching
- Initialize the LLM end-to-end method with your chosen LLM model
- Define your list of texts and evaluation criteria
- Create an empty matching dataframe to store results
- For each text-criterion pair, use the LLM to directly evaluate if they match
- Populate the dataframe with boolean values (1 for match, 0 for no match)
- Evaluate results against reference matching matrix and calculate metrics
# Initialize end-to-end method
llm_end_to_end_method = LLMEndToEndTextMatchingMethod(llm_model)
# Define texts and criteria
list_of_texts = [text_1] # Your list of texts to evaluate
list_of_matching_criteria = ["Example criterion 1", "Example criterion 2"] # Your evaluation criteria
# Create empty dataframe and populate it
llm_end_to_end_matching_df = MatchingDataframe(
    matching_df=pd.DataFrame(columns=list_of_matching_criteria, index=list_of_texts)
)
# Populate the dataframe by matching each text against each criterion
for text in list_of_texts:
    for criterion in list_of_matching_criteria:
        if llm_end_to_end_method.match_text_with_criterion(text, criterion):
            llm_end_to_end_matching_df.matching_df.loc[text, criterion] = 1
        else:
            llm_end_to_end_matching_df.matching_df.loc[text, criterion] = 0
# Evaluate results
evaluation_of_llm_end_to_end_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(
    llm_end_to_end_matching_df, reference_matching_df
)
metrics = calculate_metrics_from_evaluated_matching_results_df(
    evaluation_of_llm_end_to_end_matching_dataframe.evaluation_of_matching_df
)
String Matching
- Use LLM-based triplet extraction to extract triplets from your texts (same as main method)
- Initialize the string matching method instead of LLM-based triplet matching
- For each text, extract candidate triplets from the SentencesWithTripletsFromText object
- Apply string similarity (fuzzy matching) to match candidate triplets against target triplets
- Create matching dataframe from the string-based matching results
- Evaluate results against reference matching matrix and calculate metrics
# Use triplets extracted from LLM-based method (or extract new ones)
list_of_sentences_with_triplets_from_texts = [sentences_with_triplets_from_text] # From previous example
# Initialize string matching method
string_matching_method = StringMatchingMethod()
# Process each text
list_of_TripletMatchesForText = []
for sentences_with_triplets_from_text in list_of_sentences_with_triplets_from_texts:
    candidate_triplets_for_text = sentences_with_triplets_from_text.get_list_of_triplets()
    triplet_matches_from_text = string_matching_method.match_triplets(
        target_triplets,  # Note: target_triplets comes first
        candidate_triplets_for_text,
        sentences_with_triplets_from_text.text
    )
    list_of_TripletMatchesForText.append(triplet_matches_from_text)
# Create dataframe and evaluate
matching_dataframe = MatchingDataframe(matches_list=list_of_TripletMatchesForText)
evaluation_of_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(
    matching_dataframe, reference_matching_df
)
metrics = calculate_metrics_from_evaluated_matching_results_df(
    evaluation_of_matching_dataframe.evaluation_of_matching_df
)
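To illustrate the idea behind similarity-based triplet matching, here is a small self-contained sketch. It is not the framework's StringMatchingMethod: the project lists fuzzywuzzy as a dependency, whereas this example swaps in difflib.SequenceMatcher from the standard library so that it runs without extra packages. The helper names and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

Triplet = tuple[str, str, str]


def triplet_to_text(triplet: Triplet) -> str:
    """Normalise a triplet to one lowercase string for comparison."""
    return " ".join(part.strip().lower() for part in triplet)


def similarity(a: Triplet, b: Triplet) -> float:
    """Similarity ratio in [0, 1] between the normalised triplet strings."""
    return SequenceMatcher(None, triplet_to_text(a), triplet_to_text(b)).ratio()


def fuzzy_match(
    target: Triplet, candidates: list[Triplet], threshold: float = 0.8
) -> list[Triplet]:
    """Return the candidates whose similarity to the target reaches the threshold."""
    return [c for c in candidates if similarity(target, c) >= threshold]


target = ("buyer", "pays", "price")
candidates = [
    ("Buyer", "pays", "the price"),
    ("seller", "delivers", "goods"),
]
matched = fuzzy_match(target, candidates)
print(matched)
```

A purely lexical comparison like this cannot recognise paraphrases, which is the main limitation that LLM-based matching is meant to address.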
Rule-Based Triplet Extraction
- Initialize the rule-based extraction method using spaCy dependency parsing patterns
- Create a TripletExtractor with the rule-based method instead of LLM-based extraction
- Extract triplets from each text using predefined linguistic patterns and dependency trees
- Use LLM-based TripletMatcher for the triplet matching step (same as main method)
- Match extracted triplets against target triplets using the LLM matcher
- Create matching dataframe and evaluate against reference matrix to calculate metrics
# Initialize rule-based extraction method
rule_based_extraction_method = RuleBasedTripletExtractionMethod(RuleSets.patterns_general)
extractor = TripletExtractor(extraction_method=rule_based_extraction_method)
# Extract triplets from texts using rule-based method
list_of_sentences_with_triplets_from_texts = []
for text in list_of_texts:
    sentences_with_triplets_from_text = extractor.extract_triplets_from_text(text)
    list_of_sentences_with_triplets_from_texts.append(sentences_with_triplets_from_text)
# Use the LLM-based matcher for triplet matching
triplet_matching_method = TripletMatcher(matching_method=llm_model)
list_of_TripletMatchesForText = []
for sentences_with_triplets_from_text in list_of_sentences_with_triplets_from_texts:
    triplet_matches_from_text = triplet_matching_method.match_triplets(
        target_triplets,
        sentences_with_triplets_from_text.get_list_of_triplets(),
        sentences_with_triplets_from_text.text
    )
    list_of_TripletMatchesForText.append(triplet_matches_from_text)
# Evaluate results
matching_dataframe = MatchingDataframe(matches_list=list_of_TripletMatchesForText)
evaluation_of_matching_dataframe = EvaluationOfMatchingDataframe.from_matching_dataframe(
    matching_dataframe, reference_matching_df
)
metrics = calculate_metrics_from_evaluated_matching_results_df(
    evaluation_of_matching_dataframe.evaluation_of_matching_df
)
- input text: the text to be processed
- extracted triplets: the triplets extracted from the input text, and used as candidate triplets for matching
- candidate triplets: the triplets extracted from the input text, to be matched against the target triplets
- target criterion: a criterion, i.e. a key content of the text that the input text needs to be compared against
- target triplets: manually defined triplets that each aim to represent a target criterion
- matched triplets: a combination of a target triplet and a subset of the candidate triplets that match that target triplet
- predicted matching matrix: a matrix of boolean values, where the rows are the input texts and the columns are the target triplets, and the value at the intersection of a row and a column is True if the target triplet is matched to the input text by the matching algorithm, False otherwise
- reference matching matrix: a matrix of boolean values, representing a human-annotated mapping of input texts to target triplets, used to evaluate the performance of the matching algorithm
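The relationship between matched triplets, the predicted matching matrix, and the reference matching matrix can be shown with plain Python lists. This is an illustrative sketch only: the function names are hypothetical, and it assumes that a target triplet counts as matched when at least one candidate triplet was matched to it.

```python
from collections import Counter


def predicted_row(matched_candidates_per_target: list[list[str]]) -> list[bool]:
    """Build one row of the predicted matching matrix for a single input text."""
    # Assumption: a non-empty list of matched candidates means "matched".
    return [len(matched) > 0 for matched in matched_candidates_per_target]


def evaluate_cell(predicted: bool, reference: bool) -> str:
    """Classify one text/target cell as TP, FP, FN, or TN."""
    if predicted:
        return "TP" if reference else "FP"
    return "FN" if reference else "TN"


def evaluate_matrix(
    predicted: list[list[bool]], reference: list[list[bool]]
) -> list[list[str]]:
    """Compare the predicted matrix with the reference matrix cell by cell."""
    return [
        [evaluate_cell(p, r) for p, r in zip(pred_row, ref_row)]
        for pred_row, ref_row in zip(predicted, reference)
    ]


def count_outcomes(evaluated: list[list[str]]) -> Counter:
    """Count how often each outcome occurs across all cells."""
    return Counter(cell for row in evaluated for cell in row)


# Two input texts, two target triplets; candidates are shown as plain strings.
predicted = [
    predicted_row([["buyer | pays | price"], []]),
    predicted_row([[], []]),
]
reference = [[True, False], [False, True]]
evaluated = evaluate_matrix(predicted, reference)
outcomes = count_outcomes(evaluated)
print(evaluated, dict(outcomes))
```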
- Triplet: Represents a subject-predicate-object triplet
  - subject (string): The subject of the triplet
  - predicate (string): The predicate of the triplet
  - object (string): The object of the triplet
- SentenceWithTriplets: Represents a sentence with its extracted triplets
  - sentence (string): The sentence
  - triplets (list[Triplet]): The list of triplets extracted from the sentence
- SentencesWithTripletsFromText: Represents a text and its sentences with their extracted triplets
  - text (string): The text
  - sentences (list[SentenceWithTriplets]): The sentences with their extracted triplets
- TripletMatch: Contains a target triplet and its matching candidate triplets
  - target_triplet (Triplet): The target triplet
  - candidate_triplets (list[Triplet]): The list of candidate triplets that match the target triplet
- TripletMatchesFromText: Collection of triplet matches for a given text
  - text (string): The text
  - triplet_matches (list[TripletMatch]): The list of triplet matches for the given text
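As a rough picture of how these structures nest, the sketch below models three of them with standard-library dataclasses. It is not a copy of src/data_models.py: the field names follow the descriptions above, while the default values and the flattening logic inside get_list_of_triplets() are assumptions about how such a method could behave.

```python
from dataclasses import dataclass, field


@dataclass
class Triplet:
    subject: str
    predicate: str
    object: str


@dataclass
class SentenceWithTriplets:
    sentence: str
    triplets: list[Triplet] = field(default_factory=list)


@dataclass
class SentencesWithTripletsFromText:
    text: str
    sentences: list[SentenceWithTriplets] = field(default_factory=list)

    def get_list_of_triplets(self) -> list[Triplet]:
        # Assumed behaviour: flatten the per-sentence triplets into one list,
        # preserving sentence order.
        return [t for s in self.sentences for t in s.triplets]


example = SentencesWithTripletsFromText(
    text="The buyer pays the price. The seller delivers the goods.",
    sentences=[
        SentenceWithTriplets(
            "The buyer pays the price.", [Triplet("buyer", "pays", "price")]
        ),
        SentenceWithTriplets(
            "The seller delivers the goods.", [Triplet("seller", "delivers", "goods")]
        ),
    ],
)
all_triplets = example.get_list_of_triplets()
print(all_triplets)
```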
- TripletExtractor: Extracts triplets from text using the configured extraction method (LLM-based or rule-based).
  - Important attributes:
    - extraction_method: The method used to extract triplets from text. It is expected to implement the extract_triplets(sentence: str) -> str method and can be LLM_Model or RuleBasedTripletExtractionMethod
  - Important methods:
    - extract_triplets_from_text(text: str) -> SentencesWithTripletsFromText: Extracts triplets from a given text
    - extract_triplets_from_sentences(sentences: list[str]) -> list[SentenceWithTriplets]: Extracts triplets from a list of sentences
    - extract_triplets_from_sentence(sentence: str) -> SentenceWithTriplets: Extracts triplets from a given sentence
- TripletMatcher: Matches extracted triplets against target triplets.
  - Important attributes:
    - matching_method: The method used to match triplets. It is expected to implement the match_triplets(target_triplet: Triplet, candidate_triplets: list[Triplet]) -> str method and can be LLM_Model or StringMatchingMethod
  - Important methods:
    - match_triplets(target_triplets: list[Triplet], candidate_triplets: list[Triplet], text: str) -> TripletMatchesFromText: Matches a list of target triplets against a list of candidate triplets. For each target triplet, the whole list of candidate triplets is checked against it
    - match_target_triplet_with_candidate_triplets(target_triplet: Triplet, candidate_triplets: list[Triplet]) -> TripletMatch: Matches a target triplet against a list of candidate triplets
- LLM_Model: Interface for local vLLM models, OpenAI API, and Together API models.
  - Important attributes:
    - model_id: The ID of the model to use
    - temperature, max_tokens: Hyperparameters defined in src/config.py
    - extract_triplets_prompt: The prompt to use for triplet extraction
    - match_triplets_prompt: The prompt to use for triplet matching
  - Important methods:
    - _generate_response(prompt: str) -> str: Generates a response from the LLM model
    - extract_triplets(sentence: str) -> str: Extracts triplets from a given sentence and returns the LLM response to the extract_triplets_prompt defined in src/config.py
    - match_triplets(target_triplet: Triplet, candidate_triplets: list[Triplet]) -> str: Matches a target triplet against a list of candidate triplets and returns the LLM response to the match_triplets_prompt defined in src/config.py
- LLMEndToEndTextMatchingMethod: Direct criterion matching without triplet extraction.
  - Important methods:
    - match_text_with_criterion(text: str, criterion: str) -> bool: Matches a given text against a given criterion
- StringMatchingMethod: Simple string similarity-based matching for triplet comparison.
  - Important methods:
    - match_triplets(target_triplets: list[Triplet], candidate_triplets: list[Triplet], text: str) -> TripletMatchesFromText: Matches target triplets against candidate triplets
- RuleBasedTripletExtractionMethod: Uses spaCy rules to extract triplets.
  - Important methods:
    - extract_triplets(sentence: str) -> str: Extracts triplets from a given sentence
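Because the extractor only relies on an extract_triplets(sentence: str) -> str method, any object offering that method can serve as an extraction method. The sketch below illustrates this with typing.Protocol and a toy method. Everything except the method signature is hypothetical: the class names, the toy rule, and in particular the "subject | predicate | object" line format used for the raw string, since the actual response format is defined by the project's prompts and is not described here.

```python
from typing import Protocol


class ExtractionMethod(Protocol):
    """Structural interface: anything with this method can be plugged in."""

    def extract_triplets(self, sentence: str) -> str: ...


class CopulaExtractionMethod:
    """Toy method: turns 'X is Y.' sentences into one raw triplet line."""

    def extract_triplets(self, sentence: str) -> str:
        subject, separator, obj = sentence.rstrip(".").partition(" is ")
        return f"{subject} | is | {obj}" if separator else ""


def parse_raw_triplets(raw: str) -> list[tuple[str, ...]]:
    """Parse the assumed 'subject | predicate | object' line format."""
    triplets = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Keep only well-formed lines with three non-empty fields.
        if len(parts) == 3 and all(parts):
            triplets.append(tuple(parts))
    return triplets


def extract_from_sentence(
    method: ExtractionMethod, sentence: str
) -> list[tuple[str, ...]]:
    """Run any extraction method and parse its raw string output."""
    return parse_raw_triplets(method.extract_triplets(sentence))


print(extract_from_sentence(CopulaExtractionMethod(), "The contract is valid."))
```

Keeping the method's return type a raw string means that parsing and validation live in one place, regardless of whether the string came from an LLM or from hand-written rules.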
Detailed list of outputs
Triplet extraction:
- output/triplet_extraction/{dataset_name}_{model_id}_texts_with_sentences_and_extracted_triplets.txt: (one for each triplet extraction method) a txt file containing the triplets extracted from each sentence of each input text, with a given method
Triplet matching:
- output/triplet_matching/{dataset_name}_{model_id}_matched_triplets_for_each_attempt.json: (one for each triplet matching method) a json file containing the list of texts, for each text, the list of target triplets and for each, the list of their matching candidate triplets for a given method
- output/triplet_matching/{dataset_name}_{model_id}_matching_dataframe.csv: (one for each triplet matching method) a csv file containing the matching results for the list of target triplets and their matching candidate triplets for each text with a given method
- output/triplet_matching/text_{text_id}/{dataset_name}_{model_id}_text_{text_id}_matching_list_and_criteria.txt: (multiple txt files, one per text per method) formatted reports containing a list with each entry a criterion and the target triplet that represents it, the answer to whether it was fulfilled and if it was fulfilled, the list of matching candidate triplets that were found for that criterion
Matching evaluation:
- output/matching_evaluation/{dataset_name}_{model_id}_evaluation_of_matching.csv: (one for each triplet matching method) a csv file containing the matching evaluation, which compares the matching results to the reference matching and determines true positives, false positives, true negatives, false negatives, showing the result in a table, with texts as rows and target triplets as columns
- output/matching_evaluation/{dataset_name}_{model_id}_metrics.json: (one for each triplet matching method) a json file containing the performance metrics calculated from the matching evaluation, including precision, recall, f1-score, accuracy, MCC
- output/matching_evaluation/detailed_reports/{dataset_name}_{model_id}_detailed_report.txt: (one for each triplet matching method) a comprehensive detailed report showing the matching process and evaluation results
- output/matching_evaluation/{dataset_name}_all_metrics_from_all_methods.csv: a csv file containing all metrics for all methods
Plots:
- output/plots/{dataset_name}_mcc_scores.png and output/plots/{dataset_name}_mcc_scores.pdf: horizontal bar charts comparing MCC scores across all methods
Other Methods Outputs:
- output/other_methods/llm_end_to_end_method/{dataset_name}_{model_id}_llm_end_to_end_matching.csv: direct text-criterion matching results
- output/other_methods/llm_end_to_end_method/{dataset_name}_{model_id}_evaluation_of_llm_end_to_end_matching.csv: evaluation of end-to-end matching
- output/other_methods/llm_end_to_end_method/{dataset_name}_{model_id}_llm_end_to_end_metrics.json: metrics for end-to-end method
- output/other_methods/string_matching_method/{dataset_name}_string_matching_output.txt: detailed string matching logs and scores
- output/other_methods/string_matching_method/{dataset_name}_metrics_with_{model_id}.json: metrics for string matching method
- output/other_methods/rule_based_triplet_extraction_method/{dataset_name}_rule_based_triplet_extraction_texts_with_sentences_and_extracted_triplets.txt: rule-based extraction results
- output/other_methods/rule_based_triplet_extraction_method/{dataset_name}_rule_based_triplet_extraction_metrics_with_{model_id}.json: metrics for rule-based extraction method
- output/other_methods/rule_based_triplet_extraction_method/dependency_trees/text_{text_id}/: dependency parsing trees and graphs for each text
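The file names above embed the model ID, and the usage examples replace "/" in model IDs with "__" so that an ID containing a slash does not create an unintended subdirectory. A small sketch of that naming scheme follows; the helper functions are hypothetical and "org/model-name" is a placeholder model ID.

```python
from pathlib import Path


def safe_model_id(model_id: str) -> str:
    """Make a model ID usable inside a file name."""
    return model_id.replace("/", "__")


def matching_csv_path(output_dir: str, dataset_name: str, model_id: str) -> Path:
    """Build the path of the matching dataframe CSV for one model."""
    file_name = f"{dataset_name}_{safe_model_id(model_id)}_matching_dataframe.csv"
    return Path(output_dir) / "triplet_matching" / file_name


path = matching_csv_path("output", "case_2", "org/model-name")
print(path.as_posix())
# Before writing, the parent directory could be created with:
# path.parent.mkdir(parents=True, exist_ok=True)
```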
The project uses a configuration system defined in src/config.py with the following components:

- model_id: ID of the model to use
- max_tokens: Maximum tokens to generate (default: 512)
- temperature: Sampling temperature (default: 1.0)
- extract_triplets_prompt: Prompt template for triplet extraction
- match_triplets_prompt: Prompt template for triplet matching

- data_dir: Directory for case files (default: "data")
- output_dir: Directory for final outputs (default: "output")
- default_language: Default language (default: "de")
- dataset_name: Name of the dataset (default: "case_2")
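One way to picture these settings is as two small groups of typed values with defaults. The sketch below is not the content of src/config.py: the class names ModelSettings and ProjectSettings are hypothetical, the prompt templates are left out, and only the default values are taken from the list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelSettings:
    model_id: str
    max_tokens: int = 512
    temperature: float = 1.0


@dataclass(frozen=True)
class ProjectSettings:
    data_dir: str = "data"
    output_dir: str = "output"
    default_language: str = "de"
    dataset_name: str = "case_2"


model_settings = ModelSettings(model_id="gpt-4o-mini-2024-07-18")
project_settings = ProjectSettings()

# Frozen dataclasses are changed by creating a modified copy.
deterministic_settings = replace(model_settings, temperature=0.0)
print(model_settings, deterministic_settings, project_settings)
```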
The project supports multiple model providers:
- OpenAI API: Used when API key is set and the model is available in OpenAI's lineup
- Together API: Used when API key is set and the model is available in Together's lineup
- Local vLLM: Used as fallback when API options aren't available
The system automatically selects the appropriate provider based on model and API key availability.
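The selection order described above could look roughly like the following sketch. It is an assumption-laden illustration, not the project's logic: the environment variable names OPENAI_API_KEY and TOGETHER_API_KEY, the prefix-based availability check, and the function select_provider are all hypothetical stand-ins for whatever the framework actually uses to decide.

```python
from typing import Mapping

# Hypothetical stand-ins for "the model is available in the provider's lineup".
OPENAI_MODEL_PREFIXES = ("gpt-",)
TOGETHER_MODEL_PREFIXES = ("meta-llama/", "deepseek-ai/")


def select_provider(model_id: str, env: Mapping[str, str]) -> str:
    """Pick a provider from the model ID and the API keys that are set."""
    if env.get("OPENAI_API_KEY") and model_id.startswith(OPENAI_MODEL_PREFIXES):
        return "openai"
    if env.get("TOGETHER_API_KEY") and model_id.startswith(TOGETHER_MODEL_PREFIXES):
        return "together"
    # Fallback when no API option applies: run the model locally with vLLM.
    return "vllm"


print(select_provider("gpt-4o-mini-2024-07-18", {"OPENAI_API_KEY": "example-key"}))
```

Passing the environment as a mapping (for example os.environ) keeps the function easy to test without touching real credentials.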
This project is licensed under the MIT License. See the LICENSE file for details.