Our project, Legal HI (Legal Harmonized Intelligence), is a multi-agent system that leverages an open-source LLM to generate concise summaries of court judgments.
Set your Llama API key (required) as an environment variable. One way of doing that:

```
export LLAMA_API_KEY=<your api key>
```
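If you want to verify the key from Python before running the pipeline, a minimal check could look like the sketch below. It assumes only that the key is exposed via the `LLAMA_API_KEY` environment variable, as set above; the helper name is illustrative, not part of Legal HI.

```python
import os

def get_api_key() -> str:
    """Return the Llama API key from the environment, failing early
    with a clear message instead of a mid-run authentication error."""
    key = os.environ.get("LLAMA_API_KEY")
    if not key:
        raise RuntimeError(
            "LLAMA_API_KEY is not set; export it before running Legal HI."
        )
    return key
```

Failing fast here is cheaper than discovering a missing key partway through summarizing a long judgment.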
Run Legal HI to generate the summary:

```
python legal_hi.py <your input document as a .txt file>
```
You can use the example input at `prompt_experiments/example_input.txt`.
The baseline and Legal HI evaluation results are located in `eval_data/`. This directory includes the summaries generated by the models as well as the human-written summaries from Case in Brief.
You can reproduce the results in our report by running:

```
python avg_scores.py eval_data/baseline_eval_results.json
python avg_scores.py eval_data/legalhi_eval_results.json
```
- `avg_scores.py`: Calculates the average BERT and ROUGE scores of evaluation results.
- `baseline.py`: Generates and evaluates summaries with our single-agent baseline model.
- `evaluation.py`: Evaluates the summaries generated by Legal HI.
- `generate_response.py`: Generates summaries with Legal HI for evaluation.
- `legal_hi.py`: The command-line interface of Legal HI for generating case summaries from input documents.
- `architecture/model.py`: The implementation of Legal HI.
- `data/`: All the data we collected and generated.
- `eval_data/`: The data we used for performance evaluation.
- `prompt_experiments/`: Various prompt experiments conducted during the development of Legal HI, including different models (e.g. Llama-3.1-8b), a single-agent architecture, and alternative system prompts.
- `score/`: The implementation of the ROUGE and BERT scores used for evaluation.
- `webcrawler/`: The web crawler we used to scrape data from the internet.