📝 Paper • 🤗 Hugging Face • 🧩 GitHub • 🪄 Project
Radiology reports are critical for clinical decision-making but often lack a standardized format, limiting both human interpretability and machine learning (ML) applications. While large language models (LLMs) have shown strong capabilities in reformatting clinical text, their high computational requirements, lack of transparency, and data privacy concerns hinder practical deployment. To address these challenges, we explore lightweight encoder-decoder models (<300M parameters), specifically T5 and BERT2BERT, for structuring radiology reports from the MIMIC-CXR and CheXpert Plus datasets. We benchmark these models against eight open-source LLMs (1B–70B parameters), adapted using prefix prompting, in-context learning (ICL), and low-rank adaptation (LoRA) finetuning. Our best-performing lightweight model outperforms all LLMs adapted using prompt-based techniques on a human-annotated test set. While some LoRA-finetuned LLMs achieve modest gains over the lightweight model on the Findings section (BLEU 6.4%, ROUGE-L 4.8%, BERTScore 3.6%, F1-RadGraph 1.1%, GREEN 3.6%, and F1-SRR-BERT 4.3%), these improvements come at the cost of substantially greater computational resources. For example, LLaMA-3-70B incurred more than 400 times the inference time, cost, and carbon emissions compared to the lightweight model. These results underscore the potential of lightweight, task-specific models as sustainable and privacy-preserving solutions for structuring clinical text in resource-constrained healthcare settings.
Automatically transform free-text chest X-ray radiology reports into a standardized, structured format.
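For a concrete (purely hypothetical, not drawn from the datasets) illustration of what "structured" means here: a free-text impression mentioning effusions, cardiomegaly, and interstitial edema might be reorganized under standardized anatomical headers along these lines; see the paper for the exact SRRG schema:

```
Findings:
Pleura: Small bilateral pleural effusions.
Cardiovascular: Cardiomegaly.
Lungs: Possible mild interstitial edema.
```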
| Model | Variant | HuggingFace Link |
|---|---|---|
| BERT2BERT | RoBERTa-base | 🤗 StanfordAIMI/SRR-BERT2BERT-RoBERTa-base |
| BERT2BERT | RoBERTa-biomed | 🤗 StanfordAIMI/SRR-BERT2BERT-RoBERTa-biomed |
| BERT2BERT | RoBERTa-PM-M3 | 🤗 StanfordAIMI/SRR-BERT2BERT-RoBERTa-PM-M3 |
| BERT2BERT | RadBERT | 🤗 StanfordAIMI/SRR-BERT2BERT-RadBERT |
| T5 | T5-Base | 🤗 StanfordAIMI/SRR-T5-Base |
| T5 | Flan-T5 | 🤗 StanfordAIMI/SRR-T5-Flan |
| T5 | SciFive | 🤗 StanfordAIMI/SRR-T5-SciFive |
| Dataset | HuggingFace Link |
|---|---|
| SRRG-Findings | 🤗 StanfordAIMI/srrg_findings |
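To inspect the data, the dataset can be loaded with the 🤗 `datasets` library. A minimal sketch, assuming default splits (the split and field names below are assumptions; print the loaded object to see the actual schema, and note that the dataset may require accepting its terms on the Hub and logging in):

```python
from datasets import load_dataset

# Load SRRG-Findings from the Hugging Face Hub
ds = load_dataset("StanfordAIMI/srrg_findings")

print(ds)              # lists the available splits and columns
print(ds["train"][0])  # shows one raw example; field names may differ
```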
Required Packages
pip install transformers==4.44.0
pip install torch==2.3
import torch
from transformers import EncoderDecoderModel, AutoTokenizer
# Step 1: Setup
model_name = "StanfordAIMI/SRR-BERT2BERT-RoBERTa-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Step 2: Load Processor and Model
model = EncoderDecoderModel.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, padding_side="right", use_fast=False)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.bos_token_id = tokenizer.cls_token_id
model.eval()
# Step 3: Inference (example from MIMIC-CXR dataset)
input_text = "CHEST RADIOGRAPH PERFORMED ON ___ ... Impression: Limited exam with small bilateral effusions, cardiomegaly, and possible mild interstitial edema."
inputs = tokenizer(input_text, padding="max_length", truncation=True, max_length=512, return_tensors="pt")
inputs["attention_mask"] = inputs["input_ids"].ne(tokenizer.pad_token_id)
input_ids = inputs['input_ids'].to(device)
attention_mask = inputs["attention_mask"].to(device)
generated_ids = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=286,
    min_new_tokens=120,
    decoder_start_token_id=model.config.decoder_start_token_id,
    num_beams=5,
    early_stopping=True,
)[0]
decoded = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(decoded)
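The SRR-T5 variants from the table above load as standard seq2seq models. A minimal sketch via `AutoModelForSeq2SeqLM`, assuming the generation settings from the BERT2BERT example carry over (not verified for the T5 checkpoints; the exact input formatting, e.g. any task prefix, may differ — see run_model.sh in the repository):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "StanfordAIMI/SRR-T5-Base"  # or SRR-T5-Flan / SRR-T5-SciFive
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
model.eval()

input_text = "CHEST RADIOGRAPH PERFORMED ON ___ ... Impression: Limited exam with small bilateral effusions, cardiomegaly, and possible mild interstitial edema."
inputs = tokenizer(input_text, truncation=True, max_length=512, return_tensors="pt").to(device)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=286, num_beams=5, early_stopping=True)[0]
print(tokenizer.decode(generated_ids, skip_special_tokens=True))
```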
Follow these steps to set up the environment and get the project running:
# Step 1: Clone the Repository
git clone https://github.com/johannes2moll/rad-report-structuring.git
# Optional: if the submodule was not initialized (no StructEval folder exists in src), clone it manually
cd rad-report-structuring/src
git clone https://github.com/jbdel/StructEval.git
# Step 2: Create Conda Environments
# To reproduce all results, three separate environments are needed (due to version conflicts between green_score, radgraph, and transformers' EncoderDecoder):
# srrrun: training and running models (run_llm.sh, run_model.sh, train_llm.sh, train_model.sh)
# srreval: evaluating all metrics except GREEN (calc_metrics.sh)
# green: evaluating the GREEN metric (calc_metrics.sh); note that for this you have to activate the import in src/StructEval/structueval/StructEval.py and change the parameters in src/calc_metrics.py
conda create -n srrrun python=3.10
conda create -n srreval python=3.10
conda create -n green python=3.10.0
# Step 3: Install Requirements
conda activate srrrun
pip install -r requirements_run.txt
conda activate srreval
pip install -e src/StructEval
pip install -r requirements_eval.txt
conda activate green
pip install -r requirements_green.txt
# Step 4: Prepare the Data and Set the Home Directory
# Set DIR and DIR_MODELS_TUNED in src/constants.py
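# For example, src/constants.py might contain entries like the following
# (illustrative paths only; point these at your own directories):
#   DIR = "/path/to/rad-report-structuring"
#   DIR_MODELS_TUNED = "/path/to/finetuned-model-checkpoints"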
# Step 5: Train a Model
conda activate srrrun
bash train_model.sh
bash train_llm.sh
# Step 6: Generate Predictions on the Test Set
conda activate srrrun
bash run_model.sh
bash run_llm.sh
# Step 7: Evaluate
conda activate srreval
bash calc_metrics.sh
If you find this work useful, please cite:
@article{structuring-2025,
title={Structuring Radiology Reports: Challenging LLMs with Lightweight Models},
author={Moll, Johannes and Fay, Louisa and Azhar, Asfandyar and Ostmeier, Sophie and Lueth, Tim and Gatidis, Sergios and Langlotz, Curtis and Delbrouck, Jean-Benoit},
journal={arXiv preprint arXiv:2506.00200},
url={https://arxiv.org/abs/2506.00200},
year={2025}
}