
🌟 MediaEval Medico 2025: VQA (with multimodal explanations) for GastroIntestinal Imaging 🌟

📋 GitHub Repository | 🔗 MediaEval 2025 | 📝 Registration Form | 🏆 Leaderboard / Registered Submissions


The MediaEval Medico 2025 Challenge 🔬 focuses on Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, emphasizing explainability 🤔📖 to foster trustworthy AI for clinical adoption ⚕️.

This task continues the long-running Medico series at MediaEval, now leveraging the newly developed Kvasir-VQA-x1 dataset, designed to support multimodal reasoning and interpretable clinical decision support 📈.

📅 Save the Date!

The annual MediaEval Workshop 🗣️ will be held on 🗓️ Saturday–Sunday, 25–26 October 2025 | 📍 Dublin, Ireland 🇮🇪 & Online 🌐 (between CBMI 2025 and ACM Multimedia 2025). Participants are invited to join the workshop and present the work they submitted to the competition. 🙌🎤


🌟 Task Descriptions

🔍 Subtask 1: AI Performance on Medical Image Question Answering

📈 Goal: Develop AI models that can accurately answer clinical questions using GI endoscopic images.

🧠 The task uses Kvasir-VQA-x1, an advanced dataset comprising 159,549 QA pairs from 6,500 original GI images, featuring:

  • Multi-step reasoning questions
  • Naturalized medical language
  • Complexity scores for curriculum training

🔠 Question Types include:

  • Yes/No
  • Single-Choice
  • Multiple-Choice
  • Color-related
  • Location-related
  • Numerical Count
  • Merged reasoning-based questions

💡 Example Training Notebook:
Not sure where to start? Check out: Training with ms-swift (Open in Colab)

⚠️ Note: You may submit work for Subtask 1 alone if you wish to participate in only this subtask.


💬 Subtask 2: Clinician-Oriented Multimodal Explanations in GI

📌 Goal: Move beyond simply predicting an answer (Subtask 1) and generate rich, multimodal explanations that are transparent, understandable, and trustworthy for clinicians.

Your system should justify its predictions using multiple complementary reasoning forms—e.g., combining a detailed textual clinical explanation with a visual localization and/or a confidence measure.

Requirements:

  • Faithful to the model’s reasoning.
  • Clinically relevant and medically sound.
  • Useful for real-world decision-making.

📄 Submission Format

A JSON/JSONL file where each entry corresponds to one test case:

{
  "img_id": "UNIQUE_IMAGE_IDENTIFIER",
  "question": "Original question posed to the model.",
  "answer": "Model's prediction from Subtask 1.",
  "textual_explanation": "Detailed narrative in clinical language justifying the answer.",
  "visual_explanation": {
    "type": "heatmap | segmentation_mask | bounding_box | etc.",
    "format": "path/to/visual.png | base64_string | [[x1,y1,x2,y2]]",
    "description": "Brief description of what the visual shows."
  },
  "confidence_score": 0.92
}

Field-by-Field Requirements:

  • img_id / question / answer → Must match Subtask 1 data and predictions exactly.
  • textual_explanation (Mandatory) → Clinician-oriented reasoning referencing visual cues (location, morphology, color, size, vascular pattern, etc.).
  • visual_explanation (Optional but encouraged) → Heatmaps, segmentation masks, or bounding boxes linked to the textual explanation.
  • confidence_score (Optional but encouraged) → Float in [0, 1], from model confidence or uncertainty estimation.
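As a rough illustration of this format, the sketch below assembles and writes one such entry as a JSONL line. The build_entry helper, the example values, and the output filename submission_task2.jsonl are hypothetical placeholders, not part of any official template.

import json

def build_entry(img_id, question, answer, explanation,
                visual=None, confidence=None):
    # Assemble one Subtask 2 record following the schema above.
    # `visual` and `confidence` are optional, matching the field-by-field
    # requirements; they are simply omitted when not provided.
    entry = {
        "img_id": img_id,
        "question": question,
        "answer": answer,
        "textual_explanation": explanation,
    }
    if visual is not None:
        entry["visual_explanation"] = visual
    if confidence is not None:
        entry["confidence_score"] = float(confidence)
    return entry

# Hypothetical example values; replace them with your model's actual outputs.
record = build_entry(
    img_id="UNIQUE_IMAGE_IDENTIFIER",
    question="Is there a polyp in the image?",
    answer="yes",
    explanation=("A sessile polyp with a reddish, lobulated surface is "
                 "visible in the lower-left quadrant of the frame."),
    visual={
        "type": "bounding_box",
        "format": [[120, 340, 260, 470]],
        "description": "Box around the suspected polyp.",
    },
    confidence=0.92,
)

# One JSON object per line (JSONL).
with open("submission_task2.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")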

💡 Suggested Approaches

  1. VLM Self-Probing for Explanations — Ask auxiliary questions (e.g., "What is the abnormality?", "Where is it located?", "Describe its morphology") and combine answers into the textual_explanation (see the sketch after this list).
  2. Visual Grounding — Generate heatmaps or attention maps showing influential regions and link them to textual descriptions.
  3. Segmentation / Detection — Produce masks or bounding boxes highlighting relevant pathology, reinforcing clinician trust.
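A minimal sketch of the self-probing idea from approach 1, assuming a generic ask(image, question) callable that wraps whatever VLM you use; the wrapper name and the probe questions are illustrative assumptions, not part of any provided tooling.

# Hypothetical self-probing sketch: ask auxiliary questions and merge the
# answers into a single textual_explanation. `ask` is a placeholder for
# your own VLM inference call.
PROBES = [
    "What abnormality, if any, is visible?",
    "Where in the image is it located?",
    "Describe its morphology, color, and size.",
]

def self_probe_explanation(ask, image, question, answer):
    findings = [ask(image, probe) for probe in PROBES]
    return (
        f"The model answered '{answer}' to '{question}'. "
        "Supporting observations: " + " ".join(findings)
    )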

⚠️ Participation in Subtask 2 requires completion of Subtask 1.


📂 Dataset Overview: Kvasir-VQA-x1

Built on HyperKvasir and Kvasir-Instrument, the Kvasir-VQA-x1 dataset includes:

  • 🧬 159,549 QA pairs
  • 🖼️ 6,500 original GI images
  • ♻️ 10 weakly augmented images per original (augmentation script provided)
  • 🧠 Complexity levels 1–3
  • 🧪 Realistic medical question reformulations using LLMs

📥 Dataset: Kvasir-VQA-x1 @ SimulaMet on Hugging Face
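If you work with the Hugging Face datasets library, loading the data can look roughly like the sketch below; the dataset identifier, split name, and the complexity column used for curriculum-style filtering are assumptions here, so check the dataset card linked above for the exact names.

from datasets import load_dataset

# Assumed identifier; verify it on the dataset card linked above.
ds = load_dataset("SimulaMet/Kvasir-VQA-x1", split="train")

print(ds[0])  # inspect one QA pair

# Hypothetical curriculum-style filtering by complexity level (1-3);
# the actual column name may differ in the released dataset.
easy = ds.filter(lambda ex: ex.get("complexity", 1) == 1)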


🔍 Evaluation Methodology

Subtask 1 (VQA Performance)

  • Metrics: BLEU, ROUGE (1/2/L), METEOR (a local scoring sketch follows after this list)
  • Settings: Original & augmented images
  • Criteria: Accuracy, relevance, medical correctness
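For self-checking, these scores can be computed locally with the Hugging Face evaluate package, roughly as sketched below; this is only a sanity-check sketch, not necessarily the official scoring script.

import evaluate

# Standard text-generation metrics available through the `evaluate` package.
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

predictions = ["sessile polyp in the lower-left quadrant"]     # model outputs
references = ["a sessile polyp is visible in the lower left"]  # ground-truth answers

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
print(meteor.compute(predictions=predictions, references=references))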

Subtask 2 (Explainability)
Rated by experts on:

  1. Answer correctness
  2. Clarity & clinical relevance
  3. Visual alignment
  4. Confidence calibration
  5. Methodology & novelty

🏆 Submission System

🚧 Work in Progress — We are currently preparing the submission system to ensure it is fully functional when submissions begin. Please do not hesitate to contact us if you encounter any issues.

📌 View Registered Submissions

We use the medvqa Python package to validate and submit models to the official system.

📦 Install

pip install -U medvqa

Always use the latest version.

The model to be submitted is expected to live in a HuggingFace repository. Your HuggingFace repo must include a standalone submission script (e.g., submission_task1.py for Subtask 1).

Use the provided template script, and make sure to:

  • Modify all TODO sections
  • Add required information directly in the script

✅ Validate Before Submitting

First, make sure your submission script runs in your working environment, loads the model correctly from your submission repo, and generates outputs in the required format:

python submission_task1.py

Next, validate that the script also works independently. The .py script should now be in the root of the same HuggingFace repo as your model. You can try this in a fresh venv:

medvqa validate --competition=medico-2025 --task=1/2 --repo_id=<your_repo_id>
  • --competition: Set to medico-2025
  • --task: Use 1 for Task 1 or 2 for Task 2
  • --repo_id: Your HuggingFace model repo ID (e.g., SushantGautam/XXModelCheckpoint)

📄 Additional Dependencies

If your code requires extra packages, you must include a requirements.txt in the root of the repo. The system will install these automatically during validation/submission; otherwise, you will get missing-package errors.
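For example, a minimal requirements.txt might look like this (the packages and versions are placeholders; list whatever your script actually imports):

transformers>=4.41
torch
pillow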

🚀 Submit

If validation succeeds, you can just run:

medvqa validate_and_submit --competition=medico-2025 --task=1/2 --repo_id=<your_repo_id>

This will make a submission; your username, along with the task and time, should appear on the leaderboard for it to be considered officially submitted. The submission library will make your Hugging Face repository public but gated, granting the organizers access to your repo. It must remain unchanged at least until the results of the competition are announced. However, you are free to make your model fully public (non-gated). If you encounter any issues with submission, don’t hesitate to contact us.


🛠️ Tools & Resources

  • Scripts for augmentation, splits, and baselines
  • Submission templates
  • Fine-tuned model configs
  • Attention & saliency visualization methods

📅 Timeline (Preliminary)

  • 📝 April 2025 — Registration for task participation opens ✅
  • 📦 May 2025 — Development data release ✅
  • 🧪 June 2025 — Test data release ✅
  • 📄 24 September 2025 (Wed.) — Runs due
  • 📝 8 October 2025 (Wed.) — Working Notes deadline
  • 🏫 25–26 October 2025 (Sat.–Sun.) — MediaEval Workshop (Dublin + Online)

💼 Organizers


🔗 Join Us

Let’s build the future of trustworthy, explainable medical AI.
🌟 GI diagnostics needs interpretable answers. Your model can help save lives.

📍 Register: MediaEval 2025
📁 Repo: GitHub

🚀 Develop explainable AI. Help doctors. Improve lives.
