
MedGen_AI: Multimodal Medical Report Generation πŸ₯

Generating diagnostic reports from chest X-rays using vision transformers and large language models

Built with: PyTorch · Hugging Face Transformers · bitsandbytes · QLoRA · CheXpert


UH Newsletter Recognition

Featured in the University of Houston NSM Computer Science 2024 Summer Showcase:
https://uh.edu/nsm/computer-science/news-events/stories/2024/0814-summer-showcase.php

🩻 Project Overview

MedGen_AI is an advanced multimodal AI system that generates detailed diagnostic reports from chest X-ray images using a fusion of computer vision and large language models. It integrates image understanding with language generation for clinical applications.


🎯 Problem Statement

Automated chest X-ray report generation must address several challenges:

  • Accurate interpretation of complex radiographic features
  • Generation of clinically relevant, multi-sentence reports
  • Seamless fusion of visual and textual inputs
  • Efficient inference for integration in clinical workflows

🧠 Model Architecture

MedGen_AI Architecture

πŸ” Core Components

  • Vision Encoder: DenseNet-121 with attention mechanism
  • Language Model: LLaMA-2-7B fine-tuned using QLoRA
  • Fusion Module: PreCarDiv for combining visual and textual embeddings

🧱 Visual Pipeline

  • Input: X-ray image (224Γ—224 RGB)
  • DenseNet-121 outputs: (batch_size, 1024, 7, 7)
  • Attention maps for 14 disease categories over 49 patches
  • Output: (batch_size, 14, 1024) disease-specific features
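The visual pipeline above can be sketched in PyTorch. Only the tensor shapes come from the description; the module name and the learned-query attention formulation are assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class DiseaseAttentionPool(nn.Module):
    """Pools DenseNet-121 patch features into per-disease vectors.

    Hypothetical sketch: one learned attention query per disease
    category attends over the 49 spatial patches.
    """
    def __init__(self, num_diseases=14, feat_dim=1024):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_diseases, feat_dim))

    def forward(self, feats):
        # feats: (B, 1024, 7, 7) from DenseNet-121 -> (B, 49, 1024)
        patches = feats.flatten(2).transpose(1, 2)
        # Attention weights over the 49 patches for each disease: (B, 49, 14)
        attn = torch.softmax(patches @ self.queries.T, dim=1)
        # Weighted sum -> (B, 14, 1024) disease-specific features
        return attn.transpose(1, 2) @ patches

pool = DiseaseAttentionPool()
out = pool(torch.randn(2, 1024, 7, 7))
print(out.shape)  # torch.Size([2, 14, 1024])
```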

πŸ—£οΈ Language Model

  • LLaMA-2-7B with QLoRA (rank=8, alpha=8, dropout=0.1)
  • 4-bit quantized for memory-efficient training
  • Fine-tunes only the attention projection layers (q_proj, k_proj, etc.)
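This setup can be sketched with the standard `transformers`/`peft` configuration objects. The rank, alpha, dropout, and nf4/float16 settings are from the description above; the full `target_modules` list is an assumption (the README says only "q_proj, k_proj, etc."), so check the notebook for the exact set:

```python
# Hedged sketch of the 4-bit QLoRA setup, not the repo's exact code.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # nf4 quantization per the config
    bnb_4bit_compute_dtype=torch.float16,  # float16 compute
)

lora_config = LoraConfig(
    r=8,             # LoRA rank
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed set
    task_type="CAUSAL_LM",
)
```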

πŸ”„ Multimodal Fusion (PreCarDiv)

  • Project vision features to match LLM hidden dim
  • Concatenate visual embeddings with token embeddings
  • Use causal language modeling for report generation
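The fusion steps above can be sketched with plain tensors; dimensions follow the earlier sections (1024-dim vision features, 14 disease tokens), while the 4096 hidden size of LLaMA-2-7B and the variable names are assumptions:

```python
import torch
import torch.nn as nn

# Minimal PreCarDiv-style fusion sketch: project the 14 disease vectors
# into the LLM hidden size, then prepend them to the token embeddings.
HIDDEN = 4096  # LLaMA-2-7B hidden size (assumed)

proj = nn.Linear(1024, HIDDEN)
vision = torch.randn(2, 14, 1024)    # disease-specific visual features
tokens = torch.randn(2, 32, HIDDEN)  # token embeddings for the prompt

visual_embeds = proj(vision)                        # (2, 14, 4096)
fused = torch.cat([visual_embeds, tokens], dim=1)   # visual tokens first
print(fused.shape)  # torch.Size([2, 46, 4096])
```

The fused sequence is then fed to the LLM as `inputs_embeds` for causal language modeling.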

πŸ‹οΈ Training

  • Custom loss masks prompt and visual tokens, so only report tokens are supervised
  • Mixed precision + gradient checkpointing
  • BLEU-based evaluation
  • Batch size: 1 per GPU
  • Learning rate: 5e-4
  • Epochs: 20 with early stopping
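The loss masking can be sketched with the standard `ignore_index` convention: positions covering the visual tokens and the prompt are set to -100 so cross-entropy skips them. The sequence lengths here are illustrative, not taken from the repository:

```python
import torch
import torch.nn.functional as F

# Masked token-wise cross-entropy: only report tokens contribute.
vocab, num_visual, prompt_len = 32000, 14, 10
logits = torch.randn(1, 40, vocab)            # model outputs
labels = torch.randint(0, vocab, (1, 40))     # target token ids
labels[:, : num_visual + prompt_len] = -100   # mask non-report positions

loss = F.cross_entropy(
    logits.view(-1, vocab), labels.view(-1), ignore_index=-100
)
```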

πŸ“Š Dataset

  • Name: CheXpert+ (100 samples)
  • Split: 60/20/20 train-val-test
  • Features: X-rays, diagnostic prompts, ground truth reports
  • Labels: 14 disease categories

πŸ“¦ Installation & Setup

βœ… Requirements

  • Python 3.8+
  • CUDA-compatible GPU
  • 16GB+ RAM recommended

πŸ”§ Install Dependencies

git clone https://github.com/charangajjala/MedGen_AI.git
cd MedGen_AI
pip install -r requirements.txt

🧠 Set Up Hugging Face Access
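LLaMA-2 weights are gated on Hugging Face, so you need an account with approved access and a login token. A minimal setup sketch, assuming the `huggingface_hub` CLI:

```shell
# Log in with a token from an account granted LLaMA-2 access.
pip install -U "huggingface_hub[cli]"
huggingface-cli login   # paste your access token when prompted
```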


πŸš€ Usage

πŸ“ˆ Training

jupyter notebook multimodal_final.ipynb
  • Modify configs in Cell 3
  • Adjust paths, batch size, epochs
  • Run cells to train and validate

πŸ” Inference

  • Load trained model
  • Input new X-ray image + prompt
  • Generate report using beam search (5 beams)
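The generation step can be sketched as follows. This is a non-runnable fragment: `model`, `tokenizer`, and `fused_embeds` stand in for objects built earlier in the notebook, and `max_new_tokens` is an assumed value; only the 5-beam search comes from the description above:

```python
# Beam search decoding with 5 beams over the fused visual+prompt embeddings.
output_ids = model.generate(
    inputs_embeds=fused_embeds,
    max_new_tokens=256,       # assumed budget for the report
    num_beams=5,
    early_stopping=True,
)
report = tokenizer.decode(output_ids[0], skip_special_tokens=True)
```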

πŸ§ͺ Evaluation

  • Metric: BLEU score on test reports
  • Parameters Tuned: QLoRA rank, alpha, dropout
  • Generated Output: Full radiology reports

βš™οΈ Configuration

  • LoRA rank: 8
  • Dropout: 0.1
  • Quantization: 4-bit (nf4), float16 compute
  • Loss: Masked token-wise cross-entropy

πŸ“š Documentation

  • PreCarDiv_Dataset.py: Dataset + Tokenizer + Image Preprocessing
  • Model_PreCarDiv.py: Fusion model definition
  • CustomTrainer.py: Handles custom loss and evaluation
  • Precardiv_Poster-G0010.pdf: Research poster
  • Precardiv ppt-G0010.pdf: Final presentation

πŸͺͺ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • Hugging Face for LLaMA and Transformers
  • Meta AI for LLaMA-2
  • Stanford ML Group for CheXpert dataset
  • PyTorch & bitsandbytes for optimization tools
