An ultrasound image encoder pretrained in a self-supervised, multi-task setting, combining masked image reconstruction (MIR), patient matching (PM), and image ordering (IO) to learn meaningful representations without manual labels.
This project implements a ResNet-based encoder trained on ultrasound frames using three self-supervised pretext tasks (a minimal architecture sketch follows the list):
- MIR: Reconstruct masked regions of an image (VAE-style)
- PM: Predict whether two images come from the same patient video
- IO: Predict the correct temporal order of shuffled frame sequences
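For concreteness, the sketch below shows one way a single ResNet backbone could feed all three task heads. Layer names, sizes, and the single-channel input are assumptions for illustration, not the contents of model.py:

```python
import math
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskEncoder(nn.Module):
    """Shared ResNet-18 backbone with one head per pretext task (sketch)."""

    def __init__(self, feat_dim=512, latent_dim=128, seq_len=4):
        super().__init__()
        backbone = resnet18(weights=None)
        # Assume single-channel (grayscale) ultrasound frames.
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])

        # MIR head: VAE-style bottleneck plus a small decoder. This toy
        # decoder emits 32x32 images; a real one would match the input size.
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())

        # PM head: 2-layer MLP on the concatenated features of an image pair.
        self.pm_head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

        # IO head: 1-layer MLP on concatenated sequence features, with one
        # class per possible permutation of the shuffled frames.
        self.io_head = nn.Linear(seq_len * feat_dim, math.factorial(seq_len))

    def encode(self, x):
        return self.encoder(x).flatten(1)  # (B, feat_dim)

    def forward_mir(self, x):
        feat = self.encode(x)
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decoder(z), mu, logvar
```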
Project structure:
ultrasound-image-encoder/
│
├── dataset.py # Dataset class and masking logic
├── model.py # Multi-head encoder model
├── training.py # Multi-task training loop
├── utils.py # Loss functions
├── main.py # Entry point for training
└── README.md
Install required packages:
pip install torch torchvision pandas numpy pillow tqdm
Ensure that you have a .csv file containing all of the following columns (a minimal example follows the list):
- `frame_path`: path to the image file
- `patient_id`: patient identifier
- `video_name`: unique video/sequence ID
- `frame_id`: frame identifier
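For reference, such a CSV might look like the following; the paths and identifiers are placeholders, not real data:

```
frame_path,patient_id,video_name,frame_id
data/frames/p001_v01_f000.png,p001,p001_v01,0
data/frames/p001_v01_f001.png,p001,p001_v01,1
data/frames/p002_v03_f000.png,p002,p002_v03,0
```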
Edit `config.py` to specify your paths and parameters:
data_path = 'path/to/your/data' # CSV file as described above
checkpoint_path = 'path/to/save/checkpoints' # Directory for model outputs
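Beyond the two paths, `config.py` presumably also exposes training hyperparameters. The names and values below are illustrative assumptions rather than the repository's actual settings:

```python
# Hypothetical config.py entries; names and values are assumptions.
batch_size = 32        # frames per training batch
num_epochs = 50        # total training epochs
learning_rate = 1e-4   # optimizer step size
mask_ratio = 0.4       # fraction of each image masked for MIR
seq_len = 4            # frames per shuffled sequence for IO
```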
Execute the training script with:
python main.py
| Task | Description | Loss Function | Head Architecture |
|------|-------------|---------------|--------------------|
| MIR | Masked image reconstruction | MSE + KL divergence | Variational autoencoder |
| PM | Patient matching from image pairs | Binary cross-entropy | 2-layer MLP on pair features |
| IO | Frame sequence ordering | Cross-entropy | 1-layer MLP on concatenated features |
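A plausible way for training.py to combine these three objectives is a simple weighted sum, as in the sketch below. The function signature, label shapes, and unit weights are assumptions, not the contents of utils.py:

```python
import torch
import torch.nn.functional as F

def multitask_loss(recon, target, mu, logvar,
                   pm_logits, pm_labels, io_logits, io_labels,
                   w_mir=1.0, w_pm=1.0, w_io=1.0):
    """Weighted sum of the three pretext losses (sketch; weights assumed)."""
    # MIR: pixel-wise reconstruction error plus the standard VAE KL term.
    mse = F.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # PM: binary cross-entropy on "same patient video" logits
    # (pm_labels is assumed to have the same shape as pm_logits).
    pm = F.binary_cross_entropy_with_logits(pm_logits, pm_labels.float())
    # IO: cross-entropy over the permutation classes.
    io = F.cross_entropy(io_logits, io_labels)
    return w_mir * (mse + kl) + w_pm * pm + w_io * io
```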
- Learns spatial features by reconstructing masked image regions (see the masking sketch below)
- Develops patient-specific representations through image pairing
- Captures temporal dynamics via sequence ordering
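To illustrate the masking step behind MIR, here is a minimal random patch-masking routine. The repository's actual masking logic lives in dataset.py; the patch size and masking ratio below are assumed values:

```python
import torch

def mask_random_patches(img, patch=16, ratio=0.4):
    """Zero out a random fraction of non-overlapping patches (sketch).

    img: (C, H, W) tensor with H and W divisible by `patch`.
    Returns the masked image and the boolean patch-level mask.
    """
    c, h, w = img.shape
    grid_h, grid_w = h // patch, w // patch
    mask = torch.rand(grid_h, grid_w) < ratio   # True = patch is masked
    masked = img.clone()
    for i, j in mask.nonzero().tolist():
        masked[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0
    return masked, mask
```

The reconstruction loss can then be computed on the masked regions only or on the full image, depending on the implementation.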
This project is licensed under the MIT License © 2025 Ippokratis.