This repository contains the codebase for my Master's thesis on multi-task self-supervised learning (SSL) methods for label-efficient learning. The goal is to combine self-supervised learning with multi-task learning techniques to improve the efficiency and effectiveness of learning from limited labeled data.
- Combine contrastive objectives and pretext tasks in a modular multi-task SSL framework.
- Dynamic task weighting to balance heterogeneous loss scales.
- Precomputed augmentations and masked inputs for fast multi-task training.
- Centralized and federated training utilities and checkpoints for privacy-aware workflows.
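The contrastive and pretext objectives share a single encoder, with one lightweight head per task. As a minimal sketch of that structure (the class name, head layers, and dimensions below are illustrative, not the repository's actual modules in common/):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSSLModel(nn.Module):
    """Sketch: one shared encoder feeding several self-supervised heads."""

    def __init__(self, encoder: nn.Module, feat_dim: int, n_augmentations: int):
        super().__init__()
        self.encoder = encoder                                # shared backbone
        self.aug_head = nn.Linear(feat_dim, n_augmentations)  # augmentation classification
        self.recon_head = nn.Linear(feat_dim, 1161)           # signal reconstruction (raw window size, for example)
        self.feat_head = nn.Linear(feat_dim, 561)             # regression onto expert features (example size)
        self.proj_head = nn.Sequential(                       # projection head for the contrastive loss
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 128))

    def forward(self, x):
        z = self.encoder(x)  # assumed to return one flat feature vector per sample
        return {
            "aug": self.aug_head(z),
            "recon": self.recon_head(z),
            "features": self.feat_head(z),
            "contrastive": F.normalize(self.proj_head(z), dim=-1),
        }
```

Each head contributes its own loss; the per-task losses are then combined with the dynamic task weighting mentioned above.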
Figure 1: Overview of the modular multi-task self-supervised learning (SSL) framework combining contrastive and pretext objectives with dynamic task weighting.
- common/ — shared modules (encoders, losses, model definitions, utilities)
- data/ — dataset storage & download helpers (UCI HAR, STL-10)
- data_processing/ — augmentations, precomputation and dataloaders
- HAR_centralized/ — centralized experiments on HAR, configs and training entrypoints
- HAR_federated/ — federated prototype to train encoders (server / client orchestration)
- IR_centralized/ — preliminary image-based experiments (STL-10)
- packaging & meta: setup.py, requirements.txt, LICENSE
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- UCI HAR: run data/download_uci_har.sh to download the HAR dataset.
- (Optional) STL-10: use data/download_stl10.py to download the image recognition dataset. This step is not strictly necessary: if the data is missing, it will be downloaded automatically on the fly.
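If torchvision is available, the automatic STL-10 download typically relies on its built-in dataset class; a minimal sketch (the root path and split are illustrative):

```python
from torchvision.datasets import STL10

# Downloads the unlabeled split to data/stl10 on first use if it is not already there.
unlabeled = STL10(root="data/stl10", split="unlabeled", download=True)
```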
Each experiment has its own config file that can be edited; alternatively, parameters can be passed as command-line flags when starting the training script. For example:
python HAR_centralized/train_encoder.py --seed 42 --experiment_name my_experiment_name
To reproduce the experiments, there is no need to set any custom parameters: the default values in the config files will be used. The experiment_name
can be used to differentiate between runs and, together with the seed and the set of active model heads, is used when saving models and results.
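A common way to implement this behaviour is to merge command-line flags over the config defaults and derive the run identifier from them; a minimal sketch under that assumption (the config attribute names are hypothetical):

```python
import argparse

import config  # e.g. HAR_centralized/config.py; SEED and EXPERIMENT_NAME are assumed defaults

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=config.SEED)
parser.add_argument("--experiment_name", type=str, default=config.EXPERIMENT_NAME)
args = parser.parse_args()

# Models and results are then saved under a name built from the run metadata.
run_id = f"{args.experiment_name}_seed{args.seed}"
```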
HAR experiments will produce models with pre-trained encoders:
- centralized training (entrypoint: HAR_centralized/train_encoder.py):
python HAR_centralized/train_encoder.py
- federated training (entrypoint: HAR_federated/run.py):
python HAR_federated/run.py
These encoders can then be used for the actual downstream HAR classification task:
python HAR_centralized/train_classifier.py
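A typical way to reuse the pretrained encoder downstream is a linear probe on the labeled HAR data; a minimal sketch, assuming the encoder was saved as a whole module (the checkpoint path, embedding size, and class count are illustrative):

```python
import torch
import torch.nn as nn

# Load a frozen pretrained encoder (adapt if only a state_dict was saved).
encoder = torch.load("HAR_centralized/outputs/my_experiment_encoder.pt")
encoder.eval().requires_grad_(False)

classifier = nn.Linear(128, 6)  # 128-d embeddings, 6 HAR activity classes
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    with torch.no_grad():
        z = encoder(x)  # frozen features
    loss = criterion(classifier(z), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Unfreezing the encoder turns the same loop into full fine-tuning.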
IR experiments work in the same way:
- pretraining (entrypoint: IR_centralized/train_encoder.py):
python IR_centralized/train_encoder.py
- downstream training (entrypoint: IR_centralized/train_classifier.py):
python IR_centralized/train_classifier.py
- Toggle pretraining heads with MODEL_HEADS in HAR_centralized/config.py or by passing command-line arguments (order: augmentation classification, reconstruction, contrastive, features).
- Enable dynamic weighting via USE_WEIGHTED_LOSS in config; weights are created/managed in the training entrypoint.
- Check OUTPUT_DIR in HAR_centralized/config.py for where logs and checkpoints are written (default: HAR_centralized/outputs/).
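One common realization of such dynamic weighting is homoscedastic uncertainty weighting (Kendall et al., 2018), with one learnable log-variance per active head; a minimal sketch of that scheme, which is not necessarily the exact one used in this repository:

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """One learnable log-variance per task: noisier tasks are automatically down-weighted."""

    def __init__(self, n_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses):
        # losses: iterable of scalar task losses, in the same order as the active heads
        total = torch.zeros(())
        for log_var, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-log_var) * loss + log_var
        return total
```

Its parameters would be optimized jointly with the encoder, consistent with the note above that the weights are created and managed in the training entrypoint.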
1. Multi-Task Learning (MTL) significantly outperforms Single-Task Learning methods in self-supervised settings across all labeled-data regimes.
2. Multi-Task Self-Supervised Learning produces compact and robust representations, even better than those obtained with expert knowledge.
t-SNE visualizations of three data representations of different dimensionality: raw signals (1,161), expert features (561), and MTL embeddings (128).
3. When trained in a federated setting, Multi-Task Self-Supervised Learning still achieves competitive performance compared to centralized training. This makes it possible to use massive decentralized data sources to build robust models while preserving data privacy and security.
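The t-SNE comparison above can be reproduced with a standard scikit-learn projection; a minimal sketch (array names and plotting details are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(X: np.ndarray, y: np.ndarray, title: str):
    """X: (n_samples, d) with d = 1161 (raw), 561 (expert) or 128 (MTL); y: labels used only for coloring."""
    coords = TSNE(n_components=2, init="pca", random_state=42).fit_transform(X)
    plt.figure()
    plt.scatter(coords[:, 0], coords[:, 1], c=y, s=5, cmap="tab10")
    plt.title(title)
    plt.show()
```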
- Thesis manuscript (PDF): docs/thesis/thesis.pdf
- Presentation slides (PDF): docs/thesis/presentation.pdf
This project is licensed under the MIT License - see the LICENSE file for details. Please cite the following if you use this code in your research:
@mastersthesis{gobbetti:2025:MTSSL,
title = {Multi-Task Self-Supervised Methods for Label-Efficient Learning},
author = {Gobbetti, Alessandro},
school = {Università della Svizzera italiana},
year = {2025},
type = {Master's Thesis},
address = {Lugano, Switzerland}
}