TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance

Paper: arXiv Preprint
Artifact Archive: Figshare Permanent Record
Authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar

Note

TraceFL has been accepted at ICSE 2025. The TraceFL artifact is fully functional and runs in less than 15 minutes on Google Colab with a single click. Read the full README for more details.

1. Purpose

TraceFL is the first technique to bring interpretability to Federated Learning (FL): it identifies the clients responsible for a specific global model prediction. With this provenance information made explicit, developers can exclude problematic clients, reward high-quality clients, or debug misclassifications more systematically.

This artifact provides:

- Complete implementation of the TraceFL framework
- Pre-configured experiments replicating all paper results
- Cross-domain support for image/text classification models (e.g., GPT)
- One-click reproducibility on Google Colab

Badges Claimed:

- Artifacts Available: All our code and documentation are publicly and permanently archived (Figshare DOI).
- Artifacts Functional: We include step-by-step installation scripts, test commands, and evidence of correct behavior in a minimal environment.
- Artifacts Reusable: We offer detailed documentation, consistent structure, modular design, a permissive license, and instructions for extending the framework to new models/datasets.

2. Provenance

Paper Preprint: ICSE 2025 Camera-Ready
Archived Artifact: The exact version of this repository (including code, configurations, and instructions) is archived at Figshare DOI.
GitHub Repository (development version): GitHub - SEED-VT/TraceFL (non-archival).
License:

3. Data

TraceFL is a domain-agnostic framework carefully designed to handle various data modalities (vision, text, and medical). We demonstrate its flexibility by evaluating multiple image, medical imaging, and text datasets, as well as different neural architectures ranging from classic CNNs to transformers.

Datasets

- Image Classification: CIFAR-10, MNIST (public benchmarks with 10 classes each).
- Medical Imaging: Colon PathMNIST and Abdominal OrganAMNIST from MedMNIST. These datasets are curated, de-identified, and suitable for research in FL-based medical imaging.
- Text Classification: DBpedia, Yahoo Answers (both standard benchmarks in natural language processing).

All datasets are publicly available. We follow FlowerDatasets guidelines to download, partition, and integrate these datasets with minimal configuration overhead.
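The `dirichlet_alpha` option that appears in the usage section suggests that client data is partitioned with a Dirichlet-controlled label skew, a common way to simulate non-IID clients. The sketch below illustrates that general idea using only the Python standard library. It is not the FlowerDatasets API and not TraceFL's partitioning code; the function name `dirichlet_partition` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with label skew controlled by alpha.

    Hypothetical helper for illustration only; smaller alpha means each class
    is concentrated on fewer clients.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet(alpha, ..., alpha) draw equals normalized Gamma(alpha, 1) draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:
            shares = [1.0 / num_clients] * num_clients  # degenerate draw: fall back to uniform
        else:
            shares = [g / total for g in gammas]
        # Convert the shares into cut points over this class's samples.
        start, cumulative = 0, 0.0
        for client_id, share in enumerate(shares):
            cumulative += share
            if client_id == num_clients - 1:
                end = len(indices)
            else:
                end = min(len(indices), round(cumulative * len(indices)))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Toy labels: 1000 samples over 10 classes, split across 4 clients.
toy_labels = [i % 10 for i in range(1000)]
partitions = dirichlet_partition(toy_labels, num_clients=4, alpha=0.1)
print([len(p) for p in partitions])
```

With a small `alpha` such as 0.1, most classes end up dominated by one or two clients; a large `alpha` approaches an even split.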

Models

Image Classification Models:
- ResNet (e.g., resnet18, resnet50)
- DenseNet (e.g., densenet121)
Medical Imaging: Same CNN-based architectures (ResNet, DenseNet) easily adapted for grayscale inputs or domain-specific classification tasks.
Text Classification Models:
- GPT
- BERT

TraceFL uses a consistent interpretability mechanism at the neuron level, which naturally extends to different layers and architectures. This ensures minimal or no code changes are needed to debug new classification models—so long as they output logits for classification.
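To give some intuition for why neuron-level attribution to clients is possible at all, the toy sketch below assumes the global model is a sample-weighted average of client models (FedAvg-style aggregation, which this README does not state explicitly). Under that assumption, the pre-activation of a single linear neuron in the global model splits exactly into one term per client. This is only an illustration of the idea; the actual neuron provenance method is described in the paper, and all names here are hypothetical.

```python
def neuron_output(weights, bias, x):
    """Pre-activation of one linear neuron: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def client_contributions(client_neurons, client_sizes, x):
    """Return (global output, per-client term) for one linear neuron.

    client_neurons maps client id -> (weights, bias); client_sizes maps
    client id -> number of training samples. Both are hypothetical inputs.
    """
    total = sum(client_sizes.values())
    parts = {
        cid: (client_sizes[cid] / total) * neuron_output(w, b, x)
        for cid, (w, b) in client_neurons.items()
    }
    return sum(parts.values()), parts


# Two toy clients holding the same neuron with different parameters.
neurons = {"c1": ([1.0, 2.0], 0.0), "c2": ([-1.0, 0.0], 4.0)}
sizes = {"c1": 1, "c2": 3}
x = [2.0, 1.0]

global_out, parts = client_contributions(neurons, sizes, x)
print(global_out, parts)  # 2.5 {'c1': 1.0, 'c2': 1.5}

# The same value is obtained from the averaged (global) parameters.
averaged_w = [0.25 * 1.0 + 0.75 * -1.0, 0.25 * 2.0 + 0.75 * 0.0]
averaged_b = 0.25 * 0.0 + 0.75 * 4.0
print(neuron_output(averaged_w, averaged_b, x))  # 2.5
```

Real networks add nonlinear activations and many layers, so a practical method needs more than this single-neuron identity.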

4. Setup

System Requirements:

Note

Original Paper Hardware Setup: To approximate real-world FL and run large-scale simulations, we deploy our experiments in the Flower FL framework, running on an enterprise-level cluster of six NVIDIA DGX A100 nodes. Each node is equipped with 2048 GB of memory, at least 128 cores, and an A100 GPU with 80 GB of memory.

Artifact Hardware Setup: We change the default configuration in tracefl/conf/base.yaml to run representative experiments on Google Colab, even with only 2 CPU cores, 12 GB of system RAM, and 15 GB of GPU RAM.

We provide two approaches to setting up the environment:

4.1 Quick Colab Setup

To quickly validate and reproduce the artifact, open the provided Google Colab notebook.

This will open a Google Colab notebook with all dependencies pre-installed. You can run the provided demo script to verify the installation and generate a sample provenance report.

Expected Outcome ✅:

...
[INFO] - ----------------------------
[INFO] - Total Rounds: 2
[INFO] - TraceFL Average Localization Accuracy: 100.0

4.2 Local/Conda Setup

Create Conda environment (Python 3.10):

conda create --name tracefl python=3.10 -y
conda activate tracefl

Install Poetry:
```
pip install poetry
```

Clone and install dependencies:

git clone https://github.com/SEED-VT/TraceFL
cd TraceFL
poetry install

Expected Output ✅:

     ...
     - Installing ruff (0.0.272)
     - Installing transformers (4.48.1)
     - Installing types-requests (2.27.7)
 
 Installing the current project: tracefl (1.0.0)

5. Usage

5.1 Fully Functional TraceFL Artifact Command

We configure TraceFL using a YAML file (tracefl/conf/base.yaml) or command-line arguments. Any parameter in the YAML file can be overridden by passing a key=value argument to the python -m tracefl.main command (e.g., dirichlet_alpha=0.1). Using only this command and the YAML file, you can run all experiments in the paper. For convenience, we also provide scripts for specific figures/tables.

# Run TraceFL with default configuration (base.yaml) or override. 
python -m tracefl.main dirichlet_alpha=0.1

Expected Outcome:

The script trains a small FL setup with 2 clients on MNIST for 1 round.
If successful, the console logs will indicate completion with a message like:

...
[INFO] -             *********** Input Label: 8, Responsible Client(s): c1  *************
[INFO] -      Traced Client: c1 || Tracing = Correct
[INFO] -     TraceFL Clients Contributions Rank:     {'c1': 0.98, 'c16': 0.01, 'c19': 0.01, 'c9': 0.01}
[INFO] - 

[INFO] - [Round 3] TraceFL Localization Accuracy = 100.0 || Total Inputs Used In Prov: 6 || GM_(loss, acc) (3.24006986618042,0.18896484375)
[INFO] - ----------------------------
[INFO] - Total Rounds: 3
[INFO] - TraceFL Average Localization Accuracy: 100.0

This will also generate a provenance report TraceFL_clients_contributions.log in the current directory. Inspect the file for detailed neuron contributions and responsible clients for each input label.
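As a starting point for inspecting results programmatically, the sketch below parses the two line formats shown in the console output above: the `Traced Client: ... || Tracing = ...` line and the `TraceFL Clients Contributions Rank: {...}` line. It assumes the log file contains lines in the same format as the console, which this README does not confirm; the helper name `summarize_log` is hypothetical.

```python
import ast
import re

RANK_RE = re.compile(r"TraceFL Clients Contributions Rank:\s*(\{.*\})")
STATUS_RE = re.compile(r"Traced Client:\s*(\S+)\s*\|\|\s*Tracing\s*=\s*(\w+)")


def summarize_log(lines):
    """Collect (traced client, is_correct) pairs and the top-ranked client per input."""
    traced, top_ranked = [], []
    for line in lines:
        status = STATUS_RE.search(line)
        if status:
            traced.append((status.group(1), status.group(2) == "Correct"))
        rank = RANK_RE.search(line)
        if rank:
            # The rank is printed as a Python dict literal, so literal_eval can read it.
            scores = ast.literal_eval(rank.group(1))
            top_ranked.append(max(scores, key=scores.get))
    return traced, top_ranked


# Sample lines copied from the expected output shown in this section.
sample = [
    "[INFO] -      Traced Client: c1 || Tracing = Correct",
    "[INFO] -     TraceFL Clients Contributions Rank:     "
    "{'c1': 0.98, 'c16': 0.01, 'c19': 0.01, 'c9': 0.01}",
]
traced, top_ranked = summarize_log(sample)
print(traced)      # [('c1', True)]
print(top_ranked)  # ['c1']
```

To run it on a real report, pass an open file instead of `sample`, for example `summarize_log(open("TraceFL_clients_contributions.log"))`.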

5.2 Reproducing Main Paper Experiments

Note

These scripts are also executable in Colab.
Simply open the Colab notebook, uncomment the relevant evaluation cell, and run.
They are configured with minimal resource requirements for standard hardware.
To run large-scale experiments (as in the paper, on a DGX A100 cluster), adjust your hardware settings and increase the number of clients/rounds in the config.

Although any configuration can be run from the command line, for example:

python -m tracefl.main dirichlet_alpha=0.1

we provide scripts that reproduce each figure/table result from the paper. All generated logs are stored in TraceFL_clients_contributions.log.

TraceFL’s Localization Accuracy in Correct Predictions (Figure-2), TraceFL’s Scalability (Table-3 and Figure-5)
```
bash scripts/a_figure_2_table_3_and_figure_5.sh
```
Varying Data Distribution (Figure-3)
```
bash scripts/b_figure_3.sh
```
TraceFL’s Localization Accuracy in Mispredictions (Table-1 and Figure-6)
```
bash scripts/c_table_1_and_figure_6.sh
```
Differential Privacy (DP) Enabled FL (Figure-4 and Table-2)
```
bash scripts/d_figure_4_and_table_2.sh
```
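For readers unfamiliar with DP-enabled FL, a common recipe is to clip each client update to a maximum L2 norm and then add Gaussian noise scaled to that norm. The sketch below shows that generic recipe with the standard library only. This README does not state which DP mechanism or parameters the TraceFL experiments use, so `clip_and_noise`, `clip_norm`, and `noise_multiplier` are hypothetical names for illustration.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_multiplier, rng=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm.
    """
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(v * v for v in update))
    # Scale down only if the update exceeds the clipping threshold.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [v * scale + rng.gauss(0.0, sigma) for v in update]


# Toy update with L2 norm 5.0, clipped to norm 1.0.
update = [3.0, 4.0]
clipped_only = clip_and_noise(update, clip_norm=1.0, noise_multiplier=0.0)
noisy = clip_and_noise(update, clip_norm=1.0, noise_multiplier=0.5)
print(clipped_only)
print(noisy)
```

Larger clipping thresholds and smaller noise preserve more of each client's signal, which is the trade-off a DP experiment varies.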

5.3 Beyond Replication: Customizing TraceFL

TraceFL is easily extensible to new datasets, models, and configurations. For example, the Hugging Face library offers approximately 80k text classification models, 15k image classification models, 5k text classification datasets, and 1k image classification datasets. TraceFL can adapt to these with minimal changes (10–15 lines of code).

To demonstrate, we integrate the distilbert/distilbert-base-uncased model, which is not included in the paper, with the dbpedia_14 dataset:

python -m tracefl.main dataset.name=dbpedia_14  model.name=distilbert/distilbert-base-uncased device=cuda
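The dotted `key=value` arguments in the command above can be read as edits to the nested structure of the YAML configuration. The sketch below illustrates that reading with plain dictionaries. It is not TraceFL's configuration loader, the helper names are hypothetical, and the base values are placeholders rather than the real contents of tracefl/conf/base.yaml.

```python
def parse_value(raw):
    """Interpret a raw override value as int, then float, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config, overrides):
    """Apply 'a.b=value' strings to a nested dict and return the same dict."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return config


# Placeholder base config; the real base.yaml may use different keys and values.
base = {"dataset": {"name": "mnist"}, "model": {"name": "resnet18"}, "dirichlet_alpha": 0.3}
overrides = [
    "dataset.name=dbpedia_14",
    "model.name=distilbert/distilbert-base-uncased",
    "device=cuda",
    "dirichlet_alpha=0.1",
]
print(apply_overrides(base, overrides))
```

Keys that are not already present, such as `device` in this placeholder config, are simply added at the matching nesting level.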

6. License

This artifact is released under , enabling:

Commercial use
Modification
Distribution
Private use

7. How This Artifact Meets ICSE Criteria

Available
- Permanently hosted on Figshare (Figshare DOI) and supplemented on GitHub.
Functional
- Documented installation procedures.
- Includes a one-click Google Colab setup for quick validation.
- Reproduces major results from the paper via the provided scripts.
Reusable
- Carefully organized code (modular architecture, YAML configuration).
- Extension for new datasets or neural architectures.
- A permissive, open-source license ensures freedom to reuse.
- A fully functional minimal Google Colab setup (better than Docker) and a fully functional local setup capable of running on industrial-scale HPC clusters for real-world FL simulations.

8. Contact and Support

For any installation or usage issues, please open a GitHub Issue at TraceFL Issues.
For questions related to the paper or advanced usage, contact the author directly via [waris@vt.edu](mailto:waris@vt.edu).

Award Considerations

First-of-its-kind: TraceFL is the first FL interpretability framework to identify clients responsible for global model predictions and addresses the open challenge of interpretability in FL.
Cross-Domain Validation: Works with 4 model architectures, including modern ones like GPT, across 6 datasets, including real-world medical imaging datasets.
Scalability: From the Colab free tier to multi-GPU clusters.
Reproducibility: Each evaluation in a figure or table in the paper can be replicated with a single script.
Impact: First FL interpretability framework supporting both CV/NLP.
Innovation: Implements novel neuron provenance tracking methodology.

Citation

If you use TraceFL in your research, please cite our paper:

@inproceedings{gill2025tracefl,
  title = {{TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance}},
  author = {Gill, Waris and Anwar, Ali and Gulzar, Muhammad Ali},
  booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  year = {2025},
  organization = {IEEE},
}
