
AgeSel: Communication-Efficient Local SGD with Age-Based Worker Selection

Overview

  • This is my first journal paper titled Communication-Efficient Local SGD with Age-Based Worker Selection.
  • The paper was published in The Journal of Supercomputing in 2023.
  • About this paper:
    • The goal is to enhance the communication efficiency of distributed learning systems under intermittent communication between the server and workers.
    • The study considers a distributed setup with:
      • Partial participation of workers.
      • Heterogeneous local datasets (datasets vary in distribution and size across workers).
    • A simple yet effective age-based method, AgeSel, is proposed. It leverages the "ages" of workers to balance their participation frequencies.
    • The paper establishes rigorous convergence guarantees for AgeSel and demonstrates its effectiveness through numerical results.

Key Ideas of AgeSel

Key Parameters

  • The system consists of $M$ workers. Each worker $m$ has an age parameter $\tau_m$ maintained by the server, representing the number of consecutive rounds the worker has not communicated with the server.
  • A threshold $\tau_{max}$ is predefined to identify workers with low participation frequency.
  • In each round, the server selects $S$ workers to perform local computations.
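
To make the notation concrete, here is a minimal sketch of the server-side state these parameters describe; all numeric values are illustrative and not taken from the paper.

```python
import numpy as np

M, S, tau_max = 100, 10, 5       # illustrative values only
ages = np.zeros(M, dtype=int)    # tau_m: consecutive rounds without communication
data_sizes = np.full(M, 1000)    # local dataset sizes (heterogeneous in general)
```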

Worker Selection

  1. The server first selects the workers whose ages satisfy $\tau_m \geq \tau_{max}$, in descending order of age (taking at most $S$ of them), ensuring that low-frequency workers are included; see the sketch after this list.
  2. If fewer than $S$ workers are selected:
    • The remaining slots are filled by sampling workers with probabilities proportional to the sizes of their local datasets.
  3. The selection process stops as soon as exactly $S$ workers have been chosen.
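
The two-stage rule above can be sketched in a few lines of Python. This is an illustration of the description, not the repository's implementation; the function name `select_workers` and its signature are assumptions.

```python
import numpy as np

def select_workers(ages, data_sizes, S, tau_max, rng=None):
    """Two-stage AgeSel-style worker selection (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    M = len(ages)
    # Stage 1: workers whose age reached the threshold, most-aged first.
    overdue = sorted((m for m in range(M) if ages[m] >= tau_max),
                     key=lambda m: -ages[m])
    selected = overdue[:S]
    # Stage 2: fill any remaining slots by sampling workers with
    # probabilities proportional to their local dataset sizes.
    if len(selected) < S:
        rest = [m for m in range(M) if m not in selected]
        probs = np.asarray([data_sizes[m] for m in rest], dtype=float)
        probs /= probs.sum()
        extra = rng.choice(rest, size=S - len(selected), replace=False, p=probs)
        selected.extend(int(m) for m in extra)
    return selected
```

Sorting overdue workers by age ensures that, when more than $S$ workers are past the threshold, the longest-starved ones are served first.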

Age Update

  • The server broadcasts the current global model to the selected workers, which then perform their local computations.
  • Once the workers complete their tasks, the server updates the age parameters: consistent with the definition of $\tau_m$, the age of each selected worker is reset to zero, while the age of every other worker is incremented by one (see the sketch below).
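
A matching sketch of the age bookkeeping, assuming the reset-to-zero convention implied by the definition of $\tau_m$ (the paper's exact bookkeeping may differ, e.g. resetting to one):

```python
def update_ages(ages, selected):
    """Increment every worker's age, then reset the workers that
    just participated, so that ages[m] counts consecutive rounds
    without communication (assumed convention)."""
    chosen = set(selected)
    for m in range(len(ages)):
        ages[m] = 0 if m in chosen else ages[m] + 1
    return ages
```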

Results

Convergence Analysis

  • Assuming smoothness, lower boundedness of the objective function, unbiased gradients, and bounded variance, we derive an upper bound of order $$\mathcal{O}\left(\frac{1}{\eta UJ} + \frac{\eta}{SU} + \frac{1}{\eta U}\right)$$ on the average expected squared gradient norm for nonconvex objectives, where:
    • $J$ is the total number of communication rounds.
    • $U$ is the number of local steps per round.
    • $\eta$ is the local step size.
    • $S$ is the number of participating workers per round.

Simulation Results

  • Numerical experiments demonstrate the effectiveness of AgeSel in terms of overall communication cost and the number of training rounds required.

Code Description

  • AgeSel_EMNIST_MC.py:

    • Implements an image classification task on the EMNIST dataset using a two-layer fully connected neural network (a minimal model sketch is given after this list).
    • Compares AgeSel against state-of-the-art algorithms such as FedAvg, Optimal Client Sampling (OCS), and Round Robin (RR).
    • Results are averaged over 10 Monte Carlo runs for robustness.
  • AgeSel_CIFAR_MC.py:

    • Similar to the above, but applied to the CIFAR-10 dataset using a CNN with two convolutional layers and three fully connected layers.
    • Demonstrates AgeSel's advantage over the benchmark algorithms.
  • AgeSel_S.py:

    • Explores the impact of the hyperparameter $S$, the number of workers participating per round.
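
For concreteness, here is a minimal PyTorch sketch of a two-layer fully connected EMNIST classifier like the one mentioned under AgeSel_EMNIST_MC.py. The hidden width (200) and the number of classes (47, as in the EMNIST-balanced split) are assumptions; the actual script may use different values.

```python
import torch.nn as nn

class TwoLayerFC(nn.Module):
    """Two-layer fully connected classifier for 28x28 EMNIST images.

    Hidden width and class count are illustrative assumptions.
    """
    def __init__(self, in_dim=28 * 28, hidden=200, num_classes=47):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                    # (B, 1, 28, 28) -> (B, 784)
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```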

For more details, please refer to our published paper: Springer Link or arXiv.
