NOTE: This repo contains the code for LongFed, a framework for client selection in FL clusters. It is used for benchmarking 'SwaFL', a novel client selection framework for heterogeneous, low-resource FL clusters.
- Overview
- System Architecture
- Components
- Installation
- Configuration
- Detailed Component Documentation
- Metrics and Monitoring
- Usage Examples
This federated learning system implements a distributed machine learning framework where multiple clients collaborate to train a global model while keeping their data private. The system uses a CNN model for image classification on the CIFAR-10 dataset.
- Selective client participation based on LongFed.
- Resource monitoring and performance tracking
- Adaptive sampling using Dirichlet distribution
- Comprehensive metrics logging
- Fault tolerance and error handling
- Central coordinator for federated learning rounds
- Manages client registration and participation
- Aggregates model updates using FedAvg algorithm
- Evaluates global model performance
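The FedAvg aggregation step can be sketched as a size-weighted average of per-layer model weights. This is a minimal pure-Python sketch; the repo's actual server code and weight data structures may differ:

```python
def fedavg(client_weights, client_sizes):
    """FedAvg: average each layer's weights, with each client weighted
    by its share of the total number of training samples.

    client_weights: one list of flat weight lists per client
                    (one inner list per model layer).
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        avg = [0.0] * len(client_weights[0][layer])
        for weights, n in zip(client_weights, client_sizes):
            for i, value in enumerate(weights[layer]):
                avg[i] += (n / total) * value
        averaged.append(avg)
    return averaged
```

For example, averaging `[0.0, 2.0]` (1 sample) with `[4.0, 6.0]` (3 samples) yields `[3.0, 5.0]`, since the larger client contributes 3/4 of the weight.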
- Train local models on private data
- Make intelligent participation decisions
- Monitor system resources
- Track performance metrics
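As an illustration only, a participation decision could be a resource-headroom check; the function and thresholds below are hypothetical sketches, not LongFed's or SwaFL's actual criterion:

```python
def should_participate(cpu_percent, mem_percent,
                       cpu_limit=80.0, mem_limit=85.0):
    """Hypothetical participation check: join a round only when the
    client has spare CPU and memory headroom. The real selection logic
    may weigh additional signals (participation history, data size, etc.)."""
    return cpu_percent < cpu_limit and mem_percent < mem_limit
```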
The client implementation contains several key classes and methods:
Key responsibilities:
- Initializes connection with server
- Creates and manages local model
- Handles client configuration
- Sets up monitoring
Handles:
- Local model training
- Resource monitoring during training
- Performance metric collection
Features:
- Implements Dirichlet distribution sampling
- Ensures balanced class representation
- Handles random data selection
Features:
- Real-time CPU and memory monitoring
- Metric logging to CSV files
- Thread-safe resource tracking
- Comprehensive metric collection
System Metrics:
- CPU utilization
- Memory usage
- Training time
- Model accuracy
Parameters:
- `dirichlet_alpha`: Controls data distribution skewness
- `participation_threshold`: Percentage of clients that should participate
- `min_clients`: Minimum required participating clients
- `rounds`: Total training rounds
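A hypothetical `config.json` illustrating these parameters (the values are placeholders, not the repo's defaults):

```json
{
  "dirichlet_alpha": 0.5,
  "participation_threshold": 0.8,
  "min_clients": 2,
  "rounds": 10
}
```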
Implements sophisticated data sampling:
- Dirichlet Distribution:
  - Controls class distribution across clients
  - Configurable via alpha parameter
  - Ensures realistic non-IID scenarios
- Adaptive Sampling:
  - 50% data sampling per round
  - Class-aware selection
  - Balance maintenance
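A minimal sketch of Dirichlet-based partitioning with NumPy (the function name and exact split details are assumptions, not the repo's implementation; smaller `alpha` produces more skewed, non-IID class distributions):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using one Dirichlet draw
    per class: the draw gives each client's share of that class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        # Proportion of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props) * len(cls_idx)).astype(int)[:-1]
        for client, part in enumerate(np.split(cls_idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

Every sample is assigned to exactly one client, so the partition covers the dataset with no duplicates.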
Comprehensive metric tracking:
- Real-time Monitoring:
  - Thread-based resource tracking
  - Configurable sampling frequency
  - Non-blocking implementation
- Metric Storage:
  - CSV-based logging
  - Timestamp-based tracking
  - Participation history
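The monitoring design above can be sketched with stdlib `threading` and `csv`. Here `sample_fn` is a hypothetical pluggable callback returning a dict of metric values (the repo likely reads CPU and memory via a library such as psutil); class and method names are assumptions:

```python
import csv
import threading
import time

class ResourceMonitor:
    """Background thread that periodically samples metrics and appends
    timestamped rows to a CSV file, without blocking the caller."""

    def __init__(self, csv_path, sample_fn, interval=1.0):
        self.csv_path = csv_path
        self.sample_fn = sample_fn
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        with open(self.csv_path, "a", newline="") as f:
            writer = None
            while not self._stop.is_set():
                row = {"timestamp": time.time(), **self.sample_fn()}
                if writer is None:
                    # Derive the CSV header from the first sample.
                    writer = csv.DictWriter(f, fieldnames=list(row))
                    writer.writeheader()
                writer.writerow(row)
                f.flush()
                # wait() doubles as a sleep that can be interrupted by stop().
                self._stop.wait(self.interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Usage: `ResourceMonitor("metrics.csv", sample_fn, interval=1.0)`, then `start()` before training and `stop()` afterwards.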
```python
from client import FederatedClient

client = FederatedClient(client_id=1, server_url='http://127.0.0.1:5000')
client.start_client()
```
- Install the required packages
```bash
pip install -r requirements.txt
```
- Configure the config.json file
- Run the server.py file
```bash
python server.py
```
- Run the test_federated.py file
```bash
python test_federated.py
```
- Resource Management:
  - Single CPU core utilization
  - Controlled memory usage
  - Thread-safe operations
- Error Handling:
  - Graceful failure recovery
  - Metric logging persistence
  - Connection retry logic
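Connection retry logic of this kind is typically exponential backoff; a generic sketch (the function name, exception handling, and delays are assumptions, not the repo's actual code):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=0.5):
    """Call fn(); on failure, wait and retry with exponentially
    increasing delay, re-raising after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A client could wrap its server requests, e.g. `with_retries(lambda: register_with_server(url))`, so transient network failures do not abort a round.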
- Scalability:
  - Asynchronous client operations
  - Efficient resource monitoring
  - Minimal memory footprint
This system provides a robust, scalable, and monitored federated learning implementation with intelligent participation decisions and comprehensive metric tracking.