This repository contains three different implementations of UAV-enabled NOMA (Non-Orthogonal Multiple Access) communications using Shared Deep Q-Networks, each with a different approach to action masking:
This paper has been accepted in IEEE Transactions on Machine Learning in Communications and Networking. Please use the following citation when citing the paper:
@article{rizvi2023multi,
  title={Multi-agent reinforcement learning with action masking for uav-enabled mobile communications},
  author={Rizvi, Danish and Boyle, David},
  journal={arXiv preprint arXiv:2303.16737},
  year={2023}
}
This block will be updated with the citation for the published version in due course.
Located in basic masking/
- Simple post-prediction action masking using -inf values
- Basic DQN architecture
- Direct Q-value masking after prediction
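Post-prediction masking of this kind can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function names `mask_q_values` and `select_greedy_action` and the example values are hypothetical, and the Q-values are assumed to arrive as a flat array with one entry per action.

```python
import numpy as np


def mask_q_values(q_values, valid_mask):
    """Return a copy of q_values with invalid actions set to -inf."""
    q_values = np.asarray(q_values, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    return np.where(valid_mask, q_values, -np.inf)


def select_greedy_action(q_values, valid_mask):
    """Pick the valid action with the highest predicted Q-value."""
    if not np.any(valid_mask):
        raise ValueError("at least one action must be valid")
    return int(np.argmax(mask_q_values(q_values, valid_mask)))


# Hypothetical network output for four actions; action 1 is invalid.
q = [0.2, 1.5, -0.3, 0.9]
valid = [True, False, True, True]
print(select_greedy_action(q, valid))  # 3
```

Because the mask is applied only after prediction, the network itself is unchanged; the mask simply prevents an invalid action from winning the argmax.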
Located in IAM1/
- Action masking based on the approach described in the paper
- Uses -inf masking after Q-value prediction
- Enhanced state representation
- Modified power allocation schemes
Located in IAM2/
- Action masking integrated into neural network architecture
- Uses binary (0/1) masking through Lambda layer
- Dual input network (state and mask)
- Masking influences training process directly
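To see why a binary mask inside the network influences training, the NumPy sketch below imitates the element-wise product that the Lambda layer is assumed to compute and evaluates the gradient of a mean-squared-error loss with respect to the raw Q-values. The function names and numbers are hypothetical, and the actual Keras model in IAM2/ may differ in detail.

```python
import numpy as np


def masked_forward(q_raw, mask):
    """Element-wise product of raw Q-values and a 0/1 mask (assumed Lambda op)."""
    return np.asarray(q_raw, dtype=float) * np.asarray(mask, dtype=float)


def mse_grad_wrt_raw_q(q_raw, mask, target):
    """Gradient of mean((q_raw * mask - target)**2) with respect to q_raw."""
    q_raw = np.asarray(q_raw, dtype=float)
    mask = np.asarray(mask, dtype=float)
    target = np.asarray(target, dtype=float)
    q_masked = masked_forward(q_raw, mask)
    # Chain rule: d(q_masked)/d(q_raw) equals the mask, so masked entries get 0.
    return 2.0 * (q_masked - target) * mask / q_raw.size


q_raw = np.array([0.5, -1.0, 2.0])   # hypothetical raw outputs
mask = np.array([1.0, 0.0, 1.0])     # action 1 is invalid
target = np.array([1.0, 0.0, 1.0])   # hypothetical training targets
print(masked_forward(q_raw, mask))
print(mse_grad_wrt_raw_q(q_raw, mask, target))
```

The gradient entry for the masked action is exactly zero, so invalid actions do not contribute to weight updates. One caveat of 0/1 masking is that masked actions output 0 rather than -inf, so a plain argmax over the output could still pick an invalid action when every valid Q-value is negative.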
Located in MDQN clustering/
- Based on the paper "Multi-Agent Reinforcement Learning in NOMA-Aided UAV Networks for Cellular Offloading" for benchmarking
- No action masking
- Fixed clustering
- Basic: Post-prediction -inf masking
- IAM1: Enhanced post-prediction masking with improved state handling
- IAM2: Integrated masking in network architecture
- Basic: Single input (state) network
- IAM1: Single input with enhanced state representation
- IAM2: Dual input (state and mask) network
- Basic: Masking only affects action selection
- IAM1: Masking affects Q-value updates
- IAM2: Masking is part of the training process
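One common way for masking to affect Q-value updates, as described for IAM1, is to restrict the bootstrap maximum in the TD target to valid next actions. The sketch below assumes that interpretation; the function name `masked_td_targets`, the discount factor, and the batch values are hypothetical and not taken from the repository.

```python
import numpy as np


def masked_td_targets(rewards, next_q, next_valid_mask, dones, gamma=0.99):
    """Compute r + gamma * max over valid next actions, per transition.

    Assumes every non-terminal next state has at least one valid action.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    next_valid_mask = np.asarray(next_valid_mask, dtype=bool)
    dones = np.asarray(dones, dtype=bool)

    # Invalid next actions cannot be chosen, so exclude them from the max.
    masked_next_q = np.where(next_valid_mask, next_q, -np.inf)
    best_next = masked_next_q.max(axis=1)
    # Terminal transitions do not bootstrap.
    best_next = np.where(dones, 0.0, best_next)
    return rewards + gamma * best_next


targets = masked_td_targets(
    rewards=[1.0, 0.5],
    next_q=[[2.0, 5.0], [1.0, 3.0]],
    next_valid_mask=[[True, False], [True, True]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)  # approximately [2.8, 0.5]
```

In the first transition the larger value 5.0 belongs to an invalid action, so the target uses 2.0 instead. Without the mask, the update would bootstrap from an action the agent can never take.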
Each implementation follows the same structure:
implementation-folder/
├── src/
│ ├── config/
│ │ └── parameters.ipynb # System parameters (or config file)
│ ├── models/
│ │ ├── action_masking.ipynb # Action masking implementation
│ │ ├── dqn.ipynb # DQN implementation
│ │ └── system_model.ipynb # UAV-NOMA system model
│ └── utils/
│ └── visualization.ipynb # Plotting functions (if present)
- Python 3.8+
- TensorFlow 2.x
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- Clone the repository:
git clone https://github.com/smrizvi1/Multi-Agent-RL-with-Invalid-action-Masking.git
- Install required packages:
pip install -r Requirements.txt
- Multiple UAV trajectory optimization
- NOMA power allocation
- Dynamic user clustering
- Channel state aware resource allocation
- Experience replay
- Target network
- Epsilon-greedy exploration
- Action masking for invalid actions
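Three of the components listed above (experience replay, epsilon-greedy exploration, and action masking) can be sketched together as follows. This is an illustrative outline rather than the repository's implementation: the class `ReplayBuffer`, the function `epsilon_greedy`, and all example values are hypothetical, and exploration is assumed to sample only among valid actions.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, valid_mask, epsilon, rng):
    """Explore among valid actions with probability epsilon, else act greedily."""
    q_values = np.asarray(q_values, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    valid_actions = np.flatnonzero(valid_mask)
    if valid_actions.size == 0:
        raise ValueError("at least one action must be valid")
    if rng.random() < epsilon:
        return int(rng.choice(valid_actions))
    return int(np.argmax(np.where(valid_mask, q_values, -np.inf)))


rng = np.random.default_rng(0)
buffer = ReplayBuffer(capacity=1000)
action = epsilon_greedy([0.1, 0.9, 0.5], [True, False, True], epsilon=0.1, rng=rng)
buffer.add(state=0, action=action, reward=1.0, next_state=1, done=False)
print(action, len(buffer))
```

Applying the mask in both the exploration and the greedy branch means the agent never stores a transition that starts with an invalid action.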
- Episodes with multiple time steps
- Dynamic user movement
- Periodic user clustering
- Performance tracking
Each implementation generates:
- Throughput plots
- Worst-user rate plots
- UAV trajectories
- Training metrics
Results are saved as:
- Data: .npy files
- Plots: .png files