This GitHub project uses Reinforcement Learning (RL) to train an agent to provide security within a network scenario. The network scenario follows Use Case 1, i.e. Security of a Drinking Water Control System, from Deliverable 1 of the DSTL project "Intelligent Asset Parameterisation for Risk-based Moving Target Defence", or RL-MTD for short. The RL agent aims to keep the risk on the network low.
The agent is trained to deploy security controls to protect the network from Red Agent (RA) attacks. It considers both the cost and the efficacy of the security controls as it learns its policy.
This system is designed to take input from Mulval attack graphs. The Mulval output is used to create a graph (using NetworkX) whose nodes and edges indicate the attacker's state and how the attacker would proceed to reach a target. Some Mulval nodes have controls that can be deployed to prevent the attacker from reaching the target.
Each control has an associated cost and efficacy: the cost is the amount of resources required to deploy the control, and the efficacy is the amount of protection the control provides (ranging from 0 to 1).
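As a purely illustrative example (the node names, control identifiers, costs and efficacies below are made up, not taken from the use case), such a graph could be represented in NetworkX as follows:

```python
import networkx as nx

# Hypothetical Mulval-derived attack graph; all names and values are
# illustrative only, not the project's actual data.
attack_graph = nx.DiGraph()

# Nodes represent attacker states; some carry a deployable security control.
attack_graph.add_node("internet_access")
attack_graph.add_node("plc_network_access",
                      control="firewall_rule", cost=500, efficacy=0.8)
attack_graph.add_node("scada_credentials",
                      control="rotate_credentials", cost=300, efficacy=0.6)
attack_graph.add_node("water_pump_target")  # high-value target

# Edges describe how the attacker can move between states.
attack_graph.add_edges_from([
    ("internet_access", "plc_network_access"),
    ("plc_network_access", "scada_credentials"),
    ("scada_credentials", "water_pump_target"),
])

# Controls available to the defender, with their cost and efficacy (0-1).
controls = {n: d for n, d in attack_graph.nodes(data=True) if "control" in d}
print(controls)
```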
Controls are deployed by the defender (Blue Agent, or BA for short), who has a budget to spend when deploying controls. The defender can also choose not to deploy any controls (e.g. if the risk is low). The defender's goal is to survive RA attacks for a specified period of time (timesteps) without exceeding the budget and without having the high-value target compromised by the RA.
The actions of the RA are generated by an algorithm that mimics what an attacker would do. For deployment of the trained agent, the attacker's actions would instead be obtained by parsing Wazuh log files (Wazuh is a well-known Host Intrusion Detection System). The required information is which actions the attacker has carried out with respect to the Mulval attack graph.
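Purely as an illustration (this is not the project's RA algorithm), a naive attacker could be modelled as advancing one hop per timestep towards the high-value target:

```python
import random
import networkx as nx

def next_attacker_action(graph, current_node, target, rng=random):
    """Illustrative stand-in for the RA algorithm: move one hop forward,
    choosing randomly among successors that can still reach the target."""
    candidates = [
        nxt for nxt in graph.successors(current_node)
        if nx.has_path(graph, nxt, target)
    ]
    return rng.choice(candidates) if candidates else None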
The environment is developed based on OpenAI Gym (a minimal sketch is given after the list below). It is made up of:
- State: The Mulval nodes with controls that have been exploited by the attacker. All are initialized to 0 and change to 1 when exploited.
- Actions: The security controls that can be deployed by the defender. The defender can choose to deploy a control or not deploy any controls.
- Reward: The reward is calculated from the cost of the deployed control and the resulting risk in the system, with lower risk leading to a higher reward.
- Done (terminal state): The episode ends with the RL agent winning if it blocks the attacker for a set number of steps (i.e. the attacker did not succeed). The episode ends with a loss when the attacker reaches the target or when the defender's budget is depleted.
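Putting these pieces together, a minimal Gym-style environment with this state/action/reward/done structure might look roughly like the sketch below. The class name, the placeholder attacker behaviour, and the risk/reward calculation are illustrative assumptions, not the project's implementation.

```python
import gym
import numpy as np
from gym import spaces


class MTDDefenceEnvSketch(gym.Env):
    """Illustrative sketch of the defender environment (not the project's code).

    State : binary vector over control-bearing Mulval nodes (1 = exploited).
    Action: index of a control to deploy, or n_controls for "do nothing".
    """

    def __init__(self, control_costs, control_efficacies,
                 budget=3500, steps_to_win=100):
        self.control_costs = np.asarray(control_costs, dtype=float)
        self.control_efficacies = np.asarray(control_efficacies, dtype=float)
        self.n_controls = len(control_costs)
        self.budget = budget
        self.steps_to_win = steps_to_win

        self.observation_space = spaces.MultiBinary(self.n_controls)
        self.action_space = spaces.Discrete(self.n_controls + 1)  # +1 = no-op
        self.reset()

    def reset(self):
        self.state = np.zeros(self.n_controls, dtype=np.int8)
        self.remaining_budget = self.budget
        self.t = 0
        return self.state.copy()

    def step(self, action):
        self.t += 1
        if action < self.n_controls:                  # deploy a control
            self.remaining_budget -= self.control_costs[action]

        # Placeholder attacker: exploits one random undefended node per step.
        # The real project derives RA moves from the attack graph instead.
        unexploited = np.flatnonzero(self.state == 0)
        if len(unexploited) > 0:
            self.state[np.random.choice(unexploited)] = 1

        # Placeholder risk/reward: lower risk -> higher reward, minus cost.
        risk = self.state.sum() / self.n_controls
        cost = self.control_costs[action] if action < self.n_controls else 0.0
        reward = -risk - 0.001 * cost

        target_compromised = bool(self.state.all())   # stand-in condition
        done = (self.t >= self.steps_to_win           # defender survived
                or self.remaining_budget <= 0         # budget depleted
                or target_compromised)                # attacker reached target
        return self.state.copy(), reward, done, {}
```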
The agents are trained using the Q-learning algorithm.
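For reference, tabular Q-learning updates the Q-table with the standard rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The sketch below shows one training episode with illustrative hyperparameter values; the project's actual settings may differ.

```python
import numpy as np
from collections import defaultdict

def q_learning_episode(env, q_table, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One episode of standard tabular Q-learning (illustrative hyperparameters)."""
    state = tuple(env.reset())
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, _ = env.step(action)
        next_state = tuple(next_state)

        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state][action] += alpha * (td_target - q_table[state][action])
        state = next_state
    return q_table

# Usage sketch:
#   q_table = defaultdict(lambda: np.zeros(env.action_space.n))
#   for _ in range(training_episodes):
#       q_learning_episode(env, q_table)
```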
The config file includes:
- loss_reward: reward for losing, default 0
- win_reward: reward for winning, default 0
- steps_to_win: maximum number of steps to win per episode, default 100
- repeat_action_punishment: allow punishment for repeating the same action, default 0
- training_episodes: number of episodes for training the agent, default 100
- evaluation_episodes ("Evaluation"): number of episodes for evaluating the agent, default 10
- continue_training: default True; when true, the agent loads the last saved q-table file
- use_plotly: which plotting library to use (MatPlotLib or Plotly); default False, i.e. MatPlotLib
- reward:
  - RiskTh: risk threshold, default 13
  - BT: budget, with a maximum of 4610 for use case 1, default 3500
  - W1: balances the relationship between Cost and Benefit, default 0.04
  - W2: provides an offset to shift rewards to the positive axis, default 0
  - W3: allows for additional reward in the event of no action taken, default 10
  - W4: percentage of the risk threshold (RT), default 0.75

The above reward weights will be further fine-tuned as part of the final deliverable.
You may add additional configuration by changing the experimental parameters.
To run the experimental configuration, change use: defaults to use: experimental in the config file, or vice versa to run the default configuration.
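Assuming the configuration file is YAML with the structure described above (the filename config.yml and the exact nesting are assumptions for illustration), the parameters could be loaded along these lines:

```python
import yaml

# "config.yml" and the exact key layout are assumptions; adapt them to the
# repository's actual configuration file.
with open("config.yml") as f:
    cfg = yaml.safe_load(f)

active = cfg["use"]              # "defaults" or "experimental"
params = cfg[active]

steps_to_win = params.get("steps_to_win", 100)
training_episodes = params.get("training_episodes", 100)
reward_cfg = params.get("reward", {})
risk_threshold = reward_cfg.get("RiskTh", 13)
budget = reward_cfg.get("BT", 3500)
w1, w2 = reward_cfg.get("W1", 0.04), reward_cfg.get("W2", 0)
w3, w4 = reward_cfg.get("W3", 10), reward_cfg.get("W4", 0.75)
```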
A convenient way to set up the necessary packages is to use a virtual environment.
Prerequisites:
Python 3.10.9
pip 22.3.1
setuptools 65.5.0
- Create a virtual environment and install the requirements:
(Windows) python3 -m venv rlmtd
(Linux) virtualenv rlmtd --pip 22.0.2 --setuptools 63.2.0
- Activate the virtual environment:
(Windows) rlmtd/Scripts/activate
(Linux) source rlmtd/bin/activate
- Install the requirements:
pip install -r requirements.txt