This GitHub project uses Reinforcement Learning (RL) to train an agent to provide security within a network scenario. The network scenario follows Use Case 1, i.e. Security of a Drinking Water Control System, from Deliverable 1 of the DSTL project "Intelligent Asset Parameterisation for Risk-based Moving Target Defence", or RL-MTD for short. The RL agent aims to keep the risk on the network low.
The agent is trained to deploy security controls to protect the network from Red Agent (RA) attacks. It considers both the cost and the efficacy of the security controls as it learns its policy.
This system is designed to take input from Mulval attack graphs. The Mulval output is used to create a graph (using NetworkX) whose nodes and edges indicate the attacker's state and how the attacker would proceed to reach a target. Some Mulval nodes have controls that can be deployed to prevent the attacker from reaching the target.
Each control has an associated cost and efficacy: the cost is the amount of resources required to deploy the control, and the efficacy is the amount of protection the control provides (ranging from 0 to 1).
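As a purely illustrative example (the node names, control identifiers, costs and efficacies below are made up, not taken from the use case), such a graph could be represented in NetworkX as follows:

```python
import networkx as nx

# Hypothetical Mulval-derived attack graph; all names and values are
# illustrative only, not the project's actual data.
attack_graph = nx.DiGraph()

# Nodes represent attacker states; some carry a deployable security control.
attack_graph.add_node("internet_access")
attack_graph.add_node("plc_network_access",
                      control="firewall_rule", cost=500, efficacy=0.8)
attack_graph.add_node("scada_credentials",
                      control="rotate_credentials", cost=300, efficacy=0.6)
attack_graph.add_node("water_pump_target")  # high-value target

# Edges describe how the attacker can move between states.
attack_graph.add_edges_from([
    ("internet_access", "plc_network_access"),
    ("plc_network_access", "scada_credentials"),
    ("scada_credentials", "water_pump_target"),
])

# Controls available to the defender, with their cost and efficacy (0-1).
controls = {n: d for n, d in attack_graph.nodes(data=True) if "control" in d}
print(controls)
```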
Controls are deployed by the defender (Blue Agent, or BA for short), who has a budget to spend when deploying controls. The defender can also choose not to deploy any controls (e.g. if the risk is low). The defender's goal is to survive RA attacks for a specified period of time (timesteps) without exceeding the budget and without having the high-value target compromised by the RA.
The actions of the RA are generated by an algorithm that mimics what an attacker would do. For deployment of the trained agent, the attacker's actions would instead be obtained by parsing Wazuh log files (Wazuh is a well-known Host Intrusion Detection System). The required information is which actions the attacker has carried out with respect to the Mulval attack graph.
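Purely as an illustration (this is not the project's RA algorithm), a naive attacker could be modelled as advancing one hop per timestep towards the high-value target:

```python
import random
import networkx as nx

def next_attacker_action(graph, current_node, target, rng=random):
    """Illustrative stand-in for the RA algorithm: move one hop forward,
    choosing randomly among successors that can still reach the target."""
    candidates = [
        nxt for nxt in graph.successors(current_node)
        if nx.has_path(graph, nxt, target)
    ]
    return rng.choice(candidates) if candidates else None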
The environment is developed based on OpenAI Gym (a minimal sketch is given after the list below). It is made up of:
- State: The Mulval nodes with controls that have been exploited by the attacker. All are initialized to 0 and change to 1 when exploited.
- Actions: The security controls that can be deployed by the defender. The defender can choose to deploy a control or not deploy any controls.
- Reward: The reward is calculated from the cost of the deployed control and the resulting risk in the system, with lower risk leading to a higher reward.
- Done (terminal state): The episode ends with the RL agent winning if it blocks the attacker for a set number of steps (i.e. the attacker did not succeed). The episode ends with a loss when the attacker reaches the target or when the defender's budget is depleted.
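Putting these pieces together, a minimal Gym-style environment with this state/action/reward/done structure might look roughly like the sketch below. The class name, the placeholder attacker behaviour, and the risk/reward calculation are illustrative assumptions, not the project's implementation.

```python
import gym
import numpy as np
from gym import spaces


class MTDDefenceEnvSketch(gym.Env):
    """Illustrative sketch of the defender environment (not the project's code).

    State : binary vector over control-bearing Mulval nodes (1 = exploited).
    Action: index of a control to deploy, or n_controls for "do nothing".
    """

    def __init__(self, control_costs, control_efficacies,
                 budget=3500, steps_to_win=100):
        self.control_costs = np.asarray(control_costs, dtype=float)
        self.control_efficacies = np.asarray(control_efficacies, dtype=float)
        self.n_controls = len(control_costs)
        self.budget = budget
        self.steps_to_win = steps_to_win

        self.observation_space = spaces.MultiBinary(self.n_controls)
        self.action_space = spaces.Discrete(self.n_controls + 1)  # +1 = no-op
        self.reset()

    def reset(self):
        self.state = np.zeros(self.n_controls, dtype=np.int8)
        self.remaining_budget = self.budget
        self.t = 0
        return self.state.copy()

    def step(self, action):
        self.t += 1
        if action < self.n_controls:                  # deploy a control
            self.remaining_budget -= self.control_costs[action]

        # Placeholder attacker: exploits one random undefended node per step.
        # The real project derives RA moves from the attack graph instead.
        unexploited = np.flatnonzero(self.state == 0)
        if len(unexploited) > 0:
            self.state[np.random.choice(unexploited)] = 1

        # Placeholder risk/reward: lower risk -> higher reward, minus cost.
        risk = self.state.sum() / self.n_controls
        cost = self.control_costs[action] if action < self.n_controls else 0.0
        reward = -risk - 0.001 * cost

        target_compromised = bool(self.state.all())   # stand-in condition
        done = (self.t >= self.steps_to_win           # defender survived
                or self.remaining_budget <= 0         # budget depleted
                or target_compromised)                # attacker reached target
        return self.state.copy(), reward, done, {}
```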
The agents are trained using the Q-learning algorithm.
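For reference, tabular Q-learning updates the Q-table with the standard rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The sketch below shows one training episode with illustrative hyperparameter values; the project's actual settings may differ.

```python
import numpy as np
from collections import defaultdict

def q_learning_episode(env, q_table, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One episode of standard tabular Q-learning (illustrative hyperparameters)."""
    state = tuple(env.reset())
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, _ = env.step(action)
        next_state = tuple(next_state)

        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state][action] += alpha * (td_target - q_table[state][action])
        state = next_state
    return q_table

# Usage sketch:
#   q_table = defaultdict(lambda: np.zeros(env.action_space.n))
#   for _ in range(training_episodes):
#       q_learning_episode(env, q_table)
```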
The config file includes:
- loss_reward: reward for losing, default 0
- win_reward: reward for winning, default 0
- steps_to_win: maximum number of steps to win per episode, default 100
- repeat_action_punishment: allow punishment for repeating the same action, default 0
- training_episodes: number of episodes for training the agent, default 100
- evaluation_episodes ("Evaluation"): number of episodes for evaluating the agent, default 10
- continue_training: default True; when true, the agent loads the last saved q-table file
- use_plotly: which plotting library to use (MatPlotLib or Plotly); default False, i.e. MatPlotLib
- reward:
  - RiskTh: risk threshold, default 13
  - BT: budget, with a maximum of 4610 for use case 1, default 3500
  - W1: balances the relationship between Cost and Benefit, default 0.04
  - W2: provides an offset to shift rewards to the positive axis, default 0
  - W3: allows for additional reward in the event of no action taken, default 10
  - W4: percentage of the risk threshold (RT), default 0.75

The above reward weights will be further fine-tuned as part of the final deliverable.
You may add additional configuration by changing the experimental parameters.
To run the experimental configuration, change use: defaults to use: experimental in the config file, or vice versa to run the default configuration.
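Assuming the configuration file is YAML with the structure described above (the filename config.yml and the exact nesting are assumptions for illustration), the parameters could be loaded along these lines:

```python
import yaml

# "config.yml" and the exact key layout are assumptions; adapt them to the
# repository's actual configuration file.
with open("config.yml") as f:
    cfg = yaml.safe_load(f)

active = cfg["use"]              # "defaults" or "experimental"
params = cfg[active]

steps_to_win = params.get("steps_to_win", 100)
training_episodes = params.get("training_episodes", 100)
reward_cfg = params.get("reward", {})
risk_threshold = reward_cfg.get("RiskTh", 13)
budget = reward_cfg.get("BT", 3500)
w1, w2 = reward_cfg.get("W1", 0.04), reward_cfg.get("W2", 0)
w3, w4 = reward_cfg.get("W3", 10), reward_cfg.get("W4", 0.75)
```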
A convenient way to set up the necessary packages is to use a virtual environment.
Prerequisites:
Python 3.10.9
pip 22.3.1
setuptools 65.5.0
- Create a virtual environment and install the requirements:
(Windows) python3 -m venv rlmtd
(Linux) virtualenv rlmtd --pip 22.0.2 --setuptools 63.2.0
- Activate the virtual environment:
(Windows) rlmtd/Scripts/activate
(Linux) source rlmtd/bin/activate
- Install the requirements:
pip install -r requirements.txt