Code for the paper "Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization" (ECML 2025)

Instructions for Reproducing Results

First, install the required packages.

pip install -r requirements.txt
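
To keep dependencies isolated, you can first create a virtual environment; a minimal sketch, assuming Python 3 with the built-in venv module:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt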

Gridworld Experiments

We train and evaluate the gridworld agent in both the constrained (fenced) and unconstrained cliff environments by running the following experiments.

python -m src.gridworld.evaluation constrained_gridworld_value configured default  # 1
python -m src.gridworld.evaluation constrained_gridworld_entropy configured default  # 2
python -m src.gridworld.evaluation unconstrained_gridworld_value configured default  # 3
python -m src.gridworld.evaluation delta_as_function_of_failure_penalty configured default  # 4
python -m src.gridworld.evaluation safety_as_function_of_failure_alpha configured default  # 5
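
Each command writes its outputs to a timestamped run directory under results/gridworld/{experiment_name} (these paths reappear in the Figures section below). A minimal sketch for locating the most recent run of experiment 1, assuming the directory names sort lexicographically by timestamp:

ls -1 results/gridworld/constrained_gridworld_value | sort | tail -n 1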

Pendulum Experiments

To train a constraint-penalized SAC Pendulum model, run the following command.

python -m src.pendulum.training --alpha={alpha} --seed={seed}

where {alpha} is the temperature parameter and {seed} is the random seed. Extract the best-performing model and store it at checkpoints_pendulum/PenalizedPendulumEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.

In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25. We evaluate the trained models by running the following experiments.
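
The full sweep can be scripted directly from the values above; a minimal shell sketch (it only launches the training runs, since extracting each best-model.pth depends on where your training logs end up):

for alpha in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0; do
    for seed in $(seq 1 25); do
        python -m src.pendulum.training --alpha=$alpha --seed=$seed
    done
done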

python -m src.pendulum.evaluation evaluate_mode_policies configured default  # 6
python -m src.pendulum.evaluation evaluate_disturbed_mode_policies configured default  # 7

The results are automatically stored in results/pendulum.

Hopper Experiments

To train a constraint-penalized SAC Hopper model, run the following command.

python -m src.hopper.training --alpha={alpha} --seed={seed}

where {alpha} is the temperature parameter and {seed} is the random seed. Extract the best-performing model and store it at checkpoints_hopper/PenalizedHopperEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.

In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25. We evaluate the trained models by running the following experiments.
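
Before evaluating, it can help to verify that a checkpoint exists for every (seed, alpha) pair; a hedged sketch, assuming the checkpoint layout described above:

for d in checkpoints_hopper/PenalizedHopperEnvironment__*; do
    [ -f "$d/best-model.pth" ] || echo "missing: $d/best-model.pth"
done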

python -m src.hopper.evaluation evaluate_mode_policies configured default  # 8
python -m src.hopper.evaluation evaluate_disturbed_mode_policies configured default  # 9

The results are automatically stored in results/hopper.

Figures

Figure 1

Path. results/gridworld/constrained_gridworld_value/{timestamp}/plots/plot_values_constrained.pdf

Figure 2

Path. results/gridworld/unconstrained_gridworld_value/{timestamp}/plots/plot_value_grid_unconstrained.pdf

Figure 3

python -m src.pendulum.analyze_disturbed_mode_success_rates configured small  # 10
python -m src.hopper.analyze_disturbed_mode_success_rates configured small  # 11
python -m src.figures.figure_3 --run_pendulum_success_rate={timestamp result 10} --run_hopper_success_rate={timestamp result 11}
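
Here, {timestamp result N} stands for the timestamped run directory created by the command marked # N above; the same convention applies to Figures 5 and 6. A hypothetical invocation (both timestamps are invented for illustration):

python -m src.figures.figure_3 --run_pendulum_success_rate=2025-08-01_12-00-00 --run_hopper_success_rate=2025-08-01_13-30-00  # timestamps hypothetical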

Path. results/figure_3/{timestamp}/plots/plot_figure_3.pdf

Figure 4

Path. results/gridworld/constrained_gridworld_entropy/{timestamp}/plots/plot_entropy_constrained/alpha_4_small.pdf

Figure 5

python -m src.figures.figure_5 --run_name_delta={timestamp result 4} --run_name_safety={timestamp result 5}

Path. results/figure_5/{timestamp}/plots/plot_figure_5.pdf

Figure 6

python -m src.pendulum.evaluation analyze_mode_environment_returns --run_name={timestamp result 6} --height=2 --width=2.75  # 12
python -m src.hopper.evaluation analyze_mode_environment_returns --run_name={timestamp result 8} --height=2 --width=2.75  # 13
python -m src.figures.figure_6 --run_pendulum_environment_return={timestamp result 12} --run_hopper_environment_return={timestamp result 13}

Path. results/figure_6/{timestamp}/plots/plot_figure_6.pdf

Figure 7

python -m src.pendulum.evaluation analyze_mode_environment_returns configured full  # 14

Path. results/pendulum/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf

Figure 8

python -m src.hopper.evaluation analyze_mode_environment_returns configured full  # 15

Path. results/hopper/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf
