First, install the required packages.
pip install -r requirements.txt
We train and evaluate the gridworld agent in both (fenced) cliff environments by running the following experiments.
python -m src.gridworld.evaluation constrained_gridworld_value configured default # 1
python -m src.gridworld.evaluation constrained_gridworld_entropy configured default # 2
python -m src.gridworld.evaluation unconstrained_gridworld_value configured default # 3
python -m src.gridworld.evaluation delta_as_function_of_failure_penalty configured default # 4
python -m src.gridworld.evaluation safety_as_function_of_failure_alpha configured default # 5
To train a constraints-penalized SAC Pendulum model, run the following command.
python -m src.pendulum.training --alpha={alpha} --seed={seed}
where {alpha} is the temperature parameter and {seed} is the random seed.
Extract the best-performing model and store it at checkpoints_pendulum/PenalizedPendulumEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.
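A minimal sketch for checking which checkpoints are already in place, based on the path template above. The function name `find_best_models` is hypothetical and not part of the repository, and how {alpha} is rendered in the directory name (for example `0.5` rather than `0.50`) is an assumption.

```python
from pathlib import Path


def find_best_models(root, environment, seed, alpha):
    """List best-model.pth files matching {environment}__{seed}__{alpha}__{timestamp}.

    Hypothetical helper: the timestamp is matched with a wildcard, and the
    textual form of alpha in the directory name is assumed to equal str(alpha).
    """
    pattern = f"{environment}__{seed}__{alpha}__*/best-model.pth"
    return sorted(Path(root).glob(pattern))
```

For example, `find_best_models("checkpoints_pendulum", "PenalizedPendulumEnvironment", 1, 0.1)` returns an empty list if no checkpoint for that seed and temperature has been stored yet.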
In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25.
We evaluate the trained models by running the following experiments.
python -m src.pendulum.evaluation evaluate_mode_policies configured default # 6
python -m src.pendulum.evaluation evaluate_disturbed_mode_policies configured default # 7
The results are automatically stored in results/pendulum.
To train a constraints-penalized SAC Hopper model, run the following command.
python -m src.hopper.training --alpha={alpha} --seed={seed}
where {alpha} is the temperature parameter and {seed} is the random seed.
Extract the best-performing model and store it at checkpoints_pendulum/PenalizedHopperEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.
In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25.
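The grid of temperatures and seeds described above amounts to 250 training runs per environment. The following is a rough sketch of how the corresponding commands could be generated. The helper `sweep_commands` is hypothetical and not part of the repository; only the module names and the `--alpha` and `--seed` flags come from the commands shown above.

```python
import itertools
import sys

# Grid reported in the text: alpha in {0.1, ..., 1.0} and seeds 1 to 25.
ALPHAS = [round(0.1 * k, 1) for k in range(1, 11)]
SEEDS = range(1, 26)


def sweep_commands(module):
    """Return one argument list per (alpha, seed) pair for the given training module.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    return [
        [sys.executable, "-m", module, f"--alpha={alpha}", f"--seed={seed}"]
        for alpha, seed in itertools.product(ALPHAS, SEEDS)
    ]
```

Each entry of `sweep_commands("src.pendulum.training")` or `sweep_commands("src.hopper.training")` can then be passed to `subprocess.run(argv, check=True)`, either sequentially or through a job scheduler.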
We evaluate the trained models by running the following experiments.
python -m src.hopper.evaluation evaluate_mode_policies configured default # 8
python -m src.hopper.evaluation evaluate_disturbed_mode_policies configured default # 9
The results are automatically stored in results/hopper.
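Several of the figure commands take the {timestamp} of an earlier result as an argument. As a convenience, the most recent run of an experiment could be looked up roughly as follows. This is a sketch under two assumptions: that results are laid out as results/{environment}/{experiment}/{timestamp}, as the plot paths suggest, and that timestamp directory names sort chronologically when compared as strings. The name `latest_run` is hypothetical.

```python
from pathlib import Path


def latest_run(results_dir, experiment):
    """Return the name of the newest run directory of an experiment.

    Hypothetical helper: assumes one sub-directory per run, named by a
    timestamp whose string ordering matches chronological ordering.
    """
    experiment_dir = Path(results_dir) / experiment
    runs = sorted(p.name for p in experiment_dir.iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no runs found in {experiment_dir}")
    return runs[-1]
```

For instance, `latest_run("results/pendulum", "evaluate_mode_policies")` would give the value to substitute for {timestamp result 6}, provided the assumptions above hold.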
Path. results/gridworld/constrained_gridworld_value/{timestamp}/plots/plot_values_constrained.pdf
Path. results/gridworld/unconstrained_gridworld_entropy/{timestamp}/plots/plot_value_grid_unconstrained.pdf
python -m src.pendulum.analyze_disturbed_mode_success_rates configured small # 10
python -m src.hopper.analyze_disturbed_mode_success_rates configured small # 11
python -m src.figures.figure_3 --run_pendulum_success_rate={timestamp result 10} --run_hopper_success_rate={timestamp result 11}
Path. results/figure_3/{timestamp}/plots/plot_figure_3.pdf
Path. results/gridworld/constrained_gridworld_entropy/{timestamp}/plots/plot_entropy_constrained/alpha_4_small.pdf
python -m src.figures.figure_5 --run_name_delta={timestamp result 4} --run_name_safety={timestamp result 5}
Path. results/figure_5/{timestamp}/plots/plot_figure_5.pdf
python -m src.pendulum.evaluation analyze_mode_environment_returns --run_name={timestamp result 6} --height=2 --width=2.75 # 12
python -m src.hopper.evaluation analyze_mode_environment_returns --run_name={timestamp result 8} --height=2 --width=2.75 # 13
python -m src.figures.figure_6 --run_pendulum_environment_return={timestamp result 12} --run_hopper_environment_return={timestamp result 13}
Path. results/figure_6/{timestamp}/plots/plot_figure_6.pdf
python -m src.pendulum.evaluation analyze_mode_environment_returns configured full # 14
Path. results/pendulum/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf
python -m src.hopper.evaluation analyze_mode_environment_returns configured full # 15
Path. results/hopper/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf