About the implementation of the advantage function in PPOAgent

In `PPOAgent`, at line 514 of grid/toy_grid_dag.py, the advantage is computed as `adv = r + vsp * (1-d) - vs`. As far as I can tell, this is only the delta term (the one-step TD residual) from the original PPO paper, not the full advantage estimate, which sums discounted delta terms over the following time steps.

Am I misunderstanding your code, or PPO itself?
![image](https://user-images.githubusercontent.com/39487018/224649960-a69351c8-4f62-4e6f-9125-7bfbef854d75.png)
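
To make the distinction concrete, here is a minimal sketch of what I mean. It is not the repository's code: the function names `td_residuals` and `gae_advantages`, the `gamma` and `lam` parameters, and the use of plain Python lists instead of tensors are all my own assumptions for illustration. With `gamma=1.0`, the first function reproduces the expression from line 514; the second accumulates those residuals backwards in time, as in the truncated generalized advantage estimate.

```python
def td_residuals(r, vs, vsp, d, gamma=1.0):
    """One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) * (1 - d_t) - V(s_t).

    With gamma=1.0 this matches `adv = r + vsp * (1-d) - vs`.
    """
    return [ri + gamma * vspi * (1.0 - di) - vsi
            for ri, vsi, vspi, di in zip(r, vs, vsp, d)]


def gae_advantages(r, vs, vsp, d, gamma=1.0, lam=0.95):
    """Backward accumulation: A_t = delta_t + gamma * lam * (1 - d_t) * A_{t+1}.

    The (1 - d_t) factor is an assumption on my side: it stops the sum
    at episode boundaries so residuals do not leak across episodes.
    """
    deltas = td_residuals(r, vs, vsp, d, gamma)
    adv = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * (1.0 - d[t]) * running
        adv[t] = running
    return adv


# Toy 3-step episode (made-up numbers, terminal at the last step).
r = [0.0, 0.0, 1.0]
vs = [0.5, 0.6, 0.7]
vsp = [0.6, 0.7, 0.0]
d = [0.0, 0.0, 1.0]

one_step = td_residuals(r, vs, vsp, d)            # only the delta terms
full = gae_advantages(r, vs, vsp, d, lam=1.0)     # sum of all later deltas
```

With `lam=0.0` the two functions return the same values, so the current code would correspond to that special case; with `lam=1.0` and `gamma=1.0` the advantage becomes the undiscounted return minus `vs`.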