01. Fundamentals of Reinforcement Learning
02. A Guide to the Gym Toolkit
03. Bellman Equation and Dynamic Programming
05. Understanding Temporal Difference Learning
06. Case Study: The MAB Problem
07. Deep learning foundations
08. A primer on TensorFlow
09. Deep Q Network and its Variants
10. Policy Gradient Method
10.07. Cart Pole Balancing with Policy Gradient.ipynb
11. Actor Critic Methods - A2C and A3C
12. Learning DDPG, TD3 and SAC
13. TRPO, PPO and ACKTR Methods
14. Distributional Reinforcement Learning
15. Imitation Learning and Inverse RL
16. Deep Reinforcement Learning with Stable Baselines
17. Reinforcement Learning Frontiers
10. Policy Gradient Method
10.1. Why Policy Based Methods?
10.2. Policy Gradient Intuition
10.3. Understanding the Policy Gradient
10.4. Deriving Policy Gradient
10.4.1. Algorithm - Policy Gradient
10.5. Variance Reduction Methods
10.6. Policy Gradient with Reward-to-go
10.6.1. Algorithm - Reward-to-go Policy Gradient
10.7. Cart Pole Balancing with Policy Gradient
10.8. Policy Gradient with Baseline
10.8.1. Algorithm - Reinforce with Baseline