This page provides the source code for the Reinforcement Learning for Developers book. Readers are encouraged to download the code and use it alongside the book: the practical examples and hands-on exercises complement the text and help deepen your understanding of reinforcement learning concepts and algorithms.
The code uses the following package versions:

Python 3.7.7
TensorFlow 2.2
NumPy 1.19.3
pip install tensorflow==2.2
pip uninstall numpy
pip install numpy==1.19.3
Recent package installations commonly run into compatibility issues with the protobuf version, so it may be necessary to install a lower version to avoid conflicts:

pip uninstall protobuf
pip install protobuf==3.20
pip install gym==0.2
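As a quick way to confirm the setup, the following sketch (not taken from the book) prints the installed TensorFlow and NumPy versions and runs one CartPole episode with a random policy. It assumes the classic Gym API, where reset() returns an observation and step() returns a 4-tuple; CartPole-v0 and the random policy are used here only for illustration.

import numpy as np
import tensorflow as tf
import gym

# Expected versions: TensorFlow 2.2.x and NumPy 1.19.3
print("TensorFlow:", tf.__version__)
print("NumPy:", np.__version__)

# Run a single episode with random actions to verify that Gym works.
env = gym.make("CartPole-v0")   # illustrative choice of environment
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()            # random action, no learning
    state, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    total_reward += reward
env.close()
print("Episode return with a random policy:", total_reward)

If the imports succeed and an episode completes, the packages are installed consistently with the versions listed above.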
1. Getting Started
2. Basic Concepts of Reinforcement Learning
2.1 What is Reinforcement Learning?
2.2 Probability and Stochastic Processes
2.2.1 Probability
2.2.2 Conditional Probability
2.2.3 Stochastic Process
2.3 Markov Chain
2.3.1 Markov Property
2.3.2 Markov Chain
2.4 Markov Reward Process (MRP)
3. Basic Algorithms of Reinforcement Learning
3.1 Markov Decision Process (MDP) Concept
3.2 MDP Action-Value Function
3.3 Optimal Value Function of MDP
3.4 Terminology Used in Reinforcement Learning
3.4.1 Policy Evaluation and Policy Control
3.4.2 Model-Based and Model-Free
3.5 Dynamic Programming
3.6 Monte Carlo (MC) Method
3.7 Temporal Difference (TD) Learning and SARSA
3.7.1 TD
3.7.2 SARSA
3.8 Q-Learning
3.8.1 On-Policy and Off-Policy
3.8.2 Importance Sampling
3.8.3 Q-Learning
4. Artificial Intelligence Concepts
4.1 Machine Learning
4.2 Linear Regression Analysis
4.3 Classification Analysis
4.4 Deep Learning
4.5 Setting Up the Development Environment
4.6 TensorFlow
5. Function Approximation
5.1 Derivatives (Differentiation)
5.2 Partial Derivative
5.3 Scalar and Vector
5.4 Gradient
5.5 Gradient Descent
5.6 Stochastic Gradient Descent (SGD)
5.7 Notations for Partial Derivative and Gradient Descent in Reinforcement Learning
5.8 Function Approximation
6. Value-Based Reinforcement Learning and DQN Algorithm
6.1 DQN Algorithm
6.2 CartPole
6.3 Exploration and Exploitation
6.4 Basic Structure of the DQN Algorithm
6.5 Reviewing the Entire DQN Algorithm Code
6.6 Detailed Structure of the DQN Algorithm
6.7 Analysis of DQN Algorithm Training Results
7. Policy-Based Reinforcement Learning: REINFORCE Algorithm
7.1 Revisiting Neural Networks
7.2 Policy Gradient
7.3 REINFORCE Algorithm Operation
7.4 Basic Structure of the REINFORCE Algorithm
7.5 Reviewing the Complete REINFORCE Algorithm Code
7.6 Exploring the Detailed Structure of the REINFORCE Algorithm
7.7 Analysis of REINFORCE Algorithm Training Results
8. Policy-Based A2C Algorithm
8.1 Actor-Critic Algorithm
8.2 Advantage Actor-Critic (A2C)
8.3 Basic Structure of the A2C Algorithm
8.4 Full Code Review of the A2C Algorithm
8.5 Examining the Detailed Structure of the A2C Algorithm
8.6 Analyzing A2C Algorithm Training Results
9. Policy-Based PPO Algorithm
9.1 Importance Sampling
9.2 Off-Policy Policy Gradient
9.3 Clipping Technique
9.4 Generalized Advantage Estimation (GAE)
9.5 Basic Structure of the PPO Algorithm
9.6 PPO Algorithm Full Code Review
9.7 Examining the Detailed Structure of the PPO Algorithm
9.8 Analyzing PPO Algorithm Training Results
10. Neural Network Tuning
10.1 Overview of Neural Network Tuning
10.2 Input Data Preprocessing
10.3 Choosing a Cost Function
10.4 Activation Algorithms
10.5 Weight Initialization
10.6 Optimization Algorithms
10.7 Discussion on the Number of Nodes and Hidden Layers
10.8 Tuning the Neural Network in the PPO Algorithm
10.9 Applying Tuning Code to PPO Algorithm
10.10 Analysis of PPO Algorithm Tuning Results
11. Grid Search-Based Optimization Technique
11.1 Concept of Grid Search
11.2 Coding Grid Search
11.3 Full Grid Search Code
11.4 Grid Search Execution Results
11.5 Applying Grid Search Parameter Tuning
12. Bayesian Optimization Technique
12.1 Frequentist Probability and Bayesian Probability
12.2 Calculating Bayesian Probability
12.3 Introduction to Bayesian Optimization Packages
12.4 Using the Bayesian Optimization Package
12.5 Complete Bayesian Optimization Code
12.6 Analyzing Bayesian Optimization Results
13. In Conclusion