
Welcome to the Source Code for "Reinforcement Learning for Developers"

This repository provides the source code for the book Reinforcement Learning for Developers. Readers are encouraged to download the code and use it alongside the book: it complements the text with practical examples and hands-on exercises that deepen your understanding of reinforcement learning concepts and algorithms.

Download the code below and get started applying the concepts from the book!

The code uses the following program versions:

Python 3.7.7

TensorFlow 2.2

NumPy 1.19.3

Install

pip install tensorflow==2.2

pip uninstall numpy

pip install numpy==1.19.3

pip uninstall protobuf

Fix errors

Recent package installations commonly run into compatibility issues with the protobuf version. Installing a lower version may be necessary to ensure proper functionality and avoid conflicts.

pip install protobuf==3.20

pip install gym==0.2
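
After installing the pinned packages, you can run a quick sanity check. The snippet below is a minimal sketch, not part of the book's code: it prints the installed versions and runs one random-action CartPole episode (the environment used in Chapter 6). It assumes the classic gym API (env.reset() returns an observation; env.step() returns obs, reward, done, info), which holds for the gym versions pinned above; "CartPole-v0" is an illustrative environment ID.

# Sanity check: confirm pinned versions, then run one random-action
# CartPole episode as a smoke test of the installed environment.
import gym
import numpy as np
import tensorflow as tf

print("TensorFlow:", tf.__version__)  # expect 2.2.x
print("NumPy:", np.__version__)       # expect 1.19.3
print("gym:", gym.__version__)

env = gym.make("CartPole-v0")  # illustrative ID; assumed available in this gym version
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()           # random action, smoke test only
    obs, reward, done, info = env.step(action)   # classic 4-tuple step signature
    total_reward += reward
print("Random-policy episode reward:", total_reward)
env.close()

If the imports succeed and an episode completes, the environment is ready for the book's DQN, REINFORCE, A2C, and PPO examples.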

Table of Contents

1. Getting Started

2. Basic Concepts of Reinforcement Learning

2.1 What is Reinforcement Learning?

2.2 Probability and Stochastic Processes

2.2.1 Probability

2.2.2 Conditional Probability

2.2.3 Stochastic Process

2.3 Markov Chain

2.3.1 Markov Property

2.3.2 Markov Chain

2.4 Markov Reward Process (MRP)

3. Basic Algorithms of Reinforcement Learning

3.1 Markov Decision Process (MDP) Concept

3.2 MDP Action-Value Function

3.3 Optimal Value Function of MDP

3.4 Terminology Used in Reinforcement Learning

3.4.1 Policy Evaluation and Policy Control

3.4.2 Model-Based and Model-Free

3.5 Dynamic Programming

3.6 Monte Carlo (MC) Method

3.7 Temporal Difference (TD) Learning and SARSA

3.7.1 TD

3.7.2 SARSA

3.8 Q-Learning

3.8.1 On-Policy and Off-Policy

3.8.2 Importance Sampling

3.8.3 Q-Learning

4. Artificial Intelligence Concepts

4.1 Machine Learning

4.2 Linear Regression Analysis

4.3 Classification Analysis

4.4 Deep Learning

4.5 Setting Up the Development Environment

4.6 TensorFlow

5. Function Approximation

5.1 Derivatives (Differentiation)

5.2 Partial Derivative

5.3 Scalar and Vector

5.4 Gradient

5.5 Gradient Descent

5.6 Stochastic Gradient Descent (SGD)

5.7 Notations for Partial Derivative and Gradient Descent in Reinforcement Learning

5.8 Function Approximation

6. Value-Based Reinforcement Learning and DQN Algorithm

6.1 DQN Algorithm

6.2 Cartpole

6.3 Exploration and Exploitation

6.4 Basic Structure of the DQN Algorithm

6.5 Reviewing the Entire DQN Algorithm Code

6.6 Detailed Structure of the DQN Algorithm

6.7 Analysis of DQN Algorithm Training Results

7. Policy-Based Reinforcement Learning: REINFORCE Algorithm

7.1 Revisiting Neural Networks

7.2 Policy Gradient

7.3 REINFORCE Algorithm Operation

7.4 Basic Structure of the REINFORCE Algorithm

7.5 Reviewing the Complete REINFORCE Algorithm Code

7.6 Exploring the Detailed Structure of the REINFORCE Algorithm

7.7 Analysis of REINFORCE Algorithm Training Results

8. Policy-Based A2C Algorithm

8.1 Actor-Critic Algorithm

8.2 Advantage Actor-Critic (A2C)

8.3 Basic Structure of the A2C Algorithm

8.4 Full Code Review of the A2C Algorithm

8.5 Examining the Detailed Structure of the A2C Algorithm

8.6 Analyzing A2C Algorithm Training Results

9. Policy-Based PPO Algorithm

9.1 Importance Sampling

9.2 Off-Policy Policy Gradient

9.3 Clipping Technique

9.4 Generalized Advantage Estimation (GAE)

9.5 Basic Structure of the PPO Algorithm

9.6 PPO Algorithm Full Code Review

9.7 Examining the Detailed Structure of the PPO Algorithm

9.8 Analyzing PPO Algorithm Training Results

10. Neural Network Tuning

10.1 Overview of Neural Network Tuning

10.2 Input Data Preprocessing

10.3 Choosing a Cost Function

10.4 Activation Algorithms

10.5 Weight Initialization

10.6 Optimization Algorithms

10.7 Discussion on the Number of Nodes and Hidden Layers

10.8 Tuning the Neural Network in the PPO Algorithm

10.9 Applying Tuning Code to PPO Algorithm

10.10 Analysis of PPO Algorithm Tuning Results

11. Grid Search-Based Optimization Technique

11.1 Concept of Grid Search

11.2 Coding Grid Search

11.3 Full Grid Search Code

11.4 Grid Search Execution Results

11.5 Applying Grid Search Parameter Tuning

12. Bayesian Optimization Technique

12.1 Frequentist Probability and Bayesian Probability

12.2 Calculating Bayesian Probability

12.3 Introduction to Bayesian Optimization Packages

12.4 Using the Bayesian Optimization Package

12.5 Complete Bayesian Optimization Code

12.6 Analyzing Bayesian Optimization Results

13. In Conclusion
