
QLearningCartPoleOpenAIGym

Q-learning is a model-free reinforcement learning algorithm that learns the quality of actions, telling an agent what action to take under what circumstances. This study simulates the CartPole environment, imported from OpenAI Gym, and aims to maximize the agent's reward. Baseline hyperparameters are used to create a benchmark against which the effect of hyperparameter tuning can be measured. The study looks at the impact of changing the learning rate, discount rate, epsilon value, and epsilon decay rate. Average score metrics and a rendered environment show how well the agent performs, and certain parameter changes make a dramatic difference. The findings indicate that hyperparameter tuning is important for a reinforcement learning environment.
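
As a point of reference, the sketch below shows what tabular Q-learning on CartPole typically looks like: the continuous observation is discretized into bins, actions are chosen epsilon-greedily, and the Q-table is updated after every step. The bin edges, hyperparameter values, and the classic Gym API (observation from `env.reset()`, 4-tuple from `env.step()`) are assumptions for illustration and may not match this repository exactly.

```python
# Minimal tabular Q-learning sketch for CartPole.
# Assumes the classic Gym API; bin edges and hyperparameters are illustrative.
import gym
import numpy as np

env = gym.make("CartPole-v1")

# Discretize the 4-dimensional continuous observation into a small grid
# so the Q-values fit in a table.
N_BINS = (6, 6, 12, 12)
LOWER = [-2.4, -3.0, -0.21, -3.0]  # clipped bounds: cart pos/vel, pole angle/vel
UPPER = [2.4, 3.0, 0.21, 3.0]

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    ratios = [(o - lo) / (hi - lo) for o, lo, hi in zip(obs, LOWER, UPPER)]
    return tuple(int(round((n - 1) * min(max(r, 0.0), 1.0)))
                 for r, n in zip(ratios, N_BINS))

# Baseline hyperparameters (assumed values for illustration).
ALPHA = 0.1           # learning rate
GAMMA = 0.99          # discount factor
EPSILON = 1.0         # initial exploration rate
EPSILON_DECAY = 0.995
EPSILON_MIN = 0.01
EPISODES = 2000

q_table = np.zeros(N_BINS + (env.action_space.n,))

for episode in range(EPISODES):
    state = discretize(env.reset())
    done = False
    total_reward = 0
    while not done:
        # Epsilon-greedy action selection.
        if np.random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        obs, reward, done, _ = env.step(action)
        next_state = discretize(obs)

        # Q-learning update: move Q(s, a) toward the bootstrapped target.
        best_next = np.max(q_table[next_state])
        q_table[state + (action,)] += ALPHA * (
            reward + GAMMA * best_next - q_table[state + (action,)]
        )

        state = next_state
        total_reward += reward

    # Decay exploration once per episode.
    EPSILON = max(EPSILON_MIN, EPSILON * EPSILON_DECAY)
```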

Conclusion

We can conclude that the Q-learning algorithm is a good way to tackle the problem of balancing the cart pole. We obtained good rewards, but there is still clear room for improvement. A low learning rate works well because each update to the Q-table is small, which keeps learning stable. For this environment a high discount factor performs best: the intermediate states, where the pole starts to tip, carry important information, and a high discount factor propagates that information back through the value estimates. Decaying the epsilon value also gives a good exploration-exploitation tradeoff. The graphs show that stability is a major factor. Experience Replay and Deep Learning could be used to obtain better and more stable results.
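
For reference, the standard tabular Q-learning update in which the learning rate and discount factor appear is:

```math
Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big]
```

A small α keeps each individual update modest, which supports stability, while a γ close to 1 lets reward information from later, near-tipping states flow back to earlier states.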
