This study explores the application of deep reinforcement learning to the 'Dynamic Table' model, designed for efficient manipulation of objects. Reinforcement learning, a field that has received significant attention within machine learning and AI, enables an agent to learn optimal strategies for sequential decision-making tasks through trial and error. The Dynamic Table, trained using these techniques, shows promise in various domains, including logistics, industrial production, gaming, and home automation. The model's performance was thoroughly assessed, demonstrating robust object placement and movement capabilities within constrained environments. The results underline the model's potential to improve robotic manipulation tasks, laying a solid foundation for future research and practical implementations. Further studies could extend these findings to more complex real-world scenarios, emphasizing the practical applications of the Dynamic Table model in diverse fields.
Download the ML-Agents package from the Unity Package Manager.
Create a conda environment with Python 3.9.13.
Install the dependencies from requirements.txt (pip install -r requirements.txt).
Upgrade and add packages to the conda environment:
pip install --upgrade setuptools pip wheel
pip install nvidia-pyindex
conda install cuda -c nvidia/label/cuda-11.8.0
Download CUDA 11.8 and Visual Studio (I use 2019; 2022 does not work).
To test CUDA, open cmd, start python, run import torch and print(torch.cuda.is_available()), and expect True.
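A minimal check, assuming PyTorch was installed with CUDA 11.8 support, might look like this:

```python
# Quick sanity check that PyTorch can see the GPU.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against (expect 11.8)
print(torch.cuda.is_available())  # expect True if the GPU, driver, and toolkit match
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU
```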
Test ML-Agents by typing mlagents-learn -h in cmd after activating the environment.
Open cmd and activate the environment, then type mlagents-learn --run-id TestID. Start training by pressing the Play button in the Unity Editor.
Or just press the Play button in the Unity Editor to use heuristics.
- Thesis.pdf: Thesis paper, including the references.
- Notes: Notes taken during the process.
- results: Training results, including ONNX models.
The Dynamic Table consists of a configurable number of rectangular prism parts that can move up and down independently of each other.
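As an illustration only, a minimal sketch of how such a surface could be represented in training code, assuming the parts sit in a 2D grid and each part exposes a single height value (the grid size, height range, and class name below are hypothetical, not taken from the thesis):

```python
import numpy as np

# Hypothetical grid of independently actuated part heights.
ROWS, COLS = 8, 8                  # assumed layout of the rectangular prism parts
MIN_HEIGHT, MAX_HEIGHT = 0.0, 1.0  # assumed height limits

class DynamicTableState:
    def __init__(self):
        # One height per part, all starting at the minimum.
        self.heights = np.full((ROWS, COLS), MIN_HEIGHT, dtype=np.float32)

    def apply_action(self, deltas):
        # deltas: per-part height changes chosen by the agent each step.
        self.heights = np.clip(self.heights + deltas, MIN_HEIGHT, MAX_HEIGHT)

    def observation(self):
        # The flattened heights could form part of the agent's observation vector.
        return self.heights.flatten()
```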
ML-Agents is one of the more recent efforts in this area. It is a toolkit first released in 2017 to give researchers and developers a convenient way to turn games and simulations into environments where intelligent agents can be trained with machine learning algorithms.
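For reference, a minimal sketch of driving a Unity scene from Python with the mlagents_envs low-level API, assuming the scene is open in the Editor and exposes a single behavior with continuous actions (exact class names can vary between ML-Agents releases):

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# file_name=None waits for the Unity Editor; press Play after running this.
env = UnityEnvironment(file_name=None)
env.reset()

behavior_name = list(env.behavior_specs.keys())[0]
spec = env.behavior_specs[behavior_name]

for _ in range(10):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Send zero (no-op) continuous actions to every agent awaiting a decision.
    actions = ActionTuple(
        continuous=np.zeros((len(decision_steps), spec.action_spec.continuous_size), dtype=np.float32)
    )
    env.set_actions(behavior_name, actions)
    env.step()

env.close()
```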
Proximal Policy Optimization (PPO) is a reinforcement learning algorithm used for agent training and is classified as a policy gradient method. The algorithm was introduced by OpenAI in 2017 and, in OpenAI's words, can now be considered state of the art among reinforcement learning algorithms. The biggest reason PPO performs so well is that earlier algorithms were either not scalable, inefficient with data, or too complicated, whereas PPO's design focuses on three things: simple implementation, sample efficiency, and stability.
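As an illustration of the core idea, and not the exact implementation used by ML-Agents, the clipped surrogate objective can be sketched in PyTorch as follows; the tensor names and the clipping epsilon of 0.2 are assumptions:

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from the PPO paper (Schulman et al., 2017).

    new_log_probs: log-probabilities of the sampled actions under the current policy
    old_log_probs: log-probabilities under the policy that collected the data
    advantages:    advantage estimates for the sampled state-action pairs
    """
    # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s).
    ratio = torch.exp(new_log_probs - old_log_probs)

    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Take the pessimistic minimum of the two terms; negate because we minimize.
    return -torch.min(unclipped, clipped).mean()
```

Keeping each update within the clipped range is what gives PPO its stability without the heavier machinery of earlier trust-region methods.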
A* is a graph traversal and path-finding algorithm used to find the shortest path in graph-based search problems. It searches for the path with the lowest total cost by combining the cost from the start to the current node, called the g cost, with the estimated cost from the current node to the destination, called the h cost; nodes are expanded in order of f = g + h.
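A minimal sketch of A* on a 4-connected grid with a Manhattan-distance heuristic (the grid encoding, with 0 for free cells and 1 for walls, is an assumption for illustration):

```python
import heapq

def astar(grid, start, goal):
    """A* over a 4-connected grid. grid[r][c] == 1 marks a wall; returns a list of cells or None."""
    def h(cell):
        # Manhattan distance serves as the admissible heuristic (h cost).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f = g + h, g, cell)
    came_from = {}
    best_g = {start: 0}

    while open_heap:
        f, g, current = heapq.heappop(open_heap)
        if current == goal:
            # Reconstruct the path by walking back through came_from.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dr, current[1] + dc)
            if 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0]) and grid[nxt[0]][nxt[1]] == 0:
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    came_from[nxt] = current
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None
```

For example, astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)) returns the seven-cell detour around the walls.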
A comparative analysis of the different model versions reveals key improvements and areas for further enhancement. The table above highlights the differences between versions in terms of performance and features. Notably, versions 078 and 090 demonstrate the highest test points, indicating superior performance in terms of accuracy. However, why version 090 is favored over 078 is explained by the other key observations below.

Observations

There is a positive relationship between longer episode lengths and a higher number of completed episodes, which tends to result in improved performance measures.
For example, versions that have episode lengths of approximately 50 and have completed millions of episodes demonstrate strong performance. A configuration of 64 units and 3 hidden layers is often observed in the high-performing versions, suggesting that it represents an optimal balance for the model's architecture.

Comparing 090 and 078

Version 078 obtained a test score of around 98.7, surpassing all other tested versions. This indicates significant improvements in managing complex tasks using the dynamic table model. Version 090 demonstrated outstanding performance, achieving a test score of approximately 98, and consistently performed effectively across many setups. To understand the difference between versions 078 and 090, it is important to note how the maze systems operate in every version except 090: the occurrence of these events is unpredictable, and their difficulty levels fluctuate dynamically throughout the training process. As the model achieves greater success, its challenges multiply; the increase in complexity is accomplished by growing the number of randomly generated walls.
Various iterations of mazes were attempted, with detailed adjustments to the difficulty level in each iteration; however, none of these attempts had successful outcomes. When comparing the 090 and 078 versions, the 078 version achieved a higher score and also delivered the product faster. However, in contrast to the 090 version, the other versions lacked an A* algorithm, which made them insufficient in situations where an obstacle blocked the path between the product and the target. As a result, these versions rarely achieved a successful arrival of the product at the target. Despite scoring 88.5, even the 076 version failed to achieve adequate outcomes when trained for these scenarios. Among the models trained for the maze system, it achieved the highest score of 45.6, second only to the 090 version. The primary factor that significantly influenced this score, beyond initial expectations, was an error in the adjustment of maze difficulty: the difficulty was not increasing at a sufficiently rapid rate, but the main issue was that it decreased too swiftly. The 090 version successfully achieves the desired results in both pre-prepared mazes and