This service centralizes and streamlines communication between a reinforcement learning agent and services running on a Kind Kubernetes cluster, thereby preventing circular dependencies in the software architecture and making maintenance easier.
All APIs in this repository manage Pods running under a service in Kubernetes. The repository supports our Reinforcement Learning (RL) agent from the RL-Based Autoscaler repository. The APIs are implemented in Python, using the Docker SDK and the Kubernetes client.
This work accompanies research titled *Adaptive Horizontal Pod Autoscaling (AHPA) Based on Reinforcement Learning in Kubernetes for Machine Learning*, which introduces the Adaptive Horizontal Pod Autoscaler (AHPA). AHPA utilizes RL with a Deep Q-Network (DQN) to dynamically adjust the number of Kubernetes Pods, enabling both scaling in and scaling out. We evaluate the performance and reliability of AHPA on image classification tasks, comparing its effectiveness against a traditional horizontal autoscaler in Kubernetes.
This project consists of the following three components, distributed across different repositories and working together seamlessly.
- RL-Based Autoscaler: The main repository for the RL-based Adaptive Horizontal Pod Autoscaler (AHPA), implemented with a Deep Q-Network (DQN). It includes the agent and its learning procedure, with the Adam optimizer set as the default.
- Docker-Manipulation-API: This repository. The service supports the RL agent by enabling seamless communication between the agent and the target service running on Kubernetes as part of our research.
- Image-Classification: The target application in our study; it serves an image classification service that classifies user-submitted photos.
In this setup, the reinforcement learning agent, supported by auxiliary services, actively learns from its environment.
The following APIs are implemented using Flask.
- `GET /`: A default route for health checks, which simply returns a `Hello, World!` message.
- `GET /pod`: This service provides clients with detailed, real-time information about the state of Pods in the system. It includes key performance metrics such as CPU and memory utilization percentages, which help assess the resource consumption and efficiency of each Pod. Additionally, the service reports the current number of active or running Pods, offering insight into the overall health and scale of the deployment.
- `POST /pod/confirm`: Similar to the `GET /pod` API, this API retrieves all relevant information and additionally terminates Pods that are not in a 'Running' status. In our study, the following statuses are considered unsatisfactory:
  - `Pending`: This status occurs when the Pod has been accepted by the cluster, but one or more of its containers has not yet been created or started, so the Pod is not yet fully operational.
  - `CrashLoopBackOff`: This indicates the Pod is stuck in a restart loop, typically due to overload or an incorrect configuration.
  - `ImagePullBackOff`: This occurs when a container in the Pod fails to pull the required image from a container registry.
  - `Terminating`: This status signifies that the Pod is scheduled for deletion but has not yet been fully removed from the node.

  By terminating Pods in these unsatisfactory states, we ensure that only active and healthy Pods remain. This approach helps confirm that all operational Pods are ready, thereby facilitating the agent's ability to learn and perform effectively.
- `POST /pod/scale/in`: This API determines the number of online Pods in the cluster and scales in by reducing that number by one. Scaling in refers to decreasing the cluster size by terminating one Pod, thereby freeing resources when demand decreases. The operation is constrained to a defined range, with a minimum of 1 and a maximum of 5 Pods.
- `POST /pod/scale/out`: Similar to the `POST /pod/scale/in` API, this API determines the number of online Pods in the cluster and scales out by adding one Pod. Scaling is incremental and constrained to the same range of 1 to 5 Pods, ensuring the cluster remains within manageable limits.
- `GET /app/stat`: Similar to `GET /pod`, this API retrieves all relevant Pod information while also performing service testing, specifically following a scaling event. After gathering the necessary data, the API executes the Reward Feedback JMeter file, which simulates traffic and captures performance metrics used to evaluate the system's responsiveness post-scaling. The API then calculates key performance indicators, specifically the average latency and packet drop percentage, which are crucial for RL reward computation. These metrics provide essential feedback that refines the agent's decision-making process, guiding it toward optimal scaling actions.
- `POST /pod/set-pod-count/<pod_count>`: This API enables administrators to manually configure the number of Pods in the system. It provides a quick and effective way to address unexpected issues or resource imbalances, ensuring system stability and performance. By allowing administrators to adjust Pod counts as needed, the API helps maintain control over the system's scaling behavior, especially when automated scaling does not respond adequately to certain challenges or disruptions.
- `POST /metrics`: This API collects metrics from the target application, including CPU percentage, memory, online Pods, and total Pods, approximately every 0.2 seconds. This route was designed and implemented specifically to support thesis writing and the subsequent analysis of results. A minimal sketch of how two of these routes might look follows this list.
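For illustration, the sketch below shows how the health-check and Pod-status routes could be written with Flask and the Kubernetes Python client. This is a minimal sketch under stated assumptions, not the repository's actual implementation: the namespace and response fields are assumptions, and the CPU/memory utilization reported by the real `GET /pod` would additionally require the Kubernetes metrics API.

```python
# Minimal sketch of the health-check and pod-status routes.
# Assumptions: a local kubeconfig (e.g., for a Kind cluster) and the
# "default" namespace; the real service's response fields may differ.
from flask import Flask, jsonify
from kubernetes import client, config

app = Flask(__name__)
config.load_kube_config()  # read the local kubeconfig
v1 = client.CoreV1Api()

NAMESPACE = "default"  # assumed namespace


@app.route("/")
def index():
    # Default route used for health checks.
    return "Hello, World!"


@app.route("/pod")
def pod_status():
    # Report the phase of every Pod and the number of running Pods.
    # CPU/memory percentages would come from the metrics API (omitted here).
    pods = v1.list_namespaced_pod(NAMESPACE).items
    running = sum(1 for p in pods if p.status.phase == "Running")
    return jsonify({
        "total_pods": len(pods),
        "running_pods": running,
        "phases": [p.status.phase for p in pods],
    })
```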
To start and test the service, follow the instructions below.
- Installing Python and Dependencies: This repository is developed in Python, so install Python in your working environment.
- Our repository uses Pipenv for virtual environment management. Simply install Pipenv and create your working environment, then install all dependencies using `pipenv install` or `pipenv sync`.
To run this service, go to the subfolder named `server` and start the service with Flask using the following commands:

```
cd .\server\
flask run -p 6000
```
You can simply test whether your service is running successfully by opening http://localhost:6000/ in a browser. If the service is running, you should see `Hello, World!` in the browser.
For further testing, use your preferred API testing platform, for example Postman, and use the route information above to make requests, for example to http://localhost:6000/pod. Alternatively, a short Python smoke test is sketched below.
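As an alternative to Postman, the following sketch exercises a few endpoints with the `requests` library. It assumes the service is listening on localhost:6000 and that `/pod` and `/pod/scale/out` return JSON bodies, which may differ from the actual response format.

```python
# Quick smoke test for the running service.
# Assumes the service listens on http://localhost:6000 and that /pod and
# /pod/scale/out return JSON; adjust if the actual responses differ.
import requests

BASE = "http://localhost:6000"

print(requests.get(f"{BASE}/").text)                  # expect "Hello, World!"
print(requests.get(f"{BASE}/pod").json())             # Pod state and metrics
print(requests.post(f"{BASE}/pod/scale/out").json())  # add one Pod (max 5)
```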
If you have a suggestion that would make this project better or more convenient, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thank you again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AwesomeFeature`)
- Commit your Changes (`git commit -m 'Add some AwesomeFeature'`)
- Push to the Branch (`git push origin feature/AwesomeFeature`)
- Open a Pull Request
We would like to express our sincere gratitude to Dr. habil. Julien Vitay, thesis supervisor from the Professorship of Artificial Intelligence (Informatik) at Technische Universität Chemnitz, for his expert guidance, unwavering support, and valuable feedback throughout the research and writing process.
We also wish to express our heartfelt appreciation to M.Sc. Florian Zimmer, our research mentor and project advisor from the Fraunhofer-Institut für Software- und Systemtechnik (ISST). His generous investment of time and effort in providing regular, detailed feedback at every stage of the project was invaluable. Additionally, his insightful advice and guidance were crucial in helping us navigate and overcome the challenges encountered throughout this study.
Importantly, we gratefully acknowledge the computing time made available to us on the high-performance computers Barnard and Alpha at the NHR Center (Nationales Hochleistungsrechnen) at the Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden. This center is jointly supported by the Federal Ministry of Education and Research and the state governments participating in the NHR.
This project is exclusively contributed by Natnicha Rodtong. For inquiries, feel free to contact me via ResearchGate or email.
This repository is a component of a master's thesis titled *Adaptive Horizontal Pod Autoscaling (AHPA) Based on Reinforcement Learning in Kubernetes for Machine Learning*. The thesis explores advanced techniques for improving the scalability and efficiency of machine learning workloads in Kubernetes environments, using reinforcement learning-based approaches for adaptive horizontal pod autoscaling. The research was conducted at the Laboratory of Artificial Intelligence, Technische Universität Chemnitz (TU Chemnitz), Germany, as part of the requirements for completing the Master's program.
- No Warranty: This project is provided "as is," without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, or non-infringement.
- Limitation of Liability: The authors or contributors shall not be held liable for any claim, damages, or other liability arising from the use, misuse, or inability to use the content within this repository.
- Third-Party Dependencies: This repository may rely on external libraries or tools that are subject to their own licenses. Please ensure compliance with those licenses when using this project.