
Commit 7e4dbbc

Post: Resilient Microservices (#132)
* Added-post
* small adjustments
* deleted generated changes
* Added adjustments to the post
* revert angular changes
* revert angular changes
* fixes
* Apply suggestions from code review
1 parent c17c786 commit 7e4dbbc

File tree

5 files changed: +210 -0 lines changed


content/authors/authors.json

Lines changed: 8 additions & 0 deletions
@@ -99,6 +99,14 @@
     "avatar": "pablo.png",
     "role": "Lead Principal Frontend Engineer - Customer Success"
   },
+  "Sebastian Opacki": {
+    "avatar": "sebastiano.jpeg",
+    "role": "Senior Backend Engineer - Customer Success"
+  },
+  "Rafał Łukowski": {
+    "avatar": "rafall.jpg",
+    "role": "Senior Backend Engineer - Customer Success"
+  },
   "Akshay PK": {
     "avatar": "akshaypk.png",
     "role": "Lead Principal Systems Engineer - CS Expert Services"

content/authors/avatars/rafall.jpg

328 KB
2.67 MB
Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
# Resilient Microservices

How can we make our microservices more resilient?

![](assets/res_micro.png)

Authors: Sebastian Opacki, Rafał Łukowski
Date: unpublished
Category: backend

tags: microservices,resiliency,timeout,circuit breaker,retry

---
### **Introduction**
Nowadays, most big modern systems are designed around a microservices architecture. A monolith is not a bad idea for a small application, but in a monolith a single error can collapse the entire system. This risk can be reduced with a microservices architecture, which reduces coupling and allows services to operate independently, so a single error cannot break the whole system.
Using a microservices architecture also allows us to manage specific areas of the system without affecting the whole application.
Does that mean a microservices architecture has no flaws? No, nothing comes for free: we get certain benefits, but we also have to deal with new problems, such as failures between microservices.
Because microservices applications rely heavily on distributed communication, ensuring resilience becomes a critical aspect of their design and performance.

### **Understanding Microservices Resilience and Resiliency Patterns**
Resilience in microservices refers to a system's ability to anticipate and handle dependency failures, such as failures in other system microservices or third-party systems.
In a world where plenty of services talk to each other, we need to be prepared for failures caused by various reasons: service outages, network failures, and so on.
Resiliency patterns in microservices are established mechanisms that enable applications to manage failures, ensuring stability even in complex, distributed systems.
Implementing those patterns can assist developers in minimizing the impact of unexpected errors or excessive load on the system, which in turn can reduce downtimes and improve the overall performance of the application.
### **Common Resiliency Patterns**
It's important to note that resiliency in microservices can be achieved by implementing the patterns described below at both the application/service level and the infrastructure level, for example in Istio.
If there are plenty of microservices and the same configuration is required for all of them, it's better to use an infrastructure solution rather than implementing the patterns in each service.
However, if the patterns are needed only in certain scenarios and not across all services, it's better to implement them at the service level.
## Timeout

Microservices talk to each other, not only to internal APIs in the same container or machine, but also to other external dependencies.
When we make a synchronous call, we need to be prepared for a scenario in which the dependency is not reachable.
We should always declare an explicit timeout value in our configuration. For instance, if we use RestTemplate with Spring for synchronous calls, we can configure:

```java
@Bean
public RestTemplate restTemplate(RestTemplateBuilder restTemplateBuilder) {
    return restTemplateBuilder
            .setConnectTimeout(Duration.ofSeconds(2))
            .setReadTimeout(Duration.ofSeconds(2))
            .build();
}
```

- ConnectTimeout is the timeout for establishing a connection. For instance, when you are dealing with an unreliable server, you may want to wait only a few seconds before notifying the end user that "something is wrong".

- ReadTimeout is the timeout applied once you have a connection: you are blocked on read() and want to get an exception if the read blocks for longer than the timeout.

It does not matter whether you are using a plain Java application, the Spring Framework, or a RestTemplate client - it is important to always set a timeout when making synchronous calls to other dependencies.

With a declared timeout we can deal with an unreliable dependency: sometimes we just cut the connection and log an error, but in other cases we need to quickly notify the end user about the error.
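How the timeout surfaces depends on the client library. With RestTemplate, an exceeded connect or read timeout is reported as a ResourceAccessException, so the "cut the connection and log" versus "fail fast towards the user" decision can be made in one place. A minimal sketch, not taken from the original example; the class, endpoint and returned status are illustrative:

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.ResourceAccessException;
import org.springframework.web.client.RestTemplate;

public class RegistrationClient {

    private final RestTemplate restTemplate;

    public RegistrationClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    public ResponseEntity<String> register(String payload) {
        try {
            // Uses the connect/read timeouts configured on the RestTemplate bean above.
            return restTemplate.postForEntity("http://localhost:8080", payload, String.class);
        } catch (ResourceAccessException e) {
            // RestTemplate wraps connect and read timeouts in ResourceAccessException:
            // log the error here if needed, then fail fast towards the caller.
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).build();
        }
    }
}
```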
## Learn more:
- https://resilience4j.readme.io/docs/timeout
- https://www.baeldung.com/spring-rest-timeout
### **Circuit Breaker Pattern**
The Circuit Breaker Pattern is a crucial mechanism used in a microservices architecture to prevent cascading failures across services.
It detects when a dependency that we are trying to reach is unstable by checking the ratio of successful and failed calls to that dependency.
The circuit breaker works in three states: closed, open and half-open:

- Closed State
This is the normal operational state, when no errors are encountered while invoking the dependency. In this state, all requests from the client are passed through to the downstream service.

- Open State
When there are too many failed responses from an external dependency, the mechanism changes the state to 'Open'. In this state, requests do not reach the service; instead, a fallback method is called or an error is thrown instantly.
By doing this, we give the dependency time to recover.

- Half-Open State
After a configured time, the 'Open' state ends and we check whether our dependency has recovered or is still unhealthy.
In the 'Half-Open' state we allow some requests to reach the external dependency and, based on the ratio of responses, we decide whether the next state will be 'Open' or 'Closed'.
If the server is responding well, we change the state to 'Closed', indicating that all is good.
However, if the service is still encountering issues, we transition back to the 'Open' state.
The 'Half-Open' state is a way to gradually test whether the dependency has recovered before fully re-enabling traffic to the system.
## Example

In this example we use the resilience4j library:
pom.xml
```xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.7.0</version>
</dependency>
```

To simplify our code we can use an annotation instead of manually preparing a CircuitBreakerRegistry and CircuitBreakerConfig in code.
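For comparison, the manual, code-based configuration could look roughly like this. This is only a sketch with illustrative thresholds (open after 50% failures in a window of 10 calls, stay open for 10 seconds, probe with 3 calls in half-open); none of these values come from the original post:

```java
import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

public class ManualCircuitBreakerSetup {

    CircuitBreaker createPostRegisterCustomerBreaker() {
        // Illustrative thresholds mapping to the closed/open/half-open states described above.
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .slidingWindowSize(10)
                .waitDurationInOpenState(Duration.ofSeconds(10))
                .permittedNumberOfCallsInHalfOpenState(3)
                .build();

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        return registry.circuitBreaker("postRegisterCustomer");
    }
}
```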
Let's assume that we have a method which we would like to wrap with our circuit breaker mechanism:
```java
@CircuitBreaker(name = "postRegisterCustomer")
public ResponseEntity<RegisterAppAccountResultDto> postRegisterAppFunction(RegisterAppAccountCommand registerAppAccountCommand) {
    var httpEntity = new HttpEntity<>(registerAppAccountCommand, getHeader());
    var url = "http://localhost:8080";

    return restTemplate.postForEntity(url, httpEntity, RegisterAppAccountResultDto.class);
}
```
## Configuration of a circuit breaker logger:
```java
@Configuration
@Slf4j
public class CircuitBreakerLogger {

    @Bean
    public RegistryEventConsumer<CircuitBreaker> registryEventConsumer() {

        return new RegistryEventConsumer<>() {
            @Override
            public void onEntryAddedEvent(@NotNull EntryAddedEvent<CircuitBreaker> entryAddedEvent) {
                // Log every event published by the newly registered circuit breaker.
                entryAddedEvent.getAddedEntry().getEventPublisher().onEvent(event -> log.info(event.toString()));
            }

            @Override
            public void onEntryRemovedEvent(@NotNull EntryRemovedEvent<CircuitBreaker> entryRemoveEvent) {
                // No logs needed.
            }

            @Override
            public void onEntryReplacedEvent(@NotNull EntryReplacedEvent<CircuitBreaker> entryReplacedEvent) {
                // No logs needed.
            }
        };
    }
}
```
## Application live properties:
```yaml
"resilience4j.circuitbreaker.instances.postRegisterCustomer.ignoreExceptions": "org.springframework.web.client.HttpClientErrorException"
"resilience4j.circuitbreaker.instances.postRegisterCustomer.slowCallDurationThreshold": "5000"
"resilience4j.circuitbreaker.instances.postRegisterCustomer.wait-duration-in-open-state": "10000"
```
## Learn more:
- https://resilience4j.readme.io/docs/circuitbreaker
- https://medium.com/bliblidotcom-techblog/resilience4j-circuit-breaker-implementation-on-spring-boot-9f8d195a49e0
### **Retry Pattern**

In a microservices architecture, you may encounter problems such as:
- Component Failure - during maintenance windows
- Component Overload - a threshold limiting the number of requests to a component (throttling)
- Network Failure - the application not being available for short stretches of time
All of the above problems occur for short periods and often do not require raising any errors. Instead, the client in these scenarios should retry the request; implementing such retries is what the Retry Pattern is about. The whole idea of the pattern is to manipulate the duration of the pause before retrying a failed request.

## Retry Backoff
By implementing Retry Backoff we are tackling a common computer science problem - the Thundering Herd problem. If one of the services is down, and we have hundreds or thousands of concurrent requests to it, each of which retries immediately, it is highly likely that the service will go down again. To resolve this issue, we have to implement Retry Backoff.
After each retry, the amount of time between requests should increase. It might look like this:
```
retry_counter*backoff
```
By doing so we have a better chance of not overloading the service.

Another factor that we have to take into consideration is the duration of the operation itself. We do not want to end up in a situation where the system has not yet recovered from the previous call and we are already retrying. After adding this variable to our algorithm, it might look like this:
```
(retry_counter*backoff)+fixed_operation_time
```
Additionally, we can make it even more resilient by limiting the number of retries. All of this depends on the specific business case.
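As a rough, framework-agnostic illustration of that formula, the wait before a given attempt could be computed like this; the backoff step and operation time below are made-up values:

```java
import java.time.Duration;

public final class RetryBackoff {

    // Illustrative values: a 500 ms backoff step and a 200 ms allowance for the operation itself.
    private static final Duration BACKOFF = Duration.ofMillis(500);
    private static final Duration FIXED_OPERATION_TIME = Duration.ofMillis(200);

    // (retry_counter*backoff)+fixed_operation_time
    static Duration waitBeforeAttempt(int retryCounter) {
        return BACKOFF.multipliedBy(retryCounter).plus(FIXED_OPERATION_TIME);
    }
}
```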
## Example

In this example we use the resilience4j library, just as in the other examples:
pom.xml
```xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.7.0</version>
</dependency>
```
Method with a call that we would like to monitor and retry if needed:
```java
@Retry(name = "postConfirmation")
public void initiateConfirmation(@Valid ScaConfirmationDTO confirmation) {
    final ConfirmationPostResponse confirmationPostResponse = confirmationClient
            .postConfirmation(confirmationRequest(confirmation));
    sendPushNotification(confirmation);
}
```
## Application live properties:
```yaml
resilience4j.retry.instances.postConfirmation.max-attempts: "3"
resilience4j.retry.instances.postConfirmation.wait-duration: "1s"
resilience4j.retry.instances.postConfirmation.ignoreExceptions: "org.springframework.web.client.HttpClientErrorException"
resilience4j.retry.metrics.legacy.enabled: "true"
resilience4j.retry.metrics.enabled: "true"
```
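If a growing delay is preferred over the fixed wait-duration shown above, resilience4j also allows the retry interval to be computed by an interval function. A sketch with illustrative values (3 attempts, starting at 1 second and doubling after every failed attempt); this programmatic setup is an alternative, not part of the original configuration:

```java
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;

public class RetryBackoffSetup {

    Retry createPostConfirmationRetry() {
        // Exponential backoff: 1000 ms initial interval, multiplied by 2 after each failed attempt.
        RetryConfig config = RetryConfig.custom()
                .maxAttempts(3)
                .intervalFunction(IntervalFunction.ofExponentialBackoff(1000, 2))
                .build();

        return RetryRegistry.of(config).retry("postConfirmation");
    }
}
```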
## Learn more:
- https://resilience4j.readme.io/docs/retry
### **Summary**
In this article, we have discussed several patterns designed to enhance resilience in a microservices architecture. By understanding and applying these patterns, developers can build more robust, scalable and resilient services that can effectively handle the complexities of modern applications.
