In this article, we will look at one of the microservice design patterns: the Bulkhead Pattern. What problem does it solve? What are its pros and cons? When should we use it? And why is it named "Bulkhead"? We are going to answer these questions.
When working with a distributed architecture, if there is even a tiny chance that a slow service, an unavailable application, or a network problem could occur, it eventually will. Our job is to make our services as resilient as possible.
Let's say we are building services for a cargo package operations team. Workers in the operation have different kinds of tasks. For example, one worker enters new packages into the system, and another takes the accepted packages and puts them into bags, using mobile terminal devices.
In this system, we have a Mobile BFF for mobile devices, a service for creating packages called Package Service, and a service for transferring deliveries into bags called Transfer Service. Suppose something goes wrong in Package Service (it could be anything, such as a network issue) and it starts timing out. All the threads in Mobile BFF end up waiting on Package Service calls, so the other worker cannot transfer packages into bags even though Transfer Service is working fine.
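To make this failure mode concrete, here is a minimal Python sketch of a Mobile BFF that sends every outgoing call through one shared thread pool. The functions `call_package_service` and `call_transfer_service` are hypothetical stand-ins for real HTTP calls, the pool size of 4 is an arbitrary assumption, and the sleep simulates Package Service hanging until a timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the downstream calls; real code would make HTTP requests.
def call_package_service():
    time.sleep(0.5)  # simulates Package Service hanging until a timeout
    return "package"

def call_transfer_service():
    return "transfer"  # Transfer Service is healthy and answers immediately

# One shared pool for every outgoing call (size 4 is an illustrative assumption).
shared_pool = ThreadPoolExecutor(max_workers=4)

# A burst of package requests occupies every worker thread...
package_futures = [shared_pool.submit(call_package_service) for _ in range(4)]

# ...so the healthy transfer call has to queue behind them.
start = time.monotonic()
transfer_result = shared_pool.submit(call_transfer_service).result()
transfer_wait = time.monotonic() - start

shared_pool.shutdown(wait=True)
# The transfer call only runs once one of the slow package calls has finished.
print(f"transfer call waited {transfer_wait:.2f}s")
```

Even though `call_transfer_service` itself returns instantly, its caller waits for a slow package call to free up a thread.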
As you can see in the gif above, everything runs fine until Mobile BFF suddenly starts receiving slow responses from Package Service. From then on, Package Service calls use up all the threads and leave nothing for Transfer Service calls.
The Bulkhead Pattern solves this problem. But first, let's talk about what a bulkhead is. Bulkheads are walls that divide a ship's hull into several watertight compartments. If one compartment floods, the bulkheads keep the water from spreading, so the entire ship does not sink. You can check out the wiki for more details. Here is a blueprint of a boat with bulkheads:
That's what the Bulkhead Pattern does: it isolates an application's resources so that a failure in one component does not cascade to the others. We saw exactly this kind of vulnerability in our Mobile BFF, so let's see what happens when we apply the pattern.
As you can see, each downstream service now has its own thread capacity in Mobile BFF. So when Package Service goes down or times out, Transfer Service calls are unaffected. As with a ship's bulkhead, the Package Service compartment floods, but the rest of the system keeps running smoothly.
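A minimal sketch of this thread-pool style of bulkhead, again in Python with hypothetical service stubs: each downstream service gets its own fixed-size pool, so saturating the package pool does not delay transfer calls. The pool sizes and the `submit` helper are illustrative assumptions, not taken from any particular library.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the downstream calls.
def call_package_service():
    time.sleep(0.5)  # Package Service is slow / timing out
    return "package"

def call_transfer_service():
    return "transfer"

# Bulkhead: each downstream service gets its own fixed-size pool.
# The sizes (4 and 2) are illustrative assumptions, not recommendations.
bulkheads = {
    "package": ThreadPoolExecutor(max_workers=4, thread_name_prefix="package"),
    "transfer": ThreadPoolExecutor(max_workers=2, thread_name_prefix="transfer"),
}

def submit(service, fn):
    """Route a call to the pool reserved for that downstream service."""
    return bulkheads[service].submit(fn)

# Package Service floods its own compartment: 8 slow calls for 4 threads...
package_futures = [submit("package", call_package_service) for _ in range(8)]

# ...but the transfer call runs on threads that package calls can never occupy.
start = time.monotonic()
transfer_result = submit("transfer", call_transfer_service).result()
transfer_wait = time.monotonic() - start

for pool in bulkheads.values():
    pool.shutdown(wait=True)
print(f"transfer call waited {transfer_wait:.3f}s")
```

The trade-off is visible here too: the two transfer threads stay reserved even when no transfer calls arrive, and the package pool cannot borrow them.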
Pros and Cons
Like any pattern, this approach has both advantages and disadvantages. Let's start with the benefits:
- Some clients can suddenly send a burst of requests (request overload) and take all the threads for themselves. The Bulkhead Pattern prevents this by assigning a fixed thread capacity to each service.
- If some clients are more important than others, you can allocate higher capacity to the critical ones.
- When one of the services goes down, isolating services prevents a cascading failure.
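The first two benefits can also be sketched without dedicating threads, by capping concurrent calls per client with a semaphore. This is a minimal Python sketch under stated assumptions: the client names and limits (`critical-client`, `batch-client`) and the `SemaphoreBulkhead` class are hypothetical, and a full compartment rejects extra calls immediately instead of queueing them.

```python
import threading

class BulkheadFullError(Exception):
    """Raised when a compartment has no free capacity."""

class SemaphoreBulkhead:
    """Caps concurrent calls per compartment and rejects the overflow immediately."""

    def __init__(self, max_concurrent_calls):
        self._slots = threading.BoundedSemaphore(max_concurrent_calls)

    def try_enter(self):
        # Non-blocking acquire: a saturated compartment fails fast instead of queueing.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("bulkhead is full")

    def exit(self):
        self._slots.release()

    def call(self, fn, *args, **kwargs):
        self.try_enter()  # raises before any slot is taken if the compartment is full
        try:
            return fn(*args, **kwargs)
        finally:
            self.exit()

# Hypothetical per-client limits: the critical client gets more capacity.
limits = {"critical-client": 8, "batch-client": 2}
bulkheads = {client: SemaphoreBulkhead(n) for client, n in limits.items()}

# Simulate two slow batch calls still in flight by holding both batch slots.
batch = bulkheads["batch-client"]
batch.try_enter()
batch.try_enter()

try:
    batch.call(lambda: "batch work")
    rejected = False
except BulkheadFullError:
    rejected = True  # the batch client's burst is rejected...

# ...while the critical client still gets through its own compartment.
result = bulkheads["critical-client"].call(lambda: "ok")
```

Rejecting instead of waiting keeps a misbehaving client from piling up blocked callers; whether to fail fast or queue briefly is a design choice that depends on the caller.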
The pattern has significant benefits, but it has drawbacks as well:
- To size each service's thread capacity, you need to know the business domain and expected throughput well, which adds complexity.
- Resource allocations are not dynamic. When a service has low throughput, its reserved threads sit idle, and other services that need more capacity cannot borrow them.
When to use
No pattern fits every situation. You can use this one to:
- Isolate critical clients from others.
- Prevent cascading failures in case of service unavailability or slowness.
- Isolate service resources so that other parts of the application stay available when one service goes down.
In this article, I've shown what the Bulkhead Pattern is, its advantages and disadvantages, and when to use it. If you want to go further, you can check out the bulkhead implementations in the Polly and Resilience4j libraries.
- Cover: middleware.io/blog/microservices-architecture
- Bulkhead Image: sailprana.com/img/watertight.jpg
Thank you for reading. May the force be with you!