What Are the Key Components for Crafting a Resilient Service Architecture?

Admir Mujkic
Mar 5, 2024

To understand microservices, we should first understand their distributed architecture. In a distributed architecture, each component or service runs as a separate application and is reached remotely through a remote access protocol.

In distributed architectures, managing the availability and responsiveness of remote processes presents a challenge. Service availability and service responsiveness are related but distinct concepts.

Service availability is the ability of the service consumer to connect to the service and send it a request. Service responsiveness, on the other hand, is how quickly the service responds to a request once it has been contacted. When service availability is impaired, the consumer is notified promptly (usually within milliseconds) that it can’t connect or communicate with the service.

Figure 1: Service Availability vs. Responsiveness

When the service is unavailable, you as the service consumer can relay an error message to the client or retry the connection a few times before giving up. When the service is reached but doesn’t respond, the choices are narrower: you can set up a timeout period or wait indefinitely, leaving the process hanging.

While timeout values for service responsiveness may seem practical, they can lead to the timeout antipattern, which I don’t recommend.

Utilizing Time Limits

Setting the appropriate timeout value in service communication is crucial. A very short timeout might lead to missed responses, while an excessively long timeout can delay failure detection.

Consider the following scenario. Imagine you want to buy 1000 shares of Microsoft stock (MSFT) through a service request. If the timeout fires just as the request is executed, the trade goes through but you miss the confirmation, and it becomes hard to tell whether the trade succeeded or whether a retry would create a duplicate.

So, what’s the right approach to determine the timeout value? There are a couple of techniques:

  1. Calculating the database timeout value within the service and using it as the basis for determining the service timeout.
  2. Calculating the maximum response time under load and doubling it. This is the more popular approach, and it provides a buffer for situations where responses take longer than usual.

For instance, if the service takes 2 seconds on average to place a trade, but its maximum response time under load is 5 seconds, then the doubling technique sets the service consumer’s timeout value to 10 seconds.
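The doubling technique is plain arithmetic, and a minimal sketch of it might look like the following. The names TimeoutCalculator and FromMaxUnderLoad are made up for illustration; the 5-second input is the example figure from the paragraph above.

```csharp
using System;

// Hypothetical helper, named here for illustration only.
public static class TimeoutCalculator
{
    // Doubles the maximum response time observed under load,
    // leaving a buffer for slower-than-usual responses.
    public static TimeSpan FromMaxUnderLoad(TimeSpan maxResponseUnderLoad)
    {
        if (maxResponseUnderLoad <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(maxResponseUnderLoad));

        return TimeSpan.FromTicks(maxResponseUnderLoad.Ticks * 2);
    }
}
```

Calling TimeoutCalculator.FromMaxUnderLoad(TimeSpan.FromSeconds(5)) yields the 10-second timeout from the example.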

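By employing such techniques, we aim to avoid premature timeouts for successful requests and ensure that confirmation numbers can be received by the consumer. The drawback is that when the service really is unresponsive, every request has to wait out the full timeout before the consumer finds out. The circuit breaker pattern, illustrated in Figure 2, avoids that wait.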

Figure 2: Circuit Breaker Pattern

It’s common for the service consumer to check whether the breaker is open or closed before calling the service. You can also use an interceptor pattern to hide the circuit breaker’s state from the service consumer.

When the service becomes unresponsive, the pattern notifies the service consumer immediately, instead of waiting for the timeout to expire.

In the previous example, if a circuit breaker were used in place of a timeout value, the service consumer would learn that the trade-placement service was unresponsive in a matter of milliseconds instead of waiting 10 seconds (10,000 milliseconds).

There are several ways circuit breakers can monitor remote services. The simplest is a heartbeat check on the remote service, such as a ping. While this is relatively simple and low-cost, it only shows that the service is alive; it doesn’t tell the circuit breaker whether the remote service is responsive.

To get better information about a request’s responsiveness, you can use synthetic transactions. With this technique, the circuit breaker periodically sends a fake transaction to the service, for example every 10 seconds.

A fake transaction performs all the functionality within that service, so the circuit breaker can measure responsiveness accurately. Synthetic transactions, however, can be challenging to implement since all parts of the application or system need to know about them.
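As a rough sketch of the idea, a synthetic-transaction probe could time one fake transaction and compare the result against an acceptable limit. The class SyntheticTransactionProbe and its members are hypothetical names for illustration, and the fake transaction itself is passed in as a delegate because its contents depend entirely on the service.

```csharp
using System;
using System.Diagnostics;

// Hypothetical probe, named here for illustration only.
public class SyntheticTransactionProbe
{
    private readonly Action syntheticTransaction;
    private readonly TimeSpan maxAcceptable;

    public SyntheticTransactionProbe(Action syntheticTransaction, TimeSpan maxAcceptable)
    {
        this.syntheticTransaction = syntheticTransaction;
        this.maxAcceptable = maxAcceptable;
    }

    // Runs the fake transaction once and reports whether it completed
    // without an error and within the acceptable time.
    public bool IsResponsive()
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            syntheticTransaction();
        }
        catch (Exception)
        {
            return false; // a failed transaction counts as unresponsive
        }
        return stopwatch.Elapsed <= maxAcceptable;
    }
}
```

A circuit breaker could call IsResponsive on a schedule, such as the 10-second interval mentioned above. This sketch blocks until the fake transaction returns, so a real probe would also need its own upper bound on how long it waits.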

Another way to monitor is real-time user monitoring, where actual production transactions are monitored. Once a failure threshold is reached, the circuit breaker enters a half-open state, in which only a fraction of transactions is let through, for example 1 out of 10.

Once the service returns to normal, the circuit breaker closes, allowing all transactions to go through again.
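The closed and half-open behavior described above can be sketched as a small state machine. This is a simplified, single-threaded illustration under assumed names (SimpleCircuitBreaker, failureThreshold, halfOpenSampleRate); it is not how Polly or any other library implements the pattern, and it treats a single successful trial call as "back to normal".

```csharp
using System;

public enum BreakerState { Closed, HalfOpen }

// Hypothetical, single-threaded sketch for illustration only.
public class SimpleCircuitBreaker
{
    private readonly int failureThreshold;
    private readonly int halfOpenSampleRate; // let 1 out of N calls through when half-open
    private int consecutiveFailures;
    private int halfOpenCallCount;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public SimpleCircuitBreaker(int failureThreshold, int halfOpenSampleRate)
    {
        this.failureThreshold = failureThreshold;
        this.halfOpenSampleRate = halfOpenSampleRate;
    }

    public void Execute(Action call)
    {
        // Half-open: reject immediately unless this is the Nth call.
        if (State == BreakerState.HalfOpen && ++halfOpenCallCount % halfOpenSampleRate != 0)
            throw new InvalidOperationException("Circuit is half-open; call rejected.");

        try
        {
            call();
            // A successful call means the service is back to normal.
            consecutiveFailures = 0;
            State = BreakerState.Closed;
        }
        catch
        {
            consecutiveFailures++;
            if (State == BreakerState.Closed && consecutiveFailures >= failureThreshold)
            {
                State = BreakerState.HalfOpen;
                halfOpenCallCount = 0;
            }
            throw; // the caller still sees the original failure
        }
    }
}
```

With a threshold of 3 and a sample rate of 10, three consecutive failures move the breaker to half-open, the next nine calls are rejected without touching the service, and the tenth is let through as a trial.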

Let’s See the Code

Here’s an example of how to use the circuit breaker pattern in C# with the Polly library:

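using System;
using System.Threading;
using Polly;
using Polly.CircuitBreaker;

// Uses Polly's classic Policy API (the Polly v7-style syntax).
public class CircuitBreakerPenzleExample
{
    // Counts calls so the simulated service can start failing after a few successes.
    private static int numCalls = 0;

    static void Main(string[] args)
    {
        // Open the circuit after 3 consecutive exceptions and keep it open for 5 seconds.
        var circuitBreaker = Policy
            .Handle<Exception>()
            .CircuitBreaker(3, TimeSpan.FromSeconds(5),
                onBreak: (ex, breakDelay) =>
                {
                    Console.WriteLine($"Circuit breaker opened for {breakDelay.TotalSeconds} seconds: {ex.Message}");
                },
                onReset: () =>
                {
                    Console.WriteLine("Circuit breaker reset.");
                });

        for (int attempt = 1; attempt <= 10; attempt++)
        {
            try
            {
                circuitBreaker.Execute(() => MakePenzleCall());
            }
            catch (BrokenCircuitException)
            {
                // The circuit is open: the call is rejected without reaching the service.
                Console.WriteLine("Call rejected: circuit is open.");
            }
            catch (Exception ex)
            {
                // The breaker counts the failure and rethrows the original exception.
                Console.WriteLine($"Call failed: {ex.Message}");
            }

            Thread.Sleep(1000); // pause between attempts
        }
    }

    // Simulates a service that succeeds three times and then keeps failing.
    private static void MakePenzleCall()
    {
        numCalls++;
        if (numCalls <= 3)
        {
            Console.WriteLine("Service call successful.");
        }
        else
        {
            throw new Exception("Service call failed.");
        }
    }
}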

In this example, Policy.Handle<Exception>().CircuitBreaker(...) creates a circuit breaker that handles exceptions and is configured to open the circuit after three consecutive failures and keep it open for 5 seconds. When the circuit is opened, the onBreak action is executed, and when the circuit is reset, the onReset action is executed.

The MakePenzleCall method simulates a service call that can fail. The circuit breaker’s Execute method runs the service call inside the circuit breaker. If the service call fails, the circuit breaker counts the failure and rethrows the exception to the caller.

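Once the number of consecutive failures reaches the threshold, the circuit is opened and the onBreak action is executed. While the circuit is open, calls fail immediately with a BrokenCircuitException instead of reaching the service. After the 5-second break, the next call is let through as a trial; if it succeeds, the circuit closes, the onReset action is executed, and the failure count starts again from 0.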

Wrapping Up

The Circuit Breaker Pattern is a fantastic pattern for managing service availability and responsiveness in a distributed architecture. This article presents a simplified version of it, but if you’re keen to dive deeper, I recommend checking out Michael Nygard’s excellent book “Release It!”, Martin Fowler’s circuit breaker blog post, and the Microsoft MSDN library.

Got questions or need some help? Don’t hesitate to reach out. Happy coding!

Cheers! 👋

Admir Mujkic

Admir combined engineering expertise with business acumen to make a positive impact & share knowledge. Dedicated to educating the next generation of leaders.