Document Type

Thesis - Open Access

Award Date


Degree Name

Master of Science (MS)

Department / School

Electrical Engineering and Computer Science

First Advisor

Yi Liu


Microservices architecture is a popular style for distributed systems because each service in the architecture is independent. Each microservice is built around a single purpose, runs in its own process, and communicates with other services through a lightweight mechanism, the application programming interface (API). Microservices therefore depend on internal, interservice communication. A service can become unreachable to its consumers because, in any distributed setup, communication occasionally fails given the number of messages passing between services. Failures can occur when networks are unreliable: connections can suffer high latency, which may lead to failures or slow responses. This is a problem for synchronous remote calls that actively wait for a response. Without a proper timeout mechanism, they may end up waiting for an extended amount of time. Applications usually set a timeout on all remote calls so that a network or component failure does not hang the whole application. However, the timeout must be set carefully for the system or microservice application to work as required: if a remote call waits too long for a reply, it can slow down the entire system, and if the timeout is too short, a response that arrives after the timeout is ignored. This thesis proposes a dynamic fault tolerance (DFTM) model to improve the stability and resilience of the microservices architecture. The model is designed around a two-state Circuit Breaker, called the Switch Circuit Breaker, combined with a Markov chain. The thesis also presents the modification that reduces the conventional Circuit Breaker (three states: open, closed, and half-open) to the Switch Circuit Breaker (two states: open and closed).
The Circuit Breaker uses a timeout to detect faults, but the use of timeouts hinges on assumptions about the real-time behavior of the system: a waiting process can only infer from the occurrence of a timeout that a failure has occurred. The DFTM model therefore adopts a Markov chain based model to detect faults without a timeout. It then passes each detected fault directly to the Switch Circuit Breaker, which uses its two states to cover the fault. An important finding is that the DFTM model offers a solution to the problem of transient failures or faults in the interservice communication of a microservices architecture. It also improves the performance and reliability of the architecture.
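
The idea summarized above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class names, the 0.5 threshold, the smoothing of the transition counts, and the rule "open when the estimated probability of a failure on the next call exceeds the threshold" are all assumptions made for illustration. The sketch only shows how a two-state (open/closed) breaker could be driven by a two-state Markov chain over observed call outcomes instead of by a timeout.

```python
from dataclasses import dataclass, field

OK, FAIL = 0, 1  # the two outcomes tracked by the chain (illustrative)


@dataclass
class MarkovFaultDetector:
    """Hypothetical detector: a two-state Markov chain over call outcomes."""

    # counts[i][j] = observed transitions from outcome i to outcome j.
    # Starting every count at 1 (an assumed smoothing choice) keeps the
    # estimated probabilities defined before any call has been observed.
    counts: list = field(default_factory=lambda: [[1, 1], [1, 1]])
    last: int = OK

    def observe(self, outcome: int) -> None:
        """Record one outcome, reported by a call or by a health probe."""
        self.counts[self.last][outcome] += 1
        self.last = outcome

    def p_fail_next(self) -> float:
        """Estimated probability that the next outcome is a failure."""
        row = self.counts[self.last]
        return row[FAIL] / (row[OK] + row[FAIL])


class SwitchCircuitBreaker:
    """Hypothetical two-state breaker: open or closed, no half-open state."""

    def __init__(self, detector: MarkovFaultDetector, threshold: float = 0.5):
        self.detector = detector
        self.threshold = threshold  # assumed value, not taken from the thesis

    @property
    def state(self) -> str:
        # The state is derived from the detector, so no timer is involved.
        risky = self.detector.p_fail_next() > self.threshold
        return "open" if risky else "closed"

    def call(self, remote, fallback):
        """Run remote() while closed; return fallback() while open."""
        if self.state == "open":
            return fallback()
        try:
            result = remote()
        except Exception:
            self.detector.observe(FAIL)
            return fallback()
        self.detector.observe(OK)
        return result
```

In this sketch the breaker never probes the service itself while open, so it closes again only when some other component (for example a health check, also an assumption here) reports successful outcomes through `observe`.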

Library of Congress Subject Headings

Fault tolerance (Engineering)
Application program interfaces (Computer software)
Application software -- Development
Software architecture



Number of Pages



South Dakota State University


Rights Statement

In Copyright