Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (16): 326-332. DOI: 10.3778/j.issn.1002-8331.2012-0553

• Engineering and Applications •

Research on Micro-Service Fault Detection Based on Deep Learning

ZHUANG Weijin, ZHANG Hong

  1. China Electric Power Research Institute (Nanjing), Nanjing 210003, China
  • Online: 2022-08-15 Published: 2022-08-15

Abstract: Micro-service architecture is an emerging architectural pattern that aims to make software more reliable, maintainable, and scalable. It divides a single application into several independent services, each with its own function, which deliver value to users through frequent information exchange. Despite these advantages, micro-service architecture makes system fault detection harder: (1) because the architecture comprises many services, locating the point of failure becomes more difficult; (2) a fault in one service may trigger a chain reaction that brings down the whole system. Accurately diagnosing faults and precisely locating where they occur is therefore key to improving the quality of service (QoS) of micro-service architecture. This paper introduces deep learning into fault detection for micro-service architecture and proposes MS-GRU, a fault detection model based on the gated recurrent unit (GRU). The core idea of MS-GRU is to analyze historical application data, learn the patterns that lead to faults, and use those patterns to diagnose and predict future faults, thereby significantly improving the QoS of micro-service architecture. Extensive experiments evaluate the performance of the proposed method, and the results demonstrate its effectiveness and superiority.

Key words: micro-service architecture, fault detection, gated recurrent unit (GRU), attention mechanism, quality of service (QoS)
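
The abstract does not spell out the internals of MS-GRU, so the following is only a minimal sketch of the general idea it names: a GRU reads a window of past service metrics and its final hidden state is mapped to a fault probability. Everything here is an assumption made for illustration, not the authors' model: the class `GRUCell`, the function `fault_score`, the three-metric input (for example latency, error rate, CPU load), and the random untrained weights. The attention mechanism listed in the keywords is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical, untrained GRU cell (standard gate equations)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One (W, U, b) triple per gate: update "z", reset "r", candidate "h".
        self.params = {
            gate: (rand_matrix(n_hidden, n_in, rng),
                   rand_matrix(n_hidden, n_hidden, rng),
                   [0.0] * n_hidden)
            for gate in ("z", "r", "h")
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h_prev):
        z = [sigmoid(v) for v in self._affine("z", x, h_prev)]  # update gate
        r = [sigmoid(v) for v in self._affine("r", x, h_prev)]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(v) for v in self._affine("h", x, reset_h)]
        # Convention used here: h_t = (1 - z) * h_{t-1} + z * candidate.
        # Some references swap the roles of z and (1 - z).
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def fault_score(cell, w_out, window):
    """Run the cell over a window of metric vectors; return a value in (0, 1)."""
    h = [0.0] * cell.n_hidden
    for x in window:
        h = cell.step(x, h)
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)))


# Usage: two time steps of three made-up, normalized metrics.
cell = GRUCell(n_in=3, n_hidden=4)
w_out = [0.1, -0.2, 0.3, 0.05]  # hypothetical output weights
window = [[0.2, 0.1, 0.0], [0.9, 0.8, 1.0]]
print(fault_score(cell, w_out, window))
```

Because the weights are random, the printed score carries no diagnostic meaning; in practice the parameters would be fitted to labeled historical data, which is the learning step the abstract describes.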