Computer Engineering and Applications, 2016, Vol. 52, Issue (2): 96-104.


Research on an improved network data-leakage detection scheme

ZHAO Genlin¹, LI Hua²

1. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China
2. School of Computer and Communication, Hunan University, Changsha 410082, China
Online: 2016-01-15; Published: 2016-01-28


Abstract: Preventing confidential data from flowing out of a network is a fundamental problem for network operators, and it becomes even more complex in cloud computing. Existing data-leakage prevention (DLP) solutions rely on generic keyword searches over outgoing data, so they cannot control data flow at fine granularity while keeping false positives low. To address this problem, this paper designs a DLP architecture based on white-listing, which provides strong security for data transmission. On this basis, it proposes a data-leakage detection algorithm that combines document fingerprinting with Bloom filters. Dynamic programming is used to compute the optimal locations for fingerprint checks, which minimizes memory overhead and enables high-speed implementation. Simulation results show that on-the-fly fingerprint checking scales to large document collections at very low cost. For example, for 1 TB of documents, the solution requires only 340 MB of memory to achieve a worst-case expected detection lag (i.e., leakage length) of 1,000 bytes.
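The fingerprint-plus-Bloom-filter idea can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes fixed-size byte windows as fingerprints, a single Bloom filter over all confidential documents, and uniformly spaced fingerprint locations (every `step` bytes) in place of the dynamic-programming placement the paper computes. The names `BloomFilter`, `build_fingerprints`, `first_leak_end`, `window`, and `step` are hypothetical. Because a leak can start at any offset, the checker tests every offset of the outgoing stream, so a contiguous leak is caught after at most `window + step - 1` leaked bytes, while the filter needs roughly (total bytes / `step`) × (-ln p / (ln 2)²) bits for false-positive rate p.

```python
import hashlib
import math


class BloomFilter:
    """Bloom filter with m bits and k probe positions per item (double hashing)."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 probes.
        self.m = max(8, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Derive two 64-bit hashes from one BLAKE2b digest, then combine them.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # forced odd, so never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # No false negatives; false positives occur at roughly fp_rate.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


def build_fingerprints(docs, window, step, fp_rate=1e-6):
    """Insert the window-byte chunk at every step-th offset of each confidential document."""
    chunks = [d[i:i + window] for d in docs for i in range(0, len(d) - window + 1, step)]
    bf = BloomFilter(len(chunks), fp_rate)
    for chunk in chunks:
        bf.add(chunk)
    return bf


def first_leak_end(outgoing, bf, window):
    """Test every offset of the outgoing stream; return the stream position where
    the first matching window ends, or None if no window matches."""
    for i in range(len(outgoing) - window + 1):
        if outgoing[i:i + window] in bf:
            return i + window
    return None


if __name__ == "__main__":
    secret = bytes(range(256)) * 4  # hypothetical 1 KiB confidential document
    bf = build_fingerprints([secret], window=32, step=16)
    payload = b"hello " + secret[100:400] + b" bye"  # 6-byte prefix, then a leak
    end = first_leak_end(payload, bf, window=32)
    # Bloom filters have no false negatives, so the match ends within
    # window + step - 1 = 47 leaked bytes (earlier only on a false positive).
    print("first match ends at byte", end, "of the outgoing stream")
```

Increasing `step` shrinks the filter but lengthens the worst-case detection lag. The paper instead computes optimal check locations with dynamic programming to minimize memory. A production checker would also likely replace the per-window `blake2b` call with a rolling hash such as a Rabin fingerprint.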

Key words: data leakage, cloud computing, white-listing, false-positive probability, fingerprint checks, Bloom filters

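Abstract (Chinese version): Preventing confidential data from flowing out of a network is an important problem for network operators, and the development of cloud computing makes it even more complex. Current data-leakage prevention schemes mainly rely on generic keyword searches over outgoing data, so their control of data flow is coarse and their false-alarm rate is high. To address this, a white-listing-based data-leakage prevention (DLP) architecture is designed, and on this basis a data-leakage detection algorithm based on document fingerprints and Bloom filters is proposed. The algorithm uses dynamic programming to compute the optimal checking locations, minimizing memory overhead and supporting high-speed deployment. Simulation results show that the algorithm can perform online fingerprint checking over large amounts of data at very low cost. For example, for 1 TB of documents, the solution needs only 340 MB of memory to achieve a worst-case expected detection lag (leakage length) of 1,000 bytes.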

Key words (Chinese version): data leakage, cloud computing, white-listing, false-alarm rate, fingerprint checking, Bloom filters