Computer Engineering and Applications, 2022, Vol. 58, Issue (6): 149-156. DOI: 10.3778/j.issn.1002-8331.2010-0338

Pattern Recognition and Artificial Intelligence

CDN Domain Recognition Method Based on DNS Knowledge Graph

YAN Zhihao, LIU Jingju, GUO Hui, GUO Bingyang

  1. College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
  2. Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China

Online: 2022-03-15  Published: 2022-03-15

Abstract: The content delivery network (CDN) is an important part of the Internet's infrastructure. Current methods for identifying CDN domains rely mainly on domain-name character features, HTTP keywords, and DNS records, and they cover only a limited range of domains. To identify CDN domains at scale, this paper proposes a CDN domain recognition method based on a knowledge graph of the domain name system (DNS). Ontology modeling, data acquisition, and knowledge graph construction are carried out according to the characteristics of the DNS. The characteristics of CDN services are obtained by analyzing DNS-related data. Whether a domain is a CDN domain is treated as an attribute of the domain-name nodes in the knowledge graph. Inference rules are defined, and CDN domains are identified through association analysis over the entities, relationships, and attributes contained in the graph. The method was applied to the Alexa top 1 million domains and some of their subdomains, producing a DNS knowledge graph with more than one million nodes and relationships. Experimental results show that, without building a sample set by manual labeling, the method achieves 88% classification precision and an 86% F1 score.

Key words: domain name system, knowledge graph, ontology construction, CDN recognition, knowledge inference
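The abstract treats "is a CDN domain" as an attribute of domain-name nodes and infers it with rules over the graph's entities and relationships. It does not state the rules themselves. The sketch below is a minimal, hypothetical illustration of that idea and not the authors' implementation. The seed set `CDN_SEEDS`, the helpers `base_domain` and `DnsGraph`, and both rules are assumptions chosen for illustration:

- matching a CNAME chain against known CDN provider domains;
- propagating the label through shared IP addresses.

```python
"""Toy rule-based CDN inference over a small DNS graph (illustrative assumptions only)."""
from __future__ import annotations

# Hypothetical seed set: registrable domains assumed to belong to CDN providers.
CDN_SEEDS = {"akamaiedge.net", "cloudfront.net", "fastly.net"}


def base_domain(name: str) -> str:
    """Crude registrable-domain approximation: keep the last two labels.

    Ignores multi-label public suffixes such as .co.uk (an assumption for brevity).
    """
    labels = name.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


class DnsGraph:
    """Domain nodes with a boolean 'cdn' attribute, linked by CNAME and A-record edges."""

    def __init__(self) -> None:
        self.cname: dict[str, str] = {}           # owner name -> canonical name
        self.a_records: dict[str, set[str]] = {}  # owner name -> IPv4 addresses
        self.cdn: dict[str, bool] = {}            # inferred node attribute

    def add_cname(self, owner: str, target: str) -> None:
        self.cname[owner] = target

    def add_a(self, owner: str, ip: str) -> None:
        self.a_records.setdefault(owner, set()).add(ip)

    def _chain(self, name: str, max_hops: int = 8) -> list[str]:
        """Follow CNAME edges starting at name; max_hops guards against CNAME loops."""
        chain = [name]
        while chain[-1] in self.cname and len(chain) <= max_hops:
            chain.append(self.cname[chain[-1]])
        return chain

    def infer_cdn(self) -> dict[str, bool]:
        nodes = set(self.cname) | set(self.cname.values()) | set(self.a_records)
        # Rule 1 (assumed): some name on the node's CNAME chain (including the node
        # itself) falls under a seed CDN domain.
        for d in nodes:
            self.cdn[d] = any(base_domain(n) in CDN_SEEDS for n in self._chain(d))
        # Rule 2 (assumed): a node whose chain resolves to an IP already used by a
        # rule-1 CDN node is also marked as CDN (association via shared IP nodes).
        cdn_ips = {ip for d in nodes if self.cdn[d] for ip in self.a_records.get(d, ())}
        for d in nodes:
            if not self.cdn[d]:
                ips = {ip for n in self._chain(d) for ip in self.a_records.get(n, ())}
                self.cdn[d] = bool(ips & cdn_ips)
        return self.cdn


if __name__ == "__main__":
    g = DnsGraph()
    g.add_cname("www.example.com", "d111.cloudfront.net")
    g.add_a("d111.cloudfront.net", "203.0.113.10")
    g.add_a("img.example.org", "203.0.113.10")
    g.add_a("blog.example.org", "198.51.100.7")
    result = g.infer_cdn()
    print(result["www.example.com"], result["img.example.org"], result["blog.example.org"])
    # True True False
```

In this toy graph, `www.example.com` is labeled CDN because its CNAME points into `cloudfront.net`. `img.example.org` is labeled CDN because it shares an address with that edge node, and `blog.example.org` is not labeled. The shared-IP rule is deliberately naive: ordinary shared hosting would also trigger it. A realistic rule set would therefore need additional evidence before labeling a domain.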