Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (15): 74-79. DOI: 10.3778/j.issn.1002-8331.1904-0156

• Big Data and Cloud Computing •

Innovation Situation Analysis Algorithm Based on Heuristic Model of Community Detection

YI Chengqi, GUO Xin, TONG Nannan, DOU Yue, CHEN Dong, WANG Jiandong

  1. Department of Big Data Development, State Information Center, Beijing 100045, China
  2. Department of Information Management, Peking University, Beijing 100871, China
  3. School of Information Resource Management, Renmin University of China, Beijing 100872, China
  • Publication date: 2020-08-01 Release date: 2020-07-30

Abstract:

Patent networks are an important part of complex network research, and studying them offers significant guidance for understanding and grasping the direction of technological innovation. Patent text data are used to construct an undirected weighted patent network graph, and an innovation situation analysis algorithm based on a heuristic community detection model is proposed. To alleviate the text vector sparsity caused by the short text of patent titles and abstracts, an unsupervised method for densifying sparse vectors is adopted. To automate the selection of the similarity threshold during patent network construction, an experiment-driven method compares how common statistical indexes of the patent network change with the similarity threshold; the average clustering coefficient is ultimately selected as the index for automatically determining the optimal similarity threshold. Using real invention patent data from China's Digital China and big data fields, the validity of the method is verified and the innovation situation in these fields is analyzed.

Key words: community detection, complex network, patent network, innovation situation, heuristic algorithm
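
The threshold experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each patent is already represented as a dense vector, that similarity is cosine similarity, and that an edge is kept when similarity reaches the threshold; the abstract does not specify any of these details. The function names and the toy vectors are hypothetical. Because the abstract does not state the rule that turns the average clustering coefficient into an optimal threshold, the sketch only tabulates the coefficient for each candidate threshold.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold):
    """Undirected weighted graph: adj[i][j] holds the similarity of patents
    i and j when it reaches the threshold (assumed edge rule)."""
    adj = {i: {} for i in range(len(vectors))}
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            adj[i][j] = sim
            adj[j][i] = sim
    return adj


def average_clustering(adj):
    """Mean unweighted local clustering coefficient over all nodes.
    Nodes with fewer than two neighbours contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def sweep_thresholds(vectors, thresholds):
    """Return (threshold, average clustering coefficient) pairs."""
    return [(t, average_clustering(build_graph(vectors, t))) for t in thresholds]


# Hypothetical toy vectors standing in for densified patent text vectors.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
curve = sweep_thresholds(vectors, [0.1, 0.5, 0.9])
for threshold, coefficient in curve:
    print(f"threshold={threshold:.1f}  avg_clustering={coefficient:.4f}")
```

The resulting curve is the kind of threshold-versus-indicator relationship the abstract says was compared experimentally. A community detection step would then run on the graph built at the chosen threshold.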