Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (18): 115-120. DOI: 10.3778/j.issn.1002-8331.1604-0023

• Pattern Recognition and Artificial Intelligence •

Feature generation and selection method for short text of urban management cases and its application

WEI Wen1, YANG Huihua1,2, LI Lingqiao1,2, YANG Hao1, HE Shengtao3

  1. Guangxi Experiment Center of Information Science, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. School of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China
  3. Guilin Intelligent Metric Information Technology Co. LTD., Guilin, Guangxi 541004, China
  • Online: 2017-09-15 Published: 2017-09-29

Abstract: This paper studies feature generation and feature selection for the short texts of case reports submitted through a smart city management application system, with the aim of classifying cases automatically, quickly, and accurately. Based on the characteristics of these short case descriptions, an adjacent feature combination algorithm is proposed to generate combined features with greater descriptive power. To further reduce the features and optimize the feature space, a new membership function is proposed to construct a class feature domain for each category in the classification system. The class feature domains are then used to select among both the original and the combined features, yielding the feature set that contributes most to classification. The method is validated on case classification in the "城管通" (Chengguantong) app of Qingxiu District, Nanning. Experiments show that the proposed method achieves higher case classification accuracy than document frequency, mutual information, and information gain, and that introducing combined features significantly improves classification accuracy.
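The abstract does not spell out the adjacent feature combination algorithm, so the following is only a minimal sketch of the general idea under a stated assumption: a combined feature is formed by joining two neighbouring tokens of an already-segmented case description, and a combination is kept only if it occurs in enough documents. The function names, the `+` separator, the `min_df` threshold, and the sample texts are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List


def adjacent_combinations(tokens: List[str], sep: str = "+") -> List[str]:
    """Join each pair of neighbouring tokens into one combined feature."""
    return [f"{a}{sep}{b}" for a, b in zip(tokens, tokens[1:])]


def combined_feature_vocabulary(docs: Iterable[List[str]], min_df: int = 2) -> List[str]:
    """Keep combined features that appear in at least `min_df` documents."""
    df = Counter()
    for tokens in docs:
        # Use a set so each document counts a combination at most once.
        df.update(set(adjacent_combinations(tokens)))
    return sorted(f for f, n in df.items() if n >= min_df)


# Hypothetical, already-segmented case descriptions (not data from the paper).
docs = [
    ["road", "surface", "damaged"],
    ["road", "surface", "flooded"],
    ["street", "lamp", "damaged"],
]

print(adjacent_combinations(docs[0]))     # ['road+surface', 'surface+damaged']
print(combined_feature_vocabulary(docs))  # ['road+surface']
```

In this toy input, only `road+surface` recurs across documents, which illustrates why a combined feature can be more descriptive than either token alone: `damaged` by itself appears in both a road case and a lighting case.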

Key words: urban management cases, short text categorization, adjacent feature combination, feature selection, class feature domain
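For reference, the three baselines named in the abstract (document frequency, mutual information, and information gain) can be sketched with their common textbook definitions. The abstract does not say which variants the authors used, so the formulas below are an assumption, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def document_frequency(term: str, docs: List[Set[str]]) -> int:
    """Number of documents that contain the term."""
    return sum(term in d for d in docs)


def mutual_information(term: str, label: str,
                       docs: List[Set[str]], labels: List[str]) -> float:
    """Pointwise MI between a term and a class: log2 P(t,c) / (P(t) P(c))."""
    n = len(docs)
    n_t = sum(term in d for d in docs)
    n_c = sum(y == label for y in labels)
    n_tc = sum(term in d and y == label for d, y in zip(docs, labels))
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with this class
    return math.log2(n_tc * n / (n_t * n_c))


def information_gain(term: str, docs: List[Set[str]], labels: List[str]) -> float:
    """Reduction in class entropy from knowing whether the term is present."""
    n = len(docs)
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    h_before = entropy(list(Counter(labels).values()))
    h_after = sum(sum(part.values()) / n * entropy(list(part.values()))
                  for part in (with_t, without_t))
    return h_before - h_after


# Hypothetical labelled cases as token sets (not data from the paper).
cases = [{"road", "damaged"}, {"road", "flooded"}, {"lamp", "damaged"}, {"lamp", "dark"}]
categories = ["roads", "roads", "lighting", "lighting"]

print(document_frequency("road", cases))                      # 2
print(mutual_information("road", "roads", cases, categories))  # 1.0
print(information_gain("damaged", cases, categories))          # 0.0
```

Here `road` perfectly separates the two categories, while `damaged` is spread evenly across them and carries no information about the class.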