Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (18): 141-148. DOI: 10.3778/j.issn.1002-8331.1612-0406


Chinese resume information automatic extraction and recommendation algorithm

GU Nannan, FENG Jun, SUN Xia, ZHAO Yan, ZHANG Lei   

  1. School of Information Science and Technology, Northwest University, Xi’an 710127, China
  • Online: 2017-09-15 Published: 2017-09-29


谷楠楠,冯  筠,孙  霞,赵  妍,张  蕾   

  1. 西北大学 信息科学与技术学院,西安 710127

Abstract: To reduce the time and labor of manually screening large volumes of electronic resumes, a scheme for automatic resume information extraction and recommendation is proposed. First, the sentences in a Chinese resume are preprocessed with word segmentation, part-of-speech tagging and other steps and represented as feature vectors, and an SVM classifier assigns each sentence to one of six predefined general classes, such as personal basic information, job intention and work experience. Second, rules are constructed by hand from the lexical and grammatical features of the personal basic information block to extract key information such as name, gender and contact information, while an HMM is used to extract detailed information from complex information blocks such as work experience; together these form a resume information extraction method that combines rules and statistics. Finally, a Content-Based Reciprocal Recommender algorithm (CBRR) is proposed, which takes into account the preferences of both enterprises and job seekers. The experimental results show that the proposed scheme can assist enterprises in recruitment, improve screening efficiency and save recruitment costs.

Key words: information extraction, recommendation, collaborative filtering, rule, statistics, resume
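
The idea of a reciprocal recommender, where a match counts only if it suits both sides, can be illustrated with a minimal Python sketch. The abstract does not give the CBRR scoring formula, so everything below is an assumption made for illustration: the `Profile` schema, the keyword-overlap `directed_score`, and the use of a harmonic mean to combine the two directions are hypothetical choices, not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical schema: what one side offers and what it wants in return."""
    offers: set[str] = field(default_factory=set)
    wants: set[str] = field(default_factory=set)


def directed_score(seeker: Profile, target: Profile) -> float:
    """Fraction of the seeker's wanted items that the target offers (assumed metric)."""
    if not seeker.wants:
        return 0.0
    return len(seeker.wants & target.offers) / len(seeker.wants)


def reciprocal_score(candidate: Profile, job: Profile) -> float:
    """Combine both directions with a harmonic mean (an assumed choice).

    The harmonic mean is low whenever either side is poorly served,
    so a one-sided match does not rank highly.
    """
    a = directed_score(candidate, job)  # how well the job suits the candidate
    b = directed_score(job, candidate)  # how well the candidate suits the job
    if a == 0.0 or b == 0.0:
        return 0.0
    return 2 * a * b / (a + b)


def recommend(job: Profile, candidates: dict[str, Profile],
              top_k: int = 3) -> list[tuple[str, float]]:
    """Rank candidates for a job by reciprocal score, ties broken by name."""
    scored = [(name, reciprocal_score(c, job)) for name, c in candidates.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

In this sketch, the extracted resume fields would populate a candidate's `offers`, and the job intention block would populate `wants`; a job posting is modeled the same way from the enterprise side. A candidate who satisfies every requirement but wants nothing the job provides scores zero, which is the behavior that distinguishes a reciprocal score from a one-directional content match.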
