“N+V+N” and “V+N+N” structure phrase recognition in search engine query logs

Computer Engineering and Applications, 2013, Vol. 49, Issue 6: 143-147.

• Database, Data Mining, Machine Learning •

ZHENG Li, LV Xueqiang

Chinese Information Processing Research Center, Beijing Information Science & Technology University, Beijing 100101, China

Online: 2013-03-15 Published: 2013-03-14

Abstract

Abstract: Phrase recognition is a preparatory step for phrase analysis. Based on the characteristics of “N+V+N” and “V+N+N” phrases in search engine query logs, this paper applies the maximum entropy method. Features are extracted from word information, part-of-speech information, syllable count, and the tags assigned to preceding positions to build a training set, from which a maximum entropy model for phrase recognition is trained. In open tests, the F-values for “N+V+N” and “V+N+N” phrases reach 85.78% and 76.47%, respectively, a good level of automatic recognition. In semi-open tests, the results are better still.

Key words: phrase recognition, search engine logs, “N+V+N”, “V+N+N”, maximum entropy
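
The abstract names four feature sources: word, part of speech, syllable count, and the tags of preceding positions. The following is a minimal sketch of how such features might be assembled for one token before being passed to a maximum entropy learner. The function name `extract_features`, the B/I/O-style labels, the one-token context window, the `<S>` padding symbol, and the use of character count as a stand-in for syllable count are all assumptions made for illustration; the abstract does not specify the paper's actual feature templates.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def extract_features(tokens: List[Token], i: int, prev_tags: List[str]) -> Dict[str, str]:
    """Assemble features for token i from the four sources named in the abstract.

    prev_tags holds the labels already assigned to tokens 0..i-1
    (a hypothetical B/I/O scheme is assumed here).
    """
    word, pos = tokens[i]
    feats = {
        "word": word,                 # word information
        "pos": pos,                   # part-of-speech information
        # Assumption: one Chinese character is one syllable,
        # so the character count approximates the syllable count.
        "syllables": str(len(word)),
        # Preceding-tag information; "<S>" pads positions before the query start.
        "prev_tag": prev_tags[i - 1] if i >= 1 else "<S>",
        "prev2_tag": prev_tags[i - 2] if i >= 2 else "<S>",
    }
    # Assumed context window of one token on each side.
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = tokens[i - 1]
    if i < len(tokens) - 1:
        feats["next_word"], feats["next_pos"] = tokens[i + 1]
    return feats


# Illustrative N+V+N query: "手机 维修 教程" (phone / repair / tutorial).
query = [("手机", "n"), ("维修", "v"), ("教程", "n")]
features = extract_features(query, 1, ["B"])
```

Each feature dictionary, paired with its gold label, would form one training instance. A maximum entropy classifier is equivalent to multinomial logistic regression over such features, so any implementation of that model could in principle consume them.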

ZHENG Li, LV Xueqiang. “N+V+N” and “V+N+N” structure phrase recognition in search engine query logs[J]. Computer Engineering and Applications, 2013, 49(6): 143-147.

