Research on an Automatic Extraction Method for Product Specification Information in E-commerce Web Pages

doi:10.3778/j.issn.1002-8331.1708-0053

Computer Engineering and Applications, 2017, Vol. 53, Issue (24): 168-171. DOI: 10.3778/j.issn.1002-8331.1708-0053

Product specification auto extract method of e-commerce websites

ZHAO Xiaoyong, WANG Lei

School of Information and Management, Beijing Information Science & Technology University, Beijing 100129, China

Online:2017-12-15 Published:2018-01-09

Abstract

Abstract: Automatically mining the billions of product specifications on the Web has important application value in e-commerce areas such as market analysis, product recommendation, and after-sales service. However, existing specification extraction methods have not effectively balanced manual annotation workload, scalability, and accuracy. This paper proposes TSAE (Title Seed Automatic Extract), an automatic method for extracting specification information from product web pages. TSAE uses unsupervised learning with the page title as the seed and combines statistical features, natural semantics, and machine semantics, achieving relatively high accuracy while reducing workload and improving scalability. Experiments show that TSAE delivers better automatic extraction results together with good performance and scalability, can support massive data processing, and has good practical value.

Key words: information extraction, automatic extraction, product specification, e-commerce

ZHAO Xiaoyong, WANG Lei. Product specification auto extract method of e-commerce websites[J]. Computer Engineering and Applications, 2017, 53(24): 168-171.
