Product specification auto extract method of e-commerce websites

doi:10.3778/j.issn.1002-8331.1708-0053

Abstract

Automatically mining the billions of product specifications on the Web has important application value in many fields, such as e-commerce market analysis, product recommendation and after-sales service. However, current specification extraction methods do not effectively balance manual annotation workload, scalability and accuracy. This paper proposes the Title Seed Automatic Extract (TSAE) method. TSAE uses unsupervised learning: it takes the page title as a seed and combines statistical features with natural semantics and machine semantics. This reduces the annotation workload and improves scalability while achieving high accuracy. Experimental results show that TSAE achieves better automatic extraction precision, offers good performance and scalability, can support massive data processing, and has good practical value.

Key words: information extraction, automatic extraction, product specification, e-commerce
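The abstract states only that TSAE uses the page title as a seed. It does not give the algorithm. The Python sketch below illustrates that seeding idea and is not the authors' method. It assumes that specification rows appear as `key: value` lines. It treats a key as a specification attribute when, on at least `min_support` pages, every token of its value also occurs in the page title. All function names, the `min_support` parameter and the sample pages are hypothetical. The sketch does not model the statistical and semantic features that TSAE combines, and the tokenizer is crude. Chinese pages would need real word segmentation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for real word segmentation."""
    return set(re.findall(r"\w+", text.lower()))


def candidate_pairs(lines):
    """Split hypothetical 'key: value' lines into (key, value) candidates."""
    pairs = []
    for line in lines:
        if ":" in line:
            key, value = line.split(":", 1)
            key, value = key.strip(), value.strip()
            if key and value:
                pairs.append((key, value))
    return pairs


def learn_keys(pages, min_support=2):
    """Learn keys whose values are fully covered by the page title (the seed)."""
    counts = Counter()
    for title, lines in pages:
        seeds = tokenize(title)
        for key, value in candidate_pairs(lines):
            value_tokens = tokenize(value)
            if value_tokens and value_tokens <= seeds:
                counts[key] += 1
    return {key for key, n in counts.items() if n >= min_support}


def extract(lines, keys):
    """Keep every candidate pair whose key was learned from title seeds."""
    return {key: value for key, value in candidate_pairs(lines) if key in keys}


# Hypothetical sample pages: (title, page lines).
PAGES = [
    ("Acme Phone X 64GB Black",
     ["Brand: Acme", "Storage: 64GB", "Color: Black", "Shipping: free"]),
    ("Acme Phone Y 128GB White",
     ["Brand: Acme", "Storage: 128GB", "Color: White", "Shipping: 2 days"]),
]
LEARNED_KEYS = learn_keys(PAGES)  # {"Brand", "Storage", "Color"}
```

After the keys are learned, `extract` can be applied to pages whose titles do not mention the values. For example, `extract(["Brand: Acme", "Storage: 256GB", "Color: Red"], LEARNED_KEYS)` keeps all three rows. Rows such as `Shipping` are never seeded by a title, so they are filtered out.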

ZHAO Xiaoyong, WANG Lei. Product specification auto extract method of e-commerce websites[J]. Computer Engineering and Applications, 2017, 53(24): 168-171.

赵晓永，王磊. 电商网页中商品规格信息自动抽取方法研究[J]. 计算机工程与应用, 2017, 53(24): 168-171.
