Computer Engineering and Applications, 2020, Vol. 56, Issue (12): 169-174. DOI: 10.3778/j.issn.1002-8331.1904-0072


Multi-attribute Filtering Deep Feature Synthesis Algorithm

WANG Like, CUI Xiaoli, ZHANG Lige   

  1. Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Sichuan Rainbow Consulting & Software Co., Ltd., Chengdu 610041, China
  Online: 2020-06-15   Published: 2020-06-09





Traditional feature engineering relies entirely on manual work to extract features from relational entities, which is tedious, time-consuming, and error-prone. The deep feature synthesis algorithm can synthesize a large number of features from structured data and thereby automate feature engineering for relational entities. To address the problem that the features produced by deep feature synthesis are highly redundant and difficult to screen, an attribute filtering algorithm based on Kullback-Leibler (KL) divergence and Hellinger distance is proposed. By mapping and joining entities to their labels, the importance of each attribute in an entity is measured; the attributes are then filtered in multiple passes, and those with low importance are excluded from deep feature synthesis, yielding an optimized synthesis result. Three open data sets of different types are used for experimental verification with several machine learning algorithms. The results show that the improved method significantly reduces the running time of the algorithm and the size of the synthesized data, and effectively improves the quality of the synthesized features and the prediction accuracy.

Key words: deep feature synthesis, multiple attribute filtering, Kullback-Leibler (KL) divergence, Hellinger distance
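
The attribute filtering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact scoring rule, so the sketch assumes a binary label, categorical attributes, and a two-pass filter that keeps an attribute only if both the KL divergence and the Hellinger distance between its two class-conditional distributions exceed a threshold. The function names, the smoothing constant `eps`, and the thresholds `kl_min` and `hellinger_min` are all hypothetical.

```python
import math
from collections import Counter


def class_distributions(values, labels, eps=1e-9):
    """Smoothed distributions of a categorical attribute for label 0 and label 1.

    Assumes both label classes occur at least once; eps avoids log(0) in KL.
    """
    categories = sorted(set(values))
    dists = []
    for cls in (0, 1):
        counts = Counter(v for v, y in zip(values, labels) if y == cls)
        total = sum(counts.values())
        raw = [counts[c] / total + eps for c in categories]
        norm = sum(raw)
        dists.append([p / norm for p in raw])
    return dists[0], dists[1]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hellinger_distance(p, q):
    """Hellinger distance between discrete distributions, in [0, 1]."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2)


def filter_attributes(table, labels, kl_min=0.05, hellinger_min=0.1):
    """Keep attributes whose class-conditional distributions differ enough.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    kept = []
    for name, values in table.items():
        p, q = class_distributions(values, labels)
        if kl_divergence(p, q) >= kl_min and hellinger_distance(p, q) >= hellinger_min:
            kept.append(name)
    return kept


# Toy data: "plan" separates the two classes, "noise" is independent of the label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
table = {
    "plan": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
kept = filter_attributes(table, labels)
```

In this toy example only `plan` survives the filter, so only that attribute would be passed on to deep feature synthesis; dropping `noise` beforehand is what would shrink the synthesized feature set.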


