Computer Engineering and Applications, 2008, Vol. 44, Issue (12): 161-165.

• Database, Signal and Information Processing •

Method of mining frequent item based on genetic algorithm

ZHANG Jun

  1. Institute of Image & Graphic, School of Computer Science, Sichuan University, Chengdu 610064, China
  • Received: 2007-08-10 Revised: 2007-11-19 Online: 2008-04-21 Published: 2008-04-21
  • Contact: ZHANG Jun

Abstract: The single-dimensional Boolean frequent itemset mining problem is restated from the viewpoint of mathematical programming. Using newly defined addition, scalar multiplication, and norm operations, the problem is reduced to a nonlinear 0-1 programming problem, which is then solved with a genetic algorithm. Based on an analysis of why frequent itemset mining is difficult, a method is proposed that builds the initial population from records of the original database. Numerical experiments were carried out on the ticeval2000 database released by IBM. The results show that the method typically finds a batch of long frequent patterns within a few generations.

Key words: frequent item, data mining, nonlinear programming, genetic algorithm
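
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the definitions of the new addition, scalar multiplication, and norm operations, so the sketch assumes a plain formulation in which an itemset is a 0-1 vector, its support is the number of records that contain it, and the objective is the itemset length subject to a minimum-support constraint. The names `support`, `fitness`, `seed_population`, and `evolve`, the uniform crossover, the bit-flip mutation, and all parameter values are hypothetical choices made for illustration. Only the seeding of the initial population from database records follows the abstract directly.

```python
import random


def support(itemset, records):
    """Number of records that contain every item set to 1 in `itemset`."""
    return sum(
        1 for rec in records
        if all(r >= x for r, x in zip(rec, itemset))
    )


def fitness(itemset, records, min_support):
    """Assumed objective: itemset length if frequent and non-empty, else 0."""
    size = sum(itemset)
    if size == 0 or support(itemset, records) < min_support:
        return 0
    return size


def seed_population(records, pop_size):
    """Initial population taken from the database records themselves."""
    return [tuple(records[i % len(records)]) for i in range(pop_size)]


def make_child(p1, p2, rng, mutation_rate):
    """Uniform crossover followed by independent bit-flip mutation."""
    child = []
    for g1, g2 in zip(p1, p2):
        gene = g1 if rng.random() < 0.5 else g2
        if rng.random() < mutation_rate:
            gene = 1 - gene
        child.append(gene)
    return tuple(child)


def evolve(records, min_support, pop_size=12, generations=5,
           mutation_rate=0.05, seed=0):
    """Return all frequent itemsets seen, longest first."""
    rng = random.Random(seed)
    population = seed_population(records, pop_size)
    found = set()
    for generation in range(generations + 1):
        scored = [(fitness(ind, records, min_support), ind)
                  for ind in population]
        found.update(ind for score, ind in scored if score > 0)
        if generation == generations:
            break
        next_population = []
        while len(next_population) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(scored, 2))[1]
            p2 = max(rng.sample(scored, 2))[1]
            next_population.append(make_child(p1, p2, rng, mutation_rate))
        population = next_population
    return sorted(found, key=lambda ind: (-sum(ind), ind))


if __name__ == "__main__":
    # Toy Boolean database: rows are records, columns are items.
    toy_records = [
        (1, 1, 1, 0, 0),
        (1, 1, 1, 0, 1),
        (1, 1, 0, 1, 0),
        (0, 1, 1, 0, 0),
        (1, 1, 1, 0, 0),
        (0, 0, 0, 1, 1),
    ]
    for itemset in evolve(toy_records, min_support=3):
        print(itemset, support(itemset, toy_records))
```

Seeding from records means every initial individual already has support of at least one, which is one plausible reading of why long frequent patterns can appear within a few generations. A randomly generated long 0-1 vector, by contrast, would usually match no record at all.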