Computer Engineering and Applications, 2008, Vol. 44, Issue (11): 148-151.

• Database, Signal and Information Processing

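Joined pattern segment-based closed sequential pattern mining algorithm

WANG Miao, SHANG Xue-qun, XUE He

  1. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2007-07-31 Revised: 2007-10-09 Online: 2008-04-11 Published: 2008-04-11
  • Contact: WANG Miao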

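Abstract: Mining frequent patterns directly from biological sequences produces many redundant patterns; closed patterns better express the function and structure of the sequences. Based on the characteristics of biological sequences, this paper proposes JCPS (Joined Closed Pattern Segment), a pattern mining algorithm built on joined (adjacent) closed frequent pattern segments. JCPS first generates closed joined frequent pattern segments, then combines these segments while performing closure checking, producing new, longer closed frequent patterns. Experiments on a real protein sequence family database demonstrate that the algorithm handles biological sequence data effectively.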

Key words: closed pattern, joined frequent pattern segment, pattern combination
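The two steps named in the abstract (generate closed frequent segments, then combine them while checking closure) can be illustrated with a minimal Python sketch. It is not the authors' JCPS implementation. It assumes that a pattern is a contiguous run of residues, that support is the number of sequences containing the pattern, and that "combining" means joining two frequent length-k patterns that overlap on k-1 symbols. The function name `mine_closed_segments` is hypothetical.

```python
def mine_closed_segments(sequences, min_sup):
    """Hypothetical sketch: return {pattern: support} for closed frequent
    contiguous patterns.

    Assumptions (not from the paper): support = number of sequences that
    contain the pattern as a contiguous substring; patterns grow by joining
    two frequent length-k patterns that overlap on k-1 symbols; closure is
    checked during growth rather than in a separate pass.
    """
    def support(p):
        return sum(1 for s in sequences if p in s)

    # Level 1: frequent single symbols (e.g. amino-acid letters).
    alphabet = sorted(set("".join(sequences)))
    freq = {}
    level = []
    for a in alphabet:
        s = support(a)
        if s >= min_sup:
            freq[a] = s
            level.append(a)

    non_closed = set()
    while level:
        # Join P and Q when P without its first symbol equals Q without its last.
        candidates = {p + q[-1] for p in level for q in level if p[1:] == q[:-1]}
        level = []
        for c in sorted(candidates):
            s = support(c)
            if s < min_sup:
                continue
            freq[c] = s
            level.append(c)
            # c extends its prefix by one symbol on the right and its suffix by
            # one symbol on the left; equal support means that shorter pattern
            # is absorbed by c and therefore not closed.
            for sub in (c[:-1], c[1:]):
                if freq[sub] == s:
                    non_closed.add(sub)

    return {p: s for p, s in freq.items() if p not in non_closed}


if __name__ == "__main__":
    print(mine_closed_segments(["ABCD", "ABCE", "XABC"], 2))
```

On the toy input `["ABCD", "ABCE", "XABC"]` with a minimum support of 2, only `ABC` (support 3) survives, because `A`, `B`, `C`, `AB` and `BC` each have a one-symbol extension with the same support. Checking only one-symbol extensions is enough under these assumptions: support is anti-monotone, so if any longer contiguous super-pattern had the same support, the one-symbol extension lying between the two would too.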
