Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (20): 115-121. DOI: 10.3778/j.issn.1002-8331.1705-0347


Positive region computation of neighborhood rough set based on category of samples

PENG Xiaoran1,2, LIU Zunren2, JI Jun2   

  1. College of Data Science and Software Engineering, Qingdao University, Qingdao, Shandong 266071, China
  2. College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  • Online: 2018-10-15 Published: 2018-10-19

Abstract: For attribute reduction algorithms based on the neighborhood rough set, positive region computation is both the basis of their effectiveness and the largest part of their time cost. The speed of this computation is determined mainly by the number of distance measurements between samples: provided correctness is preserved, fewer measurements mean faster computation. Existing positive region computations usually perform many measurements between samples of the same category. This paper first proves that measurements between samples of the same category contribute nothing to positive region computation in the neighborhood rough set, and then proposes a positive region computation based on the category of samples. Experimental comparison with an existing positive region computation shows that the proposed computation is effective and faster, and that it is better suited to data sets with fewer sample categories.

Key words: rough set, neighborhood rough set, positive region computation, attribute reduction, category
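
The idea stated in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes Euclidean distance, a neighborhood defined as all samples within a radius delta (inclusive), and that a sample belongs to the positive region when no sample of a different category lies in its neighborhood. The function names, the baseline, and the measurement counters are hypothetical and serve only to show that skipping same-category pairs leaves the result unchanged.

```python
from collections import defaultdict
from math import sqrt


def distance(u, v):
    # Euclidean distance; the choice of metric is an assumption of this sketch.
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def positive_region_naive(samples, labels, delta):
    """Hypothetical baseline: measure every sample against every other sample."""
    positive, measures = [], 0
    for i, x in enumerate(samples):
        consistent = True
        for j, z in enumerate(samples):
            if i == j:
                continue
            measures += 1
            # Only a neighbor of a different category can break consistency.
            if distance(x, z) <= delta and labels[j] != labels[i]:
                consistent = False
        if consistent:
            positive.append(i)
    return positive, measures


def positive_region_by_class(samples, labels, delta):
    """Category-aware variant: never measure two samples of the same category."""
    by_class = defaultdict(list)
    for j, y in enumerate(labels):
        by_class[y].append(j)

    positive, measures = [], 0
    for i, x in enumerate(samples):
        consistent = True
        for y, members in by_class.items():
            if y == labels[i]:
                continue  # same-category samples cannot affect membership
            for j in members:
                measures += 1
                if distance(x, samples[j]) <= delta:
                    consistent = False
        if consistent:
            positive.append(i)
    return positive, measures


# Toy data (made up for illustration): one attribute, two categories.
samples = [(0.0,), (0.1,), (0.5,), (0.55,), (1.0,)]
labels = ["a", "a", "a", "b", "b"]

naive_pos, naive_measures = positive_region_naive(samples, labels, 0.1)
class_pos, class_measures = positive_region_by_class(samples, labels, 0.1)
```

Early termination is deliberately left out of both functions so that the two counters are directly comparable. On this toy data both functions return the same positive region, while the category-aware variant performs fewer measurements because every same-category pair is skipped. With fewer categories, each category tends to hold more samples, so more pairs are skipped, which is consistent with the abstract's remark about data sets with fewer categories.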
