Computer Engineering and Applications, 2015, Vol. 51, Issue (23): 131-138.

• Database, Data Mining, Machine Learning •

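Protection-type transfer learning clustering algorithm with cosine distance metric

ZHANG Yankai1, BAO Fang2, WANG Shitong1

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
    2. Jiangyin Vocational and Technological Institute, Wuxi, Jiangsu 214400, China
  • Online: 2015-12-01  Published: 2015-12-14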

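Abstract: Previous researchers have studied transfer learning and traditional machine learning from the standpoint of formula soundness, but they have neglected the problem as a whole; as a result, their algorithms fail to achieve thorough classification when applied to concrete text classification problems. By examining the entire text classification process, this paper uses cosine distance in the k-means algorithm, which significantly improves the experimental results. It further proposes a protection-type iteration strategy, abandons the traditional word feature space in favor of a latent (hidden) space as the feature vector space, and imposes normalization constraints. Taking the CCI algorithm as an example, these improvements are combined into an improved algorithm, PCCI, which significantly raises the classification accuracy of transfer learning while reducing computational complexity. Experiments on the 20-NewsGroups and Reuters-21578 datasets, with comparisons against other existing transfer learning algorithms, demonstrate the superiority of the improved algorithm.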
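The central change described in the abstract, replacing Euclidean distance with cosine distance inside k-means on length-normalized document vectors, can be sketched as follows. This is a minimal, standard-library sketch of what is commonly called spherical k-means, written under stated assumptions: documents are plain term-count vectors, and the names `normalize`, `cosine_distance`, and `cosine_kmeans` are hypothetical. It does not reproduce the protection-type iteration, the latent-space features, or PCCI itself, whose details are not given here.

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 length (a normalization constraint); zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine_distance(a, b):
    """Return 1 - cos(a, b), assuming both a and b are already unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_kmeans(docs, k, iters=20, seed=0):
    """Hypothetical sketch of spherical k-means (not the paper's PCCI).

    Each document is normalized, assigned to the centroid with the smallest
    cosine distance, and each centroid is the re-normalized mean of its members.
    """
    rng = random.Random(seed)
    X = [normalize(d) for d in docs]
    # Initialize centroids with k distinct (already normalized) documents.
    centroids = [list(c) for c in rng.sample(X, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid under cosine distance.
        new_labels = [
            min(range(k), key=lambda j: cosine_distance(x, centroids[j]))
            for x in X
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: member mean projected back onto the unit sphere.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids
```

For example, `cosine_kmeans([[3, 2, 0, 0], [6, 4, 0, 1], [0, 0, 2, 3], [0, 1, 5, 7]], k=2)` places the first two documents in one cluster and the last two in the other, even though the second document is roughly twice as long as the first. Re-normalizing each centroid is the design choice that ties cosine distance to the normalization constraint: for unit vectors, ‖a − b‖² = 2 − 2·cos(a, b), so once documents and centroids all lie on the unit sphere, nearest-centroid assignment by Euclidean distance and by cosine distance coincide.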

Key words: transfer learning, Euclidean distance, cosine distance, protection-type, normalization constraints, over-dimensionality