Data placement algorithm based on data dependence

Computer Engineering and Applications, 2014, Vol. 50, Issue 3: 117-120.

Column: Databases, Data Mining, Machine Learning


DONG Wei, WEN Yu

Department of Computer Applications Technology, China Jiliang University, Hangzhou 310018, China

Published: 2014-02-01  Online: 2014-01-26


Abstract

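Abstract: The defining feature of modern information systems is distributed application clusters built on massive data. Optimizing how such data is laid out in storage, so as to raise storage-resource utilization and speed up application execution, is an important research problem. Because data items depend on one another, placement algorithms that consider only load balancing are of limited practical use; the dependencies between data items must also be taken into account to speed up application execution. This paper builds a data-to-data dependence matrix, clusters the data based on that matrix, and then assigns the data to the individual data centers. It then computes the amount of data movement incurred when applications execute and compares it with the consistent hashing algorithm; the results show that the data movement is far lower than under consistent hashing.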

Key words: data placement, clustering, consistent hashing, data dependence
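The abstract names the pipeline (dependence matrix, clustering, assignment to data centers, data-movement count, consistent-hashing baseline) but not its details. The Python sketch below fills those gaps with assumptions, so it should not be read as the authors' method. The dependence between two items is assumed to be the number of applications that access both. A greedy, capacity-bounded agglomerative clustering stands in for the paper's clustering step. Each application is assumed to run at the data center holding most of its items, and every other item it touches counts as one unit of movement. The function names `dependence_matrix`, `cluster_by_dependence`, `place_clusters`, `consistent_hash_place` and `data_movement` are hypothetical.

```python
import bisect
import hashlib
from collections import Counter
from itertools import combinations


def dependence_matrix(items, apps):
    """dep[i][j] = number of applications accessing both items[i] and items[j] (assumed definition)."""
    idx = {d: i for i, d in enumerate(items)}
    dep = [[0] * len(items) for _ in items]
    for app in apps:
        for a, b in combinations(sorted(app), 2):
            dep[idx[a]][idx[b]] += 1
            dep[idx[b]][idx[a]] += 1
    return dep


def cluster_by_dependence(dep, n_centers):
    """Stand-in clustering: repeatedly merge the two clusters with the largest total
    dependence, never letting a cluster exceed ceil(n / n_centers) items."""
    cap = -(-len(dep) // n_centers)  # ceiling division; the cap keeps loads balanced
    clusters = [[i] for i in range(len(dep))]
    while True:
        best, pair = 0, None
        for x, y in combinations(range(len(clusters)), 2):
            if len(clusters[x]) + len(clusters[y]) > cap:
                continue
            w = sum(dep[i][j] for i in clusters[x] for j in clusters[y])
            if w > best:
                best, pair = w, (x, y)
        if pair is None:  # no pair that fits under the cap shares any dependence
            return clusters
        x, y = pair  # combinations yields x < y, so deleting y keeps index x valid
        merged = clusters[x] + clusters[y]
        del clusters[y]
        clusters[x] = merged


def place_clusters(items, clusters, n_centers):
    """Assign whole clusters, largest first, to the currently least-loaded data center."""
    load = [0] * n_centers
    where = {}
    for c in sorted(clusters, key=len, reverse=True):
        dc = load.index(min(load))
        for i in c:
            where[items[i]] = dc
        load[dc] += len(c)
    return where


def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)


def consistent_hash_place(items, n_centers, vnodes=50):
    """Baseline: consistent hashing on a ring with `vnodes` virtual nodes per data center."""
    ring = sorted((_hash(f"dc{k}#{v}"), k) for k in range(n_centers) for v in range(vnodes))
    keys = [h for h, _ in ring]
    # Each item goes to the first ring point clockwise from its hash, wrapping at the end.
    return {d: ring[bisect.bisect(keys, _hash(d)) % len(ring)][1] for d in items}


def data_movement(apps, where):
    """Assumed cost model: an application runs where most of its items live,
    and each of its items stored elsewhere is moved once."""
    total = 0
    for app in apps:
        counts = Counter(where[d] for d in app)
        total += len(app) - max(counts.values())
    return total


# Toy workload: two groups of co-accessed items, two data centers.
items = [f"d{i}" for i in range(8)]
apps = [{"d0", "d1", "d2", "d3"}, {"d0", "d1"}, {"d4", "d5", "d6", "d7"}, {"d5", "d6"}]
dep = dependence_matrix(items, apps)
placed = place_clusters(items, cluster_by_dependence(dep, 2), 2)
print(data_movement(apps, placed))  # 0: each group lands on a single data center
print(data_movement(apps, consistent_hash_place(items, 2)))
```

The capacity bound is one way to keep load balancing in play while grouping by dependence. On this toy workload the clustered placement needs no data movement. The hash-based placement ignores co-access and may split a group across data centers.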


Cite this article: DONG Wei, WEN Yu. Data placement algorithm based on data dependence[J]. Computer Engineering and Applications, 2014, 50(3): 117-120.


