计算机工程与应用 (Computer Engineering and Applications) ›› 2014, Vol. 50 ›› Issue (18): 162-166.

• Graphics and Image Processing •

基于动态聚类的文档碎纸片自动拼接算法

尹玉萍1,刘万军2,张  冲1,刘永超1   

  1. 辽宁工程技术大学 电气与控制工程学院,辽宁 葫芦岛 125105
  2. 辽宁工程技术大学 软件学院,辽宁 葫芦岛 125105
  • 出版日期:2014-09-15 发布日期:2014-09-12

Automatic documents fragment re-assembly algorithm based on dynamic clustering

YIN Yuping1, LIU Wanjun2, ZHANG Chong1, LIU Yongchao1   

  1. School of Electrical and Control Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
  2. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online:2014-09-15 Published:2014-09-12

摘要: 针对碎纸机三种碎纸模式进行拼接复原,提出了一种基于动态聚类的文档碎纸片自动拼接算法,定义了匹配度矩阵计算两块碎片最合理的拼接方式,设计了一种基于碎纸片特征向量的动态聚类行聚类算法进行行初步聚类,根据文字特征线及计算出的行距对初步聚类进行了调整修正,确定最终的行分类及行间顺序,根据提出的动态四邻近匹配算法,匹配出复原结果。实验表明,该方法实现简单,成功率高,能快速得到碎纸片的三种碎纸模式的拼接复原结果。

关键词: 动态聚类, 碎纸拼接, 匹配度矩阵, 碎纸片特征向量

Abstract: To reconstruct documents cut by a paper shredder in any of three shredding modes, this paper proposes an automatic document fragment reassembly algorithm based on dynamic clustering. A matching matrix is defined to determine the most reasonable way to join two fragments. A dynamic row clustering algorithm, based on the feature vectors of the fragments, performs a preliminary grouping of the fragments into rows. The preliminary clusters are then adjusted using the text feature lines and the computed line spacing, which fixes the final row assignment and the order of the rows. Finally, a dynamic four-neighbor matching algorithm produces the reconstructed document. Experimental results show that the method is simple to implement, achieves a high success rate, and quickly yields reassembly results for all three shredding modes.

Key words: dynamic clustering, shredded paper reassembly, matching matrix, fragment feature vector
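
The idea of a matching matrix can be illustrated with a minimal sketch. The abstract does not give the paper's actual definition, so the code below assumes a simple stand-in: the cost of placing fragment j to the right of fragment i is the sum of absolute grayscale differences between the right edge of i and the left edge of j. All names (`edge_column`, `mismatch`, `matching_matrix`, `greedy_order`) and the greedy ordering heuristic are hypothetical and only handle vertical strips; the row clustering and four-neighbor matching steps are not covered.

```python
from typing import List, Sequence

# A fragment is a list of pixel rows; 0 = black ink, 255 = white paper.
Fragment = List[List[int]]


def edge_column(fragment: Fragment, side: str) -> List[int]:
    """Return the leftmost ("left") or rightmost ("right") pixel column."""
    index = 0 if side == "left" else -1
    return [row[index] for row in fragment]


def mismatch(right_edge: Sequence[int], left_edge: Sequence[int]) -> int:
    """Assumed cost: sum of absolute pixel differences (lower is better)."""
    return sum(abs(a - b) for a, b in zip(right_edge, left_edge))


def matching_matrix(fragments: List[Fragment]) -> List[List[float]]:
    """matrix[i][j] = cost of placing fragment j directly right of fragment i."""
    n = len(fragments)
    matrix = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        right = edge_column(fragments[i], "right")
        for j in range(n):
            if i != j:
                matrix[i][j] = mismatch(right, edge_column(fragments[j], "left"))
    return matrix


def greedy_order(matrix: List[List[float]]) -> List[int]:
    """Chain fragments left to right by always taking the cheapest neighbor."""
    n = len(matrix)
    if n == 0:
        return []
    # Assumed heuristic: the leftmost strip is the one that fits worst
    # to the right of every other strip.
    start = max(
        range(n),
        key=lambda j: min(
            (matrix[i][j] for i in range(n) if i != j), default=float("inf")
        ),
    )
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        nxt = min(sorted(remaining), key=lambda j: matrix[order[-1]][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Three 4x2 strips cut from one 4x6 image, given in shuffled order.
strip_a = [[0, 0], [255, 0], [255, 255], [255, 255]]      # original left strip
strip_b = [[0, 255], [0, 0], [255, 0], [255, 255]]        # original middle strip
strip_c = [[255, 255], [0, 255], [0, 0], [255, 0]]        # original right strip
fragments = [strip_c, strip_a, strip_b]

matrix = matching_matrix(fragments)
order = greedy_order(matrix)
print(order)  # indices into `fragments`, left to right: [1, 2, 0]
```

A greedy chain like this is only a baseline: a single wrong choice propagates to every later strip, which is one reason to group fragments into rows before matching neighbors, as the abstract describes.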