Computer Engineering and Applications, 2014, Vol. 50, Issue (22): 158-162.

• Database, Data Mining, Machine Learning •

PDF document across terminal publishing technology

CHANG Lei1, LU Yang1, WU Lei1,2

  1. School of Computer & Information, Hefei University of Technology, Hefei 230009, China
  2. Anhui Education Publishing House, Hefei 230601, China
  • Online: 2014-11-15  Published: 2014-11-13

Abstract: To address the problems that publishers currently face when publishing digital content across terminals, this paper studies layout information extraction and cross-terminal adaptive recombination for PDF documents. It proposes methods for extracting text, images, and other information from PDF documents and for analyzing their layout structure, and uses a terminal-adaptive recombination algorithm to publish the digital content across terminals. On this basis, a cross-terminal digital content publishing system is designed and applied in a publisher's actual work. The experimental results demonstrate the feasibility of the approach.

Key words: Portable Document Format (PDF), cross-terminal adaptive recombination, layout information extraction