Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (2): 80-85.

Statistical properties of file modification in open-source software repositories: case studies

LIN Sihai1, MA Yutao2, CHEN Jianxun1   

  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
  2. State Key Lab of Software Engineering, Wuhan University, Wuhan 430072, China
  • Online: 2013-01-15 Published: 2013-01-16

Abstract: The open, collaborative development model of open-source software may change the traditional mode of software development. Mining the evolutionary patterns of source code files in an SVN (Subversion) repository helps detect potential bugs and thereby improve software quality. This paper conducts empirical experiments on two object-oriented open-source software systems and reports three findings. First, the number of changes per class file roughly follows a power-law distribution. Second, for frequently changed classes, the size of the modification between adjacent versions also approximately follows a power-law distribution. Third, the number of changes correlates significantly and positively with both source lines of code and the number of imported classes, implying that the function and structure of these classes tend to become more complex. The findings of these two case studies provide new insights for research on the evolution of open-source software, the timing of refactoring, and the allocation of maintenance tasks.

Key words: open-source software, version, SVN, power law
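
The two kinds of measurement named in the abstract, a power-law exponent for per-file change counts and a correlation between change counts and size metrics, can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not describe. It assumes a commonly used approximate maximum-likelihood estimator for discrete power laws (more accurate for larger `x_min`) and Spearman rank correlation; the function names and the sample numbers are hypothetical.

```python
import math


def powerlaw_alpha(counts, x_min=1):
    """Approximate MLE of alpha for a discrete power law p(x) ~ x^(-alpha), x >= x_min.

    Uses alpha ~= 1 + n / sum(ln(x_i / (x_min - 0.5))), an approximation
    that is assumed here for illustration only.
    """
    tail = [x for x in counts if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical data: number of commits touching each class file,
    # and that file's source lines of code. Not taken from the paper.
    changes = [1, 1, 1, 1, 2, 1, 3, 2, 1, 8, 1, 2, 5, 1, 21]
    sloc = [40, 55, 30, 62, 120, 48, 210, 95, 35, 640, 51, 150, 380, 44, 1500]
    print("estimated alpha:", powerlaw_alpha(changes))
    print("Spearman rho (changes vs. SLOC):", spearman(changes, sloc))
```

In practice the change counts would come from the repository history (for example, by counting the revisions in which each file path appears), and a goodness-of-fit test would be needed before claiming that a power law describes the data.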
