Mining Developer Dynamics for Agent-Based Simulation of Software Evolution
by Verena Herbold
Date of Examination: 2019-06-27
Date of issue: 2019-07-10
Advisor: Prof. Dr. Jens Grabowski
Referee: Prof. Dr. Jens Grabowski
Referee: Prof. Dr. Stephan Waack
Abstract
The steady growth of software in our daily life results in the need for quicker adaptation of software to changing usage and requirements. This process is called software evolution and is primarily concerned with the changes that drive it. The most important contribution to this process comes from developers, e.g., by adding code to the repository. This process is highly dynamic, as both the team constellation and the activity of individual developers change constantly. This is especially the case for open-source software (OSS) projects, which are analyzed in this thesis because of their free availability. We create and evaluate several models describing software evolution. The main focus of the approach described in this thesis is on the source of the changes, i.e., the developers. Using agent-based simulation, project managers can try different scenarios and estimate possible software evolution trends. For example, it is possible to choose a team constellation and use the simulation output to evaluate whether the chosen team will be able to fix enough bugs. If not, more developers can be added to the simulation. In this setting, the developers are agents who create, update, and delete software artifacts and possibly introduce or fix bugs at the same time. Large parts of this thesis are dedicated to finding suitable simulation parameters and estimating them by mining software repositories in order to obtain a realistic simulation. This makes it possible to answer questions about the size of the software project, the activity of developers, the number of bugs, and the structure of the software under simulation. We apply methods from data mining, machine learning, and statistics. For the simulation, the behavior of developers is estimated using heuristics gained from analyzing the history of different software projects. The resulting simulation model reflects different developer roles with varying workloads.
However, this model represented the dynamics of OSS projects only to a limited extent. To capture fine-grained developer contribution behavior, a state-based probabilistic model (a Hidden Markov Model) was trained on different levels of code-based and communication-based activity. This allows developers to switch between different levels of activity. The same procedure is used to summarize the activity of a whole project, with the aim of evaluating whether the project is still active. In this context, we are particularly interested in how much activity is still performed in inactive projects, since a strict separation between active and inactive projects is difficult to draw but important for potential users of a project. The results of three case studies show that agent-based simulation is a promising approach for predicting software evolution and that many relations can be described this way. In particular, it turned out that dynamic developer and project behavior is indispensable for describing OSS evolution, because otherwise the representation of software processes is too static.
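To illustrate how a Hidden Markov Model lets a developer "switch between different levels of activity", here is a minimal sketch. It is not the thesis's model. The abstract does not give the states, the observation alphabet, or the parameters, so everything below is an assumption: three hypothetical hidden states (`low`, `medium`, `high`), weekly commit counts discretized into three symbols, and made-up probabilities. Training (for example with Baum-Welch) is not shown. The sketch only decodes the most likely activity-state sequence with the Viterbi algorithm, given fixed parameters.

```python
import math

# Hypothetical hidden activity states (not taken from the thesis).
STATES = ("low", "medium", "high")
# Hypothetical observation symbols per week: 0 = no commits, 1 = a few, 2 = many.

# All probabilities are illustrative placeholders, not estimated values.
START = {"low": 0.6, "medium": 0.3, "high": 0.1}
TRANS = {
    "low":    {"low": 0.8,  "medium": 0.15, "high": 0.05},
    "medium": {"low": 0.2,  "medium": 0.6,  "high": 0.2},
    "high":   {"low": 0.05, "medium": 0.25, "high": 0.7},
}
EMIT = {  # EMIT[state][symbol]
    "low":    (0.8, 0.15, 0.05),
    "medium": (0.2, 0.6, 0.2),
    "high":   (0.05, 0.25, 0.7),
}


def viterbi(observations):
    """Return the most likely hidden activity-state sequence (log-space Viterbi)."""
    if not observations:
        return []
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            new_best[s] = score + math.log(EMIT[s][obs])
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the most likely final state.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    weeks = [0, 0, 2, 2, 2, 1, 0, 0]
    print(viterbi(weeks))
```

With a steady stream of "many commits" weeks, the decoder keeps the developer in the `high` state. Quiet weeks pull the path back toward `low`. A project-level variant would apply the same idea to aggregated project activity. Libraries such as hmmlearn provide trained, production-grade HMMs, and this hand-rolled decoder only illustrates the mechanism.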
Keywords: Agent-Based Simulation; Software Evolution; Mining Software Repositories; Hidden Markov Models; Developer Contribution