
Quality Assessment based on Attribute Series of Software Evolution

Jacek Ratzinger, Vienna University of Technology, Institute of Information Systems, A-1040 Vienna, Austria, ratzinger@infosys.tuwien.ac.at

Harald Gall, Martin Pinzger, University of Zurich, Department of Informatics, CH-8050 Zurich, Switzerland, {gall,pinzger}@ifi.unizh.ch

Abstract

Defect density and defect prediction are essential for efficient resource allocation in software evolution. In an empirical study we applied data mining techniques for value series based on evolution attributes such as number of authors, commit messages, lines of code, bugfix count, etc. Daily data points of these evolution attributes were captured over a period of two months to predict the defects in the subsequent two months in a project. For that, we developed models utilizing genetic programming and linear regression to accurately predict software defects. In our study, we investigated the data of three independent projects, two open source and one commercial software system. The results show that by utilizing series of these attributes we obtain models with high correlation coefficients (between 0.716 and 0.946). Further, we argue that prediction models based on series of a single variable are sometimes superior to the model including all attributes: in contrast to other studies that resulted in size or complexity measures as predictors, we have identified the number of authors and the number of commit messages to versioning systems as excellent predictors of defect densities.

1 Introduction

How does the course of software development over time influence defect densities? To address this question we focus on series classification techniques, where we generate value series from software evolution and take them as input for quality assessment.

Defect prediction models of previous studies often relied on metrics that represent the state of the software system at a specific moment in time (e.g. [7, 3, 17, 11]). Such metrics describe, for example, the sum of changes implemented on a certain part of the system or are other types of measures such as size and complexity metrics (e.g. [2]).

In this paper we show that change over time is an important aspect in software prediction models. Our previous study already incorporated time-related data into classification [20], where we measured values such as the average number of days between changes and the peak month, in which most changes took place within the learning period. As these time-related features have been very important for good prediction models, we go a step further and explicitly focus on series of metric values.

Sequential patterns are important in many domains, because they can be exploited to improve the prediction accuracy of classifiers. A sequence $x = \langle x_1, x_2, x_3, \ldots, x_n \rangle$ of change events during software development contains information on the course of development in addition to the pure attributes of the sum of all change events, which describe only the state at the final point in time. As one of the first studies we analyze value series of evolution data to create defect prediction models.

Our evolution attributes of source files refer to measures obtained from versioning and bug reporting data such as the number of bugfixes or the number of authors working on a particular file. Evolution attributes are measured daily over a period of two months to predict the number of defects in source files in the subsequent two months. The data of three independent software projects in our field study allows us to build prediction models with high accuracy using series of evolution attributes.

Our study can be compared to other prediction approaches since we included the same data of a previous study [20] taken from a commercial system into our analysis. Additionally, we broadened our evaluation by incorporating data from open source projects and created prediction models on the evolution data of these different software projects.

This paper is organized as follows. We present a description of our knowledge discovery process in Section 2. Section 3 lays the foundation through the preparation of evolution data. Section 4 explains how we set up evolution series, which are used as input to the series mining of Section 5. Our results are reported in Section 6. In Section 7 we discuss the state of the art and Section 8 finishes with conclusions.

2 Knowledge Discovery Process

Several consecutive steps are executed in our knowledge discovery process to obtain prediction models based on value series. The basic process is as follows: First, the data collection steps extract evolution data from two sources: versioning systems such as CVS and issue tracking systems such as Jira. Data items taken from different systems have to be assembled into a joint data model to establish an evolution database. Additionally, relationships are established between the data items from a single data source (e.g. the transactions of the versioning system are reconstructed to group items into sets of co-changed elements).

Next, the evolution database is used to compute change attributes such as the number of lines added for bugfixes, the number of co-changed files, or the number of modifications without a commit message. These are the characteristics of our data items that are used to create value series for defect prediction. Fenton and Neil pointed out that a sound prediction model has to incorporate different types of attributes [4]. Accordingly, we analyze several types of attributes, where a value series is created for each attribute type. Additionally, series containing attributes of all categories represent the changes over time for a single instance (i.e. a file).

In the next step we take the value series of evolution attributes as the basis for our defect prediction models. To be able to apply classification algorithms to the value series we extract features describing the relevant characteristics of the value series. In data mining the input attributes used by the algorithms are called features. An example of such a feature is the maximum number of files changed together with a particular file. The feature extraction is done automatically with the help of genetic programming, in which several operations are applied to the data points in the value series. The genetic algorithm searches the feature space guided by a fitness function (i.e. the correlation coefficient of our defect prediction models). The best features discovered through genetic programming are the input of the regression algorithms that create the prediction model. The platform for our series mining activities is the YALE machine learning environment [14]. It allows the design of operator chains for a large number of learning problems and includes many data mining algorithms such as support vector machines, decision tree learners, Bayesian learners, etc.

Finally, we describe the results of a field study, in which we applied the prediction models to several projects taken from three different domains to evaluate the accuracy of the prediction. The following sections describe each step in detail and present our results.

3 Preparing Evolution Data

There are many different systems that record different aspects of the development and evolution of a software system. Project managers have to be able to observe the status of individual tasks as well as the progress of the entire project. Developers need information about what is requested from them and need storage systems for their results. Thus, different aspects are covered by different systems, which we have to integrate for our analysis.

3.1 Data Extraction

As data sources for our defect prediction we utilize versioning and issue tracking systems. Currently our approach supports the versioning system CVS and the issue tracking systems Jira and Bugzilla. CVS keeps track of all changes in source files. For each file we retrieve these change logs, parse them, and store the extracted information in the evolution database [20]. Issue or bug report data is obtained from Jira or Bugzilla. These systems track bug and feature requests from users and customers. We process each request and add the information to the database.

3.2 Data Processing

Software components are related to each other through shared data, method calls, or inheritance relations. During software development a relationship is also established when developers work on several classes or modules to accomplish a certain task. Co-change coupling during the evolution of a software system provides valuable information in the context of maintenance [5]. We obtain couplings from the versioning systems by reconstructing transactions when files are submitted together to CVS. Thus, a transaction $T_n$ is defined as a set of files that were checked in to a versioning system by a single author with the same commit message. To capture the entire transaction, which possibly lasts several minutes, we use a dynamic time window approach. Every file submission outside of a previous transaction defines the start of a new transaction lasting initially for 60 seconds. When another file submission is discovered within the time frame of this transaction, the file is added and the transaction time is extended to last until 60 seconds after the last file submission time.
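The dynamic time window described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the record layout `(timestamp, author, message, file)` and the names `Transaction` and `reconstruct_transactions` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)  # initial length and extension of the window, as in the text


@dataclass
class Transaction:
    author: str
    message: str
    end: datetime                       # the transaction stays open until this time
    files: set = field(default_factory=set)


def reconstruct_transactions(commits):
    """Group (timestamp, author, message, file) records into transactions."""
    transactions = []
    open_tx = {}  # (author, message) -> most recent transaction for that key
    for ts, author, message, path in sorted(commits):
        key = (author, message)
        tx = open_tx.get(key)
        if tx is None or ts > tx.end:
            # Submission lies outside any previous transaction: start a new one.
            tx = Transaction(author, message, ts + WINDOW)
            transactions.append(tx)
            open_tx[key] = tx
        tx.files.add(path)
        tx.end = ts + WINDOW            # extend to 60 seconds after the last submission
    return transactions
```

Because the window is extended after every submission, a long check-in whose files arrive less than 60 seconds apart is kept together even if it lasts several minutes in total.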

Co-change coupling is established based on common transactions of files. Two entities (e.g. files) are coupled if a modification of the implementation affected both entities. The intensity of coupling between two entities $a, b$ can be determined by counting all transactions where $a$ and $b$ are members of the same transaction, i.e., $C = \{\langle a, b \rangle \mid a, b \in T_n\}$ is the set of change couplings and $|C|$ is the intensity of coupling [19].
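Counting the coupling intensity per file pair can be illustrated with a short Python sketch. It assumes that each transaction is already available as a set of file names; the function name `coupling_intensity` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def coupling_intensity(transactions):
    """Count, for every unordered file pair, the transactions that contain both files."""
    counts = Counter()
    for files in transactions:
        # Sorting gives each pair one canonical orientation (a, b) with a < b.
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts
```

For three transactions {A, B, C}, {A, B}, and {C}, the pair (A, B) has intensity 2, whereas (A, C) and (B, C) each have intensity 1.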

3.3 Combining Data Sources

To count defect densities of files we link defect information stored in the issue tracking system to the file information from the versioning system. For this step we inspect the commit messages associated with revisions of source files for references to issues, which is accomplished with regular expressions. When a matching issue is found, a link between the issue and the corresponding CVS log entry is stored only if the creation date of this issue is before the submission of the file to CVS.
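The linking step can be sketched as follows. The text does not give the regular expressions that were used, so the Jira-style key pattern below (e.g. "ABC-42") is an assumption, as are the names `ISSUE_KEY` and `link_issues`.

```python
import re
from datetime import datetime

# Assumed Jira-style issue key such as "ABC-42"; the actual patterns are not given in the text.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def link_issues(commit_message, commit_time, issues):
    """Return the referenced issue keys that were created before the commit.

    `issues` maps an issue key to its creation datetime.
    """
    links = []
    for key in ISSUE_KEY.findall(commit_message):
        created = issues.get(key)
        # Store the link only if the issue already existed when the file was submitted.
        if created is not None and created < commit_time:
            links.append(key)
    return links
```

The date check discards references to issues that were created after the commit, which cannot have caused the change.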

4 Generating Evolution Series

In this paper the focus is on the lifetime of source files during software evolution. For this we measure a set of evolution attributes for each source file over time and compose multiple value series describing the data points of the attributes as a sequence of measures. In our field study we use two months of development time to predict the defects of the following two months (see Section 6.2). The first two months comprise 61 days. On all days of this series period we measure the attributes for each file. For example, the number of lines added within one day is summed up to form one data point of this attribute in the value series. As a result many values in the series are zero, as in a development project not all source files are modified on each day. The number of defects is predicted for the entire period of the following two months for each source file. Thus, the instances for the prediction models are files. In the following we describe the different evolution attributes and the generation of series in detail.
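Building one daily value series for a single file can be sketched as follows. The series period (November and December 2005, 61 days) is taken from Section 6.2; the input layout of `(day, value)` pairs and the name `daily_series` are assumptions for illustration.

```python
from datetime import date

SERIES_START = date(2005, 11, 1)   # series period from Section 6.2
SERIES_END = date(2005, 12, 31)


def daily_series(changes, start=SERIES_START, end=SERIES_END):
    """Sum a per-change quantity (e.g. lines added) into one value per day.

    `changes` is an iterable of (day, value) pairs for a single file.
    """
    n_days = (end - start).days + 1
    series = [0] * n_days              # days without a change keep the value zero
    for day, value in changes:
        if start <= day <= end:
            series[(day - start).days] += value
    return series
```

A file changed on only two days of the period yields a series of 61 values of which 59 are zero, which matches the sparsity noted above.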

A definition of generalized series is used for value series: In a series each element $x_i$ is composed of two components. The first is the index describing a position on a straight line (e.g. time). The second is a vector of values. In our case we use two types of vectors. One is a reduced case where the vector consists of only one attribute. In the second case the dimension of the vector is given by the number of evolution attributes [14].

4.1 Evolution Attributes

Using the information stored in the evolution database we compute a number of attributes that quantify the software evolution in a source file. According to previous studies, which showed that relative data outperforms absolute values in defect prediction [16], we use the following evolution attributes as the foundation for our value series of relative measures. All measures are collected within a time frame of two months, where the data points are accumulated.

• Lines Added represents the sum of lines added. This measure is one of the indicators of size, where the developer probably adds functionality through new source lines.

• Lines Deleted describes the number of lines removed from a file. When a certain line is changed the versioning system counts one line added and one line removed. The number of deleted lines is additionally an indicator for a "clean up" mentality to keep only the used code.

• Number Changes is the number of modifications implemented within a single day on a given file. This is a general activity indicator.

• Number Authors is the number of authors working on a single file. When several authors work on the same day on one file, we expect interference between the changes.

• Author Switches describes the number of times the work on a file is handed over from one author to another. When, for example, two authors work in the sequence author1, author2, author2, author1 we denote two author switches. When the work of several authors is strongly interwoven, we expect the strongest impact on defects.

• Commit Messages indicates the number of different commit messages from developers on changes. We see the commit message as an indicator for the discipline of developers, as developers sometimes tend to reuse the message of the last commit instead of describing the actual work.

• With No Message describes the number of changes without a commit message. This provides insight into the discipline of the developers and could be an indicator that the developer is in a hurry.

• Number Bugfixes is the number of issues that caused changes in the file. A file with many defects in the past is expected to have defects in the future [7].

• Bugfix Lines Added is the counterpart to the number of lines added, but this time the number of lines is only taken into account if the change is a bugfix according to the information from the issue tracking system.

• Bugfix Lines Deleted measures the number of lines deleted from a file only for bugfixes.

• Couplings is the strength of co-change coupling of a file with other files. It counts how many times a change was done together with other files. Coupling has been an indicator for architectural weaknesses [19].

• CoChanged Files, in contrast to Couplings, describes the number of files that were changed together with the file of interest. Over several modifications each co-changed file is counted only once. We expect that the more files are changed together, the higher the complexity and the more difficult it is to keep them consistent.

• CoChanged New Files is the number of files that were created together with a change to the investigated file. When new files are introduced into a system, it is an indicator for growth and new functionality.

• Transaction Lines Added is the number of lines added in all files that have couplings with the file of interest. This measures the entire work of a commit of files that are related to the file of interest.

• Transaction Lines Deleted is the number of lines deleted in all files with common transactions on changes.

• Transaction Bugfix Lines Added measures the number of lines added for all files during a change event that treats a bug.

• Transaction Bugfix Lines Deleted describes the number of lines deleted for all files during bugfixes together with the file of interest.
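As a small worked example for one of the attributes above, Author Switches can be computed from the chronological list of change authors of a file. The function name `author_switches` is hypothetical.

```python
def author_switches(authors):
    """Count hand-overs between consecutive changes of one file."""
    return sum(1 for prev, cur in zip(authors, authors[1:]) if prev != cur)
```

For the sequence author1, author2, author2, author1 from the list above, the function counts two switches: one from author1 to author2 and one back to author1.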

4.2 Value Series

The absolute values of the evolution attributes, which are described in the previous section, are used to construct the final value series containing relative measures ordered by time. For each day the relative attribute value is computed and added to the value series. For example, we use the number of authors relative to the number of changes on each day in our series period. The sequence 1/1, 0, 2/3, 1/1 would result for four days when one change is committed on the first day, no change happens on the second day, two developers implemented a total of three changes on the third day, and one change is committed on the fourth day.
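The four-day example above can be reproduced with a short Python sketch. The input layout (one list of change authors per day) and the name `authors_series` are assumptions; exact fractions are used only to make the result easy to compare with the text.

```python
from fractions import Fraction


def authors_series(days):
    """Relative Authors measure per day: distinct authors / number of changes.

    `days` holds, for every day, the list of authors of the changes on that day.
    """
    series = []
    for changes in days:
        if not changes:
            series.append(Fraction(0))          # no change on this day: value 0
        else:
            series.append(Fraction(len(set(changes)), len(changes)))
    return series
```

Calling `authors_series([["a"], [], ["a", "b", "a"], ["b"]])` yields the values 1, 0, 2/3, and 1, i.e. the sequence given in the example.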

The following list of relative measures is used to create value series per file for each day. Each relative feature is computed as a quotient of absolute values from the previous section.

• LinesAdd: Lines of code added within a day / Total lines of code until this day

• LinesDel: Lines of code deleted within a day / Total lines of code until this day

• ChangeCount: Number of changes within a day / Total number of changes in the history of the file until this day

• Authors: Number of authors within a day / Number of changes within this day

• AuthorSwitches: Number of switches of the author / Number of authors

• CommitMessages: Number of different commit messages / Number of changes

• WithNoMessage: Number of changes without commit message / Number of commit messages

• BugfixCount: Number of bugfixes / Number of changes

• BugfixLinesAdd: Lines added for bugfixes / Number of lines added (any type)

• BugfixLinesDel: Lines deleted for bugfixes / Number of lines deleted (any type)

• CoChangeCount: Number of couplings / Number of changes

• CoChangedFiles: Number of co-changed files / Number of changes

• CoChangedNewFiles: Number of newly introduced files that are co-changed / Number of co-changed files

• TLinesAdd: Number of lines added in all co-changed files / Number of couplings

• TLinesDel: Number of lines deleted in all co-changed files / Number of couplings

• TBugfixLinesAdd: Number of lines added in all files for bugfixes / Number of lines added

• TBugfixLinesDel: Number of lines deleted in all files for bugfixes / Number of lines deleted

5 Predicting Defects based on Evolution Series

Given the value series of relative evolution attributes as described in the previous section, the aim of our approach is to derive models for predicting the number of defects in source files. For the model generation we use "classical" data mining algorithms such as linear regression. These algorithms are not able to handle value series in their explicit representation, but can operate on sets of attributes instead of sets of series of values.

We generate a new representation of our series information that is suitable for linear regression. This task is called feature extraction, where each series is described by a set of relevant characteristics that make different evolution series distinguishable. In a similar manner we could describe a value series containing positions of the sun on earth with the following features: one cycle lasts for 24 hours, the maximum is reached at noon, sunrise and sunset are related to the degree of latitude on earth, ...

The feature extraction itself is decomposed into a set of basic operators. For example, functions returning the minimum, average, or maximum of the values in a series are basic operators. Other basic operators return an index such as the location of a peak value within a given series. Such basic operators are assembled into an operator tree describing the extraction steps of the final features. However, the manual selection of an optimal set of operators is a tedious task. Therefore, machine learning is used to select appropriate operator trees, where the selection is done with the help of genetic algorithms.

Thus we have to carry out two learning tasks for our defect prediction.

1. Learning a set of operator trees for the feature extraction utilizing genetic programming. The resulting features describe relevant characteristics of evolution series for data mining algorithms such as linear regression.

2. Learning a model for defect prediction from the extracted features.

5.1 Extracting Features from Series

In the process of feature extraction a set of basic operators is organized into a tree, where each operator uses the output of its predecessor. The output of the operators at the leaves produces the features of the series. We distinguish two types of basic operators: transformations and functions.

Transformations convert a series into another series. Different types of transformations are available for our defect prediction approach such as filters (e.g. smoothing), frequency transformations (e.g. Fourier transformation), generalized windowing, etc. Windowing operators apply a given function to a range of values within the series and additionally slide the window over the series. Others are branches that pass on the interim results to two successor sub-trees.

Functions generate single values based on the entire value series and are always the last step of the feature extraction process (i.e. the leaf nodes of the operator tree). Examples of functions are statistics such as average, variance, and standard deviation. These functions may be applied to the values themselves or to the indexes of the values, where for example the index of a peak value could be extracted. For an extensive list of transformations and functions see [14].
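The distinction between transformations (series to series) and functions (series to single value) can be illustrated with a small hand-written operator chain. This sketch does not use the YALE operators; `moving_average`, `peak_index`, and `extract_features` are hypothetical names, and the window width of 3 is an arbitrary choice.

```python
from statistics import mean


def moving_average(series, width=3):
    """Transformation: a windowing operator that slides `mean` over the series."""
    return [mean(series[i:i + width]) for i in range(len(series) - width + 1)]


def peak_index(series):
    """Function on the indexes: position of the first maximum value."""
    return max(range(len(series)), key=series.__getitem__)


def extract_features(series):
    """A fixed, tiny operator tree: one branch works on the raw series, one on a smoothed copy."""
    smoothed = moving_average(series)           # transformation: series -> series
    return {                                    # functions: series -> single value
        "max": max(series),
        "mean": mean(series),
        "peak_index": peak_index(series),
        "smoothed_max": max(smoothed),
    }
```

In the approach described here such trees are not written by hand but are searched for automatically, which is the subject of the next section.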

5.1.1 Genetic Programming

The (locally) optimal feature extraction approach (i.e. operator tree of transformations and functions) is elicited with genetic programming applied to the operator trees.

Mutations randomly insert a new operator, delete an operator, replace an operator, or change the parameters of an operator.

Crossover replaces a sub-tree of one feature description tree with a sub-tree from another tree. According to the standard process of genetic programming the instances with the highest fitness are selected for the next generation.

Selection is done based on a tournament between all members of a generation in the genetic algorithm.

Fitness of the operator trees for the tournament selection is assessed based on the defect prediction capability of the resulting features. Our fitness function is the regression algorithm itself that is used for the generation of the prediction model. Thus, for each operator tree a regression function is generated based on a training set of a random sample containing 50 evolution series instances and the accuracy of the prediction of defects is used as the fitness value. As a result, the operator trees generating features that predict the defects best are selected for the next generation.

Initiation of the first generation in the genetic algorithm is based on 50 operator trees, where the operators are randomly selected from the pool of available transformations and functions.

We limited the maximal number of generations to 8. Further, the following parameters are defined for our approach: probability of adding a new operator = 0.4, probability of adding a branching operator to create new sub-trees = 0.05, probability of changing an operator = 0.4, probability of removing an operator = 0.2, probability of performing a crossover = 0.5, probability of changing a parameter = 0.1.
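To make the role of these probabilities concrete, the following sketch applies three of them in a mutation step. It is a strong simplification and not the YALE implementation: the operator tree is reduced to a linear chain, the operator names in `OPERATOR_POOL` are hypothetical, and it is an assumption that each mutation type is drawn independently.

```python
import random

# Probabilities as stated in the text; branching, crossover and parameter changes are omitted.
P_ADD, P_CHANGE, P_REMOVE = 0.4, 0.4, 0.2
OPERATOR_POOL = ["smooth", "window", "fourier", "max", "min", "average"]  # hypothetical names


def mutate(chain, rng):
    """Return a mutated copy of an operator chain (a linear stand-in for an operator tree)."""
    chain = list(chain)
    if rng.random() < P_ADD:
        # Insert a randomly chosen operator at a random position.
        chain.insert(rng.randrange(len(chain) + 1), rng.choice(OPERATOR_POOL))
    if chain and rng.random() < P_CHANGE:
        # Replace one operator with a randomly chosen one.
        chain[rng.randrange(len(chain))] = rng.choice(OPERATOR_POOL)
    if len(chain) > 1 and rng.random() < P_REMOVE:
        # Remove one operator, but never empty the chain.
        del chain[rng.randrange(len(chain))]
    return chain
```

A full implementation would additionally have to ensure that every leaf of the tree is a function rather than a transformation, which this sketch does not check.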

5.2 Applying Data-Mining Algorithms to Series Features

The best features selected by the genetic programming algorithm are used to create the defect prediction model. The data mining algorithm for our prediction is linear regression, as our outcome as well as our features from value series are numeric. This is a staple method in statistics where the predicted value is represented by a linear combination of the input attributes (i.e. features) with weights $w_0, w_1, w_2, \ldots, w_n$ and attributes $a_1, a_2, \ldots, a_n$:

$$x = w_0 + w_1 a_1 + w_2 a_2 + \ldots + w_n a_n$$

The weights are derived from the training data set by minimizing the sum of squares of the distance between the predicted value $x$ and the actual one $y$. The distance is summed over all instances ($k$) of the training data set:

$$\sum_{k} \Bigl( y^{(k)} - \sum_{i=0}^{n} w_i a_i^{(k)} \Bigr)^2$$

where $a_0^{(k)} = 1$, so that $w_0$ acts as the constant term.
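The least-squares fit described above can be sketched in plain Python by solving the normal equations. This is an illustration only: the study used the regression learner of YALE, the names `fit_linear_regression` and `predict` are hypothetical, and the sketch assumes that the features are not collinear (otherwise the elimination divides by zero).

```python
def fit_linear_regression(rows, targets):
    """Least-squares weights [w0, w1, ..., wn] obtained from the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]   # a_0 = 1 carries the constant w0
    m = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(m)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def predict(weights, row):
    """Predicted value x = w0 + w1*a1 + ... + wn*an."""
    return weights[0] + sum(w * a for w, a in zip(weights[1:], row))
```

For training data generated exactly by x = 1 + 2 a1 + 3 a2, the fit recovers the weights 1, 2, and 3 up to rounding.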

The numeric prediction algorithm is used twice in our process. First it is used for the evaluation of fitness in genetic programming, where prediction models are built on a small random sample of evolution series and the correlation coefficient is utilized to select the best features. Finally, we apply the prediction algorithm to the extracted features taking all instances of the training set (i.e. all evolution series) into account to create the final prediction model.

6 Evaluation

We evaluated the approach of defect prediction based on series mining with the help of a field study, where we selected different real world projects and analyzed the predictability of defects in the near future.

6.1 Field Study

In our field study we analyzed two open source projects (ArgoUML and the Spring framework) and a commercial software system, which we selected to get comparability with the results of previous studies ([20, 21]). The commercial software system is from the health care domain, is written in Java, and contains more than 8,600 classes with 735,000 lines of code. ArgoUML and the Spring framework are large well-known open source projects, both developed in Java, and consist of about 5,000 and 10,000 classes, respectively. In Java, classes are almost equivalent to files, thus we use files as basic instances in our analysis.

6.2 Evaluation Setup

To estimate the accuracy of our defect prediction approach we use the same time periods for all projects, regardless of the development state of each project. In a previous study we have shown that defects that occur within a short time before a release can be better predicted than defects after a release [20]. In our current research activity we have two periods:

• Series Period: November-December 2005. In this period we take evolution attributes from the versioning system and construct value series to represent the flow of the development over time. Each series of the attributes from Section 4.2 has a length of 61 days given the two months of the series period. This information is used in our series mining to predict the defects of a source file in the next period.

• Target Period: January-February 2006. With our prediction models based on series mining we try to predict the number of all defects in the target period, where the defects are counted based on the information from the issue tracking system and are mapped to files as described in Section 3.3.

These two periods are also used in our previous study [20] and thus enable us to compare the results of these two approaches.

6.3 Measuring Prediction Performance

For the assessment of our prediction models we use the following metrics:

• Correlation Coefficient (Cor.C.) ranges from -1 to 1 and measures the statistical correlation between the predicted values and the actual ones in the test set. A value of 0 indicates no correlation, whereas 1 describes a perfect correlation. Negative values indicate inverse correlation, but should not occur for prediction models. The correlation coefficient is computed with the following formula, where $p$ are the predicted values and $a$ are the actual ones:

$$\frac{\sum_i (p_i - \bar{p})(a_i - \bar{a}) / (n-1)}{\sqrt{\Bigl(\sum_i (p_i - \bar{p})^2 / (n-1)\Bigr) \cdot \Bigl(\sum_i (a_i - \bar{a})^2 / (n-1)\Bigr)}}$$

where $\bar{p} = \frac{1}{n} \sum_i p_i$ and $\bar{a} = \frac{1}{n} \sum_i a_i$.

The correlation coefficient is our primary performance indicator.

• Mean Absolute Error (Abs.Error) is the average of the magnitude of individual absolute errors. This assessment metric does not have a fixed range like the correlation coefficient, but is geared to the values to be predicted. As a result, the closer the mean absolute error is to 0 the better. A value of 1 denotes that on average the predicted value differs from the actual number of defects by 1 (e.g. 3 instead of 4). The mean absolute error is computed with the following formula:

$$\frac{|p_1 - a_1| + \ldots + |p_n - a_n|}{n}$$

• Mean Squared Error (Sqr.Error) is the average of the squared magnitude of individual errors. It tends to exaggerate the effect of outliers (instances with larger prediction error) more than the mean absolute error. The range of the mean squared error is geared to the ranges of predicted values, similar to the mean absolute error. But this time the error metric is squared, which overemphasizes predictions that are far away from the actual number of defects. The quality of the prediction model is good when the mean squared error is close to the mean absolute error. The formula for the mean squared error is:

$$\frac{(p_1 - a_1)^2 + \ldots + (p_n - a_n)^2}{n}$$
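The three performance metrics can be written down directly from their definitions. The following Python sketch is only an illustration (the function names are hypothetical) and assumes two equally long lists of predicted values `p` and actual values `a` that are not constant.

```python
from math import sqrt


def correlation(p, a):
    """Correlation coefficient between predicted values p and actual values a."""
    n = len(p)
    mean_p, mean_a = sum(p) / n, sum(a) / n
    cov = sum((x - mean_p) * (y - mean_a) for x, y in zip(p, a)) / (n - 1)
    var_p = sum((x - mean_p) ** 2 for x in p) / (n - 1)
    var_a = sum((y - mean_a) ** 2 for y in a) / (n - 1)
    return cov / sqrt(var_p * var_a)


def mean_absolute_error(p, a):
    """Average magnitude of the individual errors."""
    return sum(abs(x - y) for x, y in zip(p, a)) / len(p)


def mean_squared_error(p, a):
    """Average of the squared individual errors."""
    return sum((x - y) ** 2 for x, y in zip(p, a)) / len(p)
```

For predictions 0, 1, 2, 3 and actual values 0, 2, 4, 6 the correlation is perfect although the errors are not zero: the mean absolute error is 1.5 and the mean squared error is 3.5. This shows why the error measures are reported alongside the correlation coefficient.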

As validation method we use 10-fold cross validation to estimate the performance of our prediction models. In this method the set of source files is randomly split into 10 disjoint sets of equal size. The validation is executed 10 times, where the linear regression is trained on 9 of the 10 folds and the remaining one is used to calculate the error rates and the correlation coefficient. After the 10 turns the final performance estimates are generated by averaging the results of the 10 turns.
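The splitting scheme of 10-fold cross validation can be sketched as follows. The name `k_fold_indices` and the fixed random seed are assumptions made for reproducibility of the illustration; when the number of files is not divisible by 10, the fold sizes differ by at most one.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists for k disjoint folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)        # random assignment of files to folds
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Train on the other k-1 folds, evaluate on the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

The performance estimate is then the average of the metric values obtained on the 10 held-out folds.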

The validation is used twice: first for the assessment of the fitness of the features during genetic programming, and finally for the assessment of the prediction models resulting from linear regression with the best features (see Section 5).

6.4 Results

In the following we describe the field study with the selected software projects and discuss performance measures of our prediction models. Furthermore, we investigate the significance of evolution attributes.

6.4.1 How well can we predict the number of defects in source files with series mining?

To answer this question we take the entire evolution series containing values of all attributes such as LinesAdd or Authors (see Section 4.2). Table 1 describes the performance measures of our defect prediction models. The first remarkable number is the very high correlation coefficient of the commercial system from the health care domain. A correlation coefficient of 1 would indicate perfect correlation of the prediction with the actual value, and the obtained value of 0.946 indicates that very strong prediction models can be built based on evolution series. The other two projects reach a correlation coefficient greater than 0.7, which is still good.

In line with the first performance indicator, the mean absolute error of all projects is also low. The absolute error has to be measured in relation to the predicted quantities. In our case we predict the number of defects, which lies in the range of 0 up to 7. As a result, the measured mean absolute errors of 0.208 to 0.306 are low. The commercial project has a higher absolute error than the two open source projects because it has more files with multiple defects (e.g. 5 or 6 defects), which can be seen in Table 2.

                     Cor.C.   Abs.Error   Sqr.Error
Commercial system    0.946    0.306       0.508
Spring framework     0.716    0.229       0.770
ArgoUML              0.730    0.208       0.624

Table 1. Defect prediction with series including all evolution attributes

Number of defects    Comm.     Spring   ArgoUML
per file             System
1                    46        80       47
2                    11        15       9
3                    5         3        2
4                    7         2        0
5                    2         0        0
6                    1         0        0

Table 2. Defect distribution

The good prediction measures are supported by the mean squared error, which emphasizes outliers more than the mean absolute error. The squared error is lowest for the commercial project with a value of 0.508. This corresponds with the high correlation coefficient and indicates that the prediction is very accurate. However, the squared errors of Spring with 0.770 and of ArgoUML with 0.624 are also low. Thus, we conclude:

Accurate prediction models can be developed based on series mining of evolution data.

6.4.2 Which attributes are most significant for defect prediction?

In the previous section we presented prediction models based on series mining with a very high correlation coefficient and good error measures. These models are created from an evolution series containing all attributes described in Section 4. We are interested in finding out which attributes are most significant for creating accurate prediction models. For this we create prediction models on value series for each single evolution attribute. Table 3 presents the correlation coefficients of all generated models, as this performance indicator represents the relationship between the predicted values and the actual ones.

All three projects of the field study exhibit high values for the correlation coefficient on the series containing the number of authors. In the commercial system as well as in ArgoUML this single series is even the one with the highest correlation coefficient. For the Spring framework it is only exceeded by the series with ChangeCount, which describes the number of changes per day in relation to the total number of changes for this particular file. In the two other projects ChangeCount is ranked only in the middle field.

                            Comm.     Spring    Argo
                            Cor.C.    Cor.C.    Cor.C.
LinesAdd                    0.616     0.195     0.161
LinesDel                    0.305     0.111     0.234
ChangeCount                 0.517     0.653     0.268
Authors                     0.946     0.628     0.760
AuthorSwitches              0.622     0.210     0.357
CommitMsgs                  0.943     0.480     0.459
WithNoMsg                   0.273     0.008     -0.054
BugfixCount                 0.455     0.290     0.253
BugfixLinesAdd              0.437     0.294     0.295
BugfixLinesDel              0.736     0.319     0.244
CoChangeCount               0.548     0.336     0.388
CoChangedFiles              0.481     0.240     0.409
CoChangedNew                0.426     0.171     0.233
TLinesAdd                   0.598     0.622     0.442
TLinesDel                   0.586     0.579     0.225
TBugfixLinesAdd             0.482     0.318     0.362
TBugfixLinesDel             0.460     0.319     0.296
series of all attributes    0.946     0.716     0.730

Table 3. Correlation coefficients of series with a single attribute and the summarizing series including all attributes

Authors seems to provide good input to series mining, which contrasts with the results of Graves et al. [7]. In our knowledge discovery process we use value series for defect prediction. Therefore, we measure how many authors have implemented modifications to a given file and set this measure in relation to the number of modifications implemented by these authors. We use relative measures, which have been shown to be better predictors than absolute measures [16]. Moreover, we observe how the number of authors implementing modifications changes over time, which can provide more accurate data to the prediction models than metrics focusing on a fixed point in time.
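A relative author measure of this kind can be sketched as a daily value series. The normalization used here (distinct authors on a day divided by the number of modifications on that day) is only one possible reading of the description above and is an assumption of this sketch; the function name, the input format and the sample commits are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def relative_author_series(commits, start, days):
    """Daily series of distinct authors divided by modifications on that day.

    `commits` is an iterable of (day, author) pairs for a single file.
    Days without modifications get 0.0 (an assumption of this sketch).
    """
    authors = defaultdict(set)
    changes = defaultdict(int)
    for day, author in commits:
        authors[day].add(author)
        changes[day] += 1
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        n = changes.get(day, 0)
        series.append(len(authors.get(day, ())) / n if n else 0.0)
    return series


# Hypothetical commit log of one file.
commits = [
    (date(2007, 1, 1), "alice"),
    (date(2007, 1, 1), "alice"),
    (date(2007, 1, 1), "bob"),
    (date(2007, 1, 3), "carol"),
]
series = relative_author_series(commits, date(2007, 1, 1), 4)
print(series)
```

The result is one value per day, so the change of the measure over time is preserved instead of being collapsed into a single number per file.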

Another interesting sub-series is the one containing the number of commit messages in relation to the number of changes. This CommitMsgs series even has the second highest correlation coefficient in the commercial project and ArgoUML. In the Spring framework it is in position five with a correlation coefficient of 0.48.

It is quite surprising that the highest performance measures are not reached by size or complexity metrics, but by process and workflow related aspects such as Authors and CommitMsgs. However, the series of TLinesAdd appears in third position for ArgoUML and Spring (see Table 3). This attribute incorporates the number of lines changed within a commit transaction, counting the added lines of all files that are involved in the transaction. This series reflects an aspect of the interdependency in object-oriented software systems, as we take changes to other (related) files within a transaction into account. In contrast, the pure size measure of added lines of a particular file is represented by LinesAdd. Although this sub-series plays a remarkable role for the commercial system, it has a very low correlation coefficient in the open source projects. For the sub-series we conclude:

Projects have different rankings of sub-series, where common aspects can be identified, such as the number of authors or commit messages.
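The rankings discussed in this section can be reproduced directly from the correlation coefficients in Table 3. The following sketch only sorts the published values; the names `TABLE3`, `PROJECTS` and `ranking` are hypothetical helpers introduced for illustration.

```python
# Correlation coefficients from Table 3: (Commercial, Spring, ArgoUML).
TABLE3 = {
    "LinesAdd": (0.616, 0.195, 0.161),
    "LinesDel": (0.305, 0.111, 0.234),
    "ChangeCount": (0.517, 0.653, 0.268),
    "Authors": (0.946, 0.628, 0.760),
    "AuthorSwitches": (0.622, 0.210, 0.357),
    "CommitMsgs": (0.943, 0.480, 0.459),
    "WithNoMsg": (0.273, 0.008, -0.054),
    "BugfixCount": (0.455, 0.290, 0.253),
    "BugfixLinesAdd": (0.437, 0.294, 0.295),
    "BugfixLinesDel": (0.736, 0.319, 0.244),
    "CoChangeCount": (0.548, 0.336, 0.388),
    "CoChangedFiles": (0.481, 0.240, 0.409),
    "CoChangedNew": (0.426, 0.171, 0.233),
    "TLinesAdd": (0.598, 0.622, 0.442),
    "TLinesDel": (0.586, 0.579, 0.225),
    "TBugfixLinesAdd": (0.482, 0.318, 0.362),
    "TBugfixLinesDel": (0.460, 0.319, 0.296),
}
PROJECTS = ("Commercial", "Spring", "ArgoUML")


def ranking(project):
    """Return the attribute names sorted by descending correlation."""
    idx = PROJECTS.index(project)
    return sorted(TABLE3, key=lambda attr: TABLE3[attr][idx], reverse=True)


for project in PROJECTS:
    print(project, ranking(project)[:3])
```

Sorting per project makes both the common aspects (Authors near the top everywhere) and the project-specific differences (for example the position of LinesAdd) easy to see.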

6.5 Limitations of the Study

Our models are based on evolution data taken from versioning systems, and the number of defects is established with data from the issue tracking system. The matching between these two systems is based on heuristics as described in Section 3.3. Although such an approach is frequently used in research (e.g. [17, 18, 7, 20]), we cannot ensure that we have identified all bugs, as we certainly miss the ones that were not reported to the issue tracking system.

In general, our mining approach is strongly dependent on the quality of the data for the field study. The validity of our findings depends on the data of the versioning and issue tracking systems. Versioning systems register single events such as commits of developers, so the data depends on the work habits of the developers. However, in our previous work we showed that an averaging effect supports statistical analysis in general [19]. Additionally, the data about work habits of people is in itself interesting information that we use for our quality prediction, and we can show that our prediction models rely heavily on such features (e.g., the number of commit messages).

The data points of our value series are computed as sums for each day. As a result, if a developer works through the night and commits some modifications before midnight and the remaining modifications after midnight, we count the work on two days. Although this influences our value series, such information could still be valuable for defect prediction, because working overnight might have consequences for the level of concentration and the resulting software quality.
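The effect of daily summation on an overnight working session can be made concrete with a small sketch. The function `daily_sums`, the use of lines added as the summed measure, and the two commit timestamps are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime


def daily_sums(events):
    """Sum a per-commit measure (e.g. lines added) for each calendar day."""
    totals = Counter()
    for timestamp, value in events:
        totals[timestamp.date()] += value
    return dict(sorted(totals.items()))


# Hypothetical commits of a single overnight working session.
events = [
    (datetime(2007, 3, 14, 23, 40), 120),  # committed before midnight
    (datetime(2007, 3, 15, 0, 25), 35),    # committed after midnight
]
print(daily_sums(events))
```

The single session ends up as two data points on consecutive days, which is exactly the split described above.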

We have selected different projects for our field study: commercial vs. open source, and different domains such as health care, UML and application server. However, we cannot claim that these projects are representative of all types of software projects. As a result, the application of our approach to other software systems has to be re-evaluated on a per-project basis.

7 Related Work

The focus on software evolution as a key aspect of software development has provided interesting results in previous research activities. Zimmermann et al. developed ROSE, a mining tool that suggests necessary changes to other files when a developer starts working on a certain file or group of files [22]. We have already used historical data to identify hot spots within software architecture, which should be subject to re-engineering activities [5]. Both research activities rely on the idea of co-change coupling, where common changes to files are analyzed. Based on evolution data, Mockus and Votta accomplished an in-depth analysis of the reasons for software changes [15]. Software evolution analysis is a computationally intensive task; Bevan et al. have implemented a system called Kenyon for the efficient extraction of facts from data sources such as software repositories and bug tracking systems, and for the storage of the evolution information for further analysis [1].

Historical data is necessary to assess quality prediction models, as it can be used to count defects in software systems. Graves et al. studied the aging of software and the factors that lead to faults in the future. They created a defect prediction model incorporating the sum of contributions from all changes to a given module, where large, recent changes had the highest impact [7]. Another study was done by Ostrand et al. [18], in which several aspects of the history of software systems were utilized to build a negative binomial regression model for defect prediction. This study focused on long time periods and investigated 14 releases of a software system.

Other research activities focused on conventional object-oriented metrics such as the ones of Chidamber and Kemerer [2]. Khoshgoftaar et al. use software metrics as input to classification trees to predict fault-prone modules. With the help of statistical tests, subsets of modules were detected with uncertain classifications, allowing enhancement strategies to resolve uncertainties [9]. A recent approach for defect prediction based on software metrics was described by Nagappan et al. [17]. They discovered in a study of five Microsoft systems that failure-prone software entities are statistically correlated with code complexity measures. However, they could not identify a single set of complexity metrics suitable for prediction in all five projects.

Menzies et al. argue that the research on defect prediction should focus on the methods instead of the search for an optimal subset of the available data. With the help of static code metrics they could identify only one out of six methods (Naive Bayes with log-filtered values) that had a median performance that was both large and positive [13]. Kim et al. do not focus on metrics but try to identify entities that are in the locality of other bugs (or bugfixes). They exploit temporal and spatial locality and keep the information in a bug cache [10]. In our current research we focus on value series of evolution attributes.

The time series classification problem can be defined as follows: Given is a universe of objects. Each object is described by a certain number of temporal attributes and classified into one particular class. The goal is to find a function f(o) which is as close as possible to the true classification c(o) [6]. Kadous solves the problem of multivariate time series classification with the help of parameterized event primitives (PEPs). The extracted events are clustered to create prototypical events. They are used as the basis for creating more accurate and comprehensible classifiers than hidden Markov models or recurrent neural networks [8]. Manganaris developed a system for supervised classification of univariate signals using piecewise polynomial modeling combined with a scale-space analysis technique (i.e. a technique that allows the system to cope with the problem that patterns occur at different temporal scales) [12].

8 Conclusions

We presented a new approach to software defect prediction based on value series of evolution attributes. We conducted one of the first studies utilizing value series for defect prediction in software engineering. In this approach an entire series of measurements is used to predict a single label (the number of defects in a file containing object-oriented entities). For the evaluation of our approach we conducted a field study of three different software projects. Each of them has its own independent timeline regarding evolution phases and release cycles. We use a fixed date for the data extraction from these projects, which results in a randomized selection within the timeline of each individual project.
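The idea of predicting a single label from an entire series can be sketched with ordinary least squares, where every daily value of the series becomes one input feature. This is a simplified stand-in and not the tooling or the exact models used in the study; the ridge term, the function names `fit_linear` and `predict`, and the three-day synthetic series are assumptions made for illustration.

```python
def fit_linear(series_list, labels, ridge=1e-6):
    """Least-squares fit of label ~ w[0] + sum_i w[i + 1] * series[i].

    Each series is a fixed-length list of daily values. A small ridge
    term keeps the normal equations solvable.
    """
    rows = [[1.0] + list(s) for s in series_list]
    k = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (b[i] - tail) / a[i][i]
    return w


def predict(w, series):
    """Predict the label (e.g. number of defects) for one series."""
    return w[0] + sum(wi * x for wi, x in zip(w[1:], series))


# Synthetic three-day series per file with synthetic labels (not study data).
train = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
labels = [0.5 * s[0] + 1.0 * s[1] + 2.0 * s[2] for s in train]
weights = fit_linear(train, labels)
print(predict(weights, [1, 1, 1]))
```

With two months of daily data points the feature vector would have roughly sixty entries per attribute instead of three, but the mapping from series to label is the same.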

Our evolution measurements were obtained from software repositories such as the concurrent versioning system (CVS), where single information items such as the number of authors were gathered into value series. Our results showed that evolution series are excellent predictors of defect densities. We describe an analysis focusing on sub-series, where the prediction models based on series of a single variable are sometimes even superior to the overall model. An interesting representative of this category is the number of authors, on which good models can be created (up to a correlation coefficient of 0.946). Other aspects of software evolution, which are often used in software prediction, are less important (e.g. lines added).

Future work will concentrate on the input data we use for series mining. In another study we already presented manifold features for software evolution [20]. We want to enrich our series mining approach to be able to analyze software projects in more detail and to get a better understanding of the forces that influence software quality. To accomplish this goal we also look for improvements of series mining and of the understandability of the resulting prediction models. For example, classification and regression trees have the benefit that they provide a clear picture of the prediction model and the relationships of the used features. Kadous [8] presents interesting ideas in that direction.

9 Acknowledgments

Thanks to Patrick Knab and Emanuel Giger for their valuable feedback on earlier versions of this paper.

This work was partially supported by the Swiss National Science Foundation as part of the project "COSE - Controlling Software Evolution," and the Hasler Foundation as part of the project "EvoSpaces - Multi-dimensional Navigation Spaces for Software Evolution".

References

[1] J. Bevan, E. J. W. Jr., S. Kim, and M. W. Godfrey. Facilitating software evolution research with Kenyon. In Proceedings of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering, pages 177–186, Lisbon, Portugal, September 2005.

[2] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6):476–493, June 1994.

[3] G. Denaro and M. Pezzè. An empirical evaluation of fault-proneness models. In Proceedings of the International Conference on Software Engineering, pages 241–251. ACM Press, May 2002.

[4] N. E. Fenton and M. Neil. A critique of software defect prediction models. IEEE Transactions on Software Engineering, 25(5):675–6, September 1999.

[5] H. Gall, M. Jazayeri, and J. Ratzinger (former Krajewski). CVS release history data for detecting logical couplings. In Proceedings of the International Workshop on Principles of Software Evolution, pages 13–23, Lisbon, Portugal, September 2003. IEEE Computer Society Press.

[6] P. Geurts. Pattern extraction for time series classification. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, pages 115–127, 2001.

[7] T. L. Graves, A. F. Karr, J. S. Marron, and H. Siy. Predicting fault incidence using software change history. IEEE Transactions on Software Engineering, 26(7):653–661, 2000.

[8] M. W. Kadous. Learning comprehensible descriptions of multivariate time series. In Proceedings of the International Conference on Machine Learning, pages 454–463, San Francisco, USA, June 1999.

[9] T. M. Khoshgoftaar, X. Yuan, E. B. Allen, W. D. Jones, and J. P. Hudepohl. Uncertain classification of fault-prone software modules. Empirical Software Engineering, 7(4):297–318, December 2002.

[10] S. Kim, T. Zimmermann, J. E. James Whitehead, and A. Zeller. Predicting faults from cached history. In Proceedings of the International Conference on Software Engineering, pages 20–26, Minneapolis, USA, May 2007.

[11] P. Knab, M. Pinzger, and A. Bernstein. Predicting defect densities in source code files with decision tree learners. In Proceedings of the International Workshop on Mining Software Repositories, pages 119–125, Shanghai, China, May 2006. ACM Press.

[12] S. Manganaris. Supervised Classification with Temporal Data. PhD thesis, Computer Science Department, School of Engineering, Vanderbilt University, December 1997.

[13] T. Menzies, J. Greenwald, and A. Frank. Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering, 33(1):2–13, 2007.

[14] I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler. YALE: Rapid prototyping for complex data mining tasks. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, pages 935–940, Philadelphia, USA, 2006.

[15] A. Mockus and L. G. Votta. Identifying reasons for software changes using historic databases. In Proceedings of the International Conference on Software Maintenance, pages 120–130. IEEE Computer Society, 2000.

[16] N. Nagappan and T. Ball. Use of relative code churn measures to predict system defect density. In Proceedings of the International Conference on Software Engineering, pages 284–292, St. Louis, MO, USA, May 2005. ACM Press.

[17] N. Nagappan, T. Ball, and A. Zeller. Mining metrics to predict component failures. In Proceedings of the International Conference on Software Engineering, pages 452–461, Shanghai, China, May 2006. ACM Press.

[18] T. J. Ostrand, E. J. Weyuker, and R. M. Bell. Where the bugs are. In Proceedings on the International Symposium on Software Testing and Analysis, pages 86–96, Boston, Massachusetts, USA, July 2004.

[19] J. Ratzinger, M. Fischer, and H. Gall. Evolens: Lens-view visualizations of evolution data. In Proceedings of the International Workshop on Principles of Software Evolution, pages 103–112, Lisbon, Portugal, September 2005.

[20] J. Ratzinger, M. Pinzger, and H. Gall. EQ-Mine: Predicting short-term defects for software evolution. In Proceedings of the Fundamental Approaches to Software Engineering at the European Joint Conferences on Theory And Practice of Software, pages 12–26, Braga, Portugal, March 2007.

[21] J. Ratzinger, T. Sigmund, P. Vorburger, and H. Gall. Mining software evolution to predict refactoring. In Proceedings of the International Symposium on Empirical Software Engineering and Measurement, page to appear, Madrid, Spain, September 2007.

[22] T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In Proceedings of the International Conference on Software Engineering, volume 00, pages 563–572, Edinburgh, Scotland, UK, May 2004.
