Software Defect Prediction using Deep Learning by Correlation Clustering of Testing Metrics
Keywords:Software Engineering, Software Testing, Abstract Syntax Tree, Machine Learning, Convolution Neural Network
The software industry has made significant efforts in recent years to enhance software quality in businesses. The use of proactively defect prediction in the software will assist programmers and white box testing in detecting issues early, saving time and money. Conventional software defect prediction methods focus on traditional source code metrics such as code complexities, lines of code, and so on. These capabilities, unfortunately, are unable to retrieve the semantics of source code. In this paper, we have presented a novel Correlation Clustering fine-tuned CNN (CCFT-CNN) model based on testing Metrics. CCFT-CNN can predict the regions of source code that contain faults, errors, and bugs. Abstract Syntax Tree (AST) tokens are extracted as testing Metrics vectors from the source code. The correlation among AST testing Metrics is performed and clustered as a more relevant feature vector and fed into Convolutional Neural Network (CNN). Then, to enhance the accuracy of defect prediction, fine-tuning of the CNN model is performed by applying hyperparameters. The result analysis is performed on the PROMISE dataset that contains samples of open-source Java applications such as Camel Dataset, Jedit dataset, Poi dataset, Synapse dataset, Xerces dataset, and Xalan dataset. The result findings show that the CCFT- CNN model increases the average F-measure by 2% when compared to the baseline model.