Semi-automated Software Requirements Categorisation using Machine Learning Algorithms

Pratvina Talele; Siddharth Apte; Rashmi Phalnikar; Harsha Talele

doi:10.32985/ijeces.14.10.3

Authors

Pratvina Talele, Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
Siddharth Apte, Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
Rashmi Phalnikar, Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
Harsha Talele, Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India

DOI:

https://doi.org/10.32985/ijeces.14.10.3

Keywords:

Natural Language Processing, Machine Learning, Software Engineering, Supervised Machine Learning

Abstract

Requirements engineering is a mandatory phase of the software development life cycle (SDLC) in which system requirements are defined and documented in the Software Requirements Specification (SRS). As system complexity increases, it becomes difficult to categorise requirements as functional or non-functional. At present, a lack of automated techniques forces reliance on labour-intensive, time-consuming manual methods. This research addresses that gap by comparing two prominent feature extraction techniques and their efficacy in automating requirements classification. Natural language processing methods are applied in the text pre-processing phase, followed by Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec for feature extraction. The extracted features are used as input to machine learning algorithms. The study compares existing machine learning algorithms and evaluates how accurately they categorise software requirements. We assess Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Neural Network (NN), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) in terms of precision and accuracy. The results show that TF-IDF features outperform Word2Vec features for categorising requirements: with TF-IDF, the SVM and RF algorithms reach an accuracy of 91.20%, compared with 87.36% for SVM with Word2Vec, a difference of 3.84 percentage points on the publicly available PURE dataset. We believe these results will help developers build tools that support requirements engineering.
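To make the TF-IDF step in the abstract concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. The example requirements, the stop-word list, the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1`, and all function names are assumptions chosen for illustration. To stay self-contained, it swaps in a 1-nearest-neighbour classifier over cosine similarity rather than the SVM or Random Forest models the study reports on.

```python
import math
from collections import Counter

# Hypothetical toy requirements (NOT taken from the PURE dataset),
# labelled FR (functional) or NFR (non-functional).
TRAIN = [
    ("The system shall allow users to reset their password", "FR"),
    ("The system shall export reports as PDF files", "FR"),
    ("The user shall be able to search products by name", "FR"),
    ("The system shall respond to queries within two seconds", "NFR"),
    ("The application shall be available 99 percent of the time", "NFR"),
    ("All stored data shall be encrypted for security", "NFR"),
]

# Assumed, deliberately tiny stop-word list for the pre-processing step.
STOP_WORDS = {"the", "shall", "be", "to", "of", "as", "by", "for", "all", "their", "a", "an"}


def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Build an L2-normalised sparse TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify(text, train_vecs, idf):
    """Return the label of the most similar training requirement (1-NN)."""
    query = tfidf_vector(tokenize(text), idf)
    best = max(train_vecs, key=lambda item: cosine(query, item[0]))
    return best[1]


train_tokens = [tokenize(text) for text, _ in TRAIN]
IDF = fit_idf(train_tokens)
TRAIN_VECS = [
    (tfidf_vector(tokens, IDF), label)
    for tokens, (_, label) in zip(train_tokens, TRAIN)
]

print(classify("Data shall be encrypted", TRAIN_VECS, IDF))
```

Rare terms such as "password" receive a higher IDF weight than common ones such as "system", which is why TF-IDF vectors can separate requirement categories even with a simple classifier. A real experiment would use a labelled corpus, a held-out test split, and the classifiers named in the abstract.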
