Semi-automated Software Requirements Categorisation using Machine Learning Algorithms

Authors

  • Pratvina Talele, Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
  • Siddharth Apte, Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
  • Rashmi Phalnikar, Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
  • Harsha Talele, Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India

DOI:

https://doi.org/10.32985/ijeces.14.10.3

Keywords:

Natural Language Processing, Machine Learning, Software Engineering, Supervised Machine Learning

Abstract

Requirements engineering is a mandatory phase of the software development life cycle (SDLC) in which system requirements are defined and documented in the Software Requirements Specification (SRS). As the complexity of a system increases, it becomes difficult to categorise its requirements into functional and non-functional requirements. At present, the dearth of automated techniques forces reliance on labour-intensive and time-consuming manual methods for this task. This research addresses this gap by investigating and contrasting two prominent feature extraction techniques and their efficacy in automating the classification of requirements. Natural language processing methods are used in the text pre-processing phase, followed by Term Frequency – Inverse Document Frequency (TF-IDF) and Word2Vec for feature extraction. These features are used as input to the machine learning algorithms. This study compares existing machine learning algorithms and discusses how accurately they categorise software requirements. We assessed Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Neural Network (NN), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) on precision and accuracy. The results show that TF-IDF feature extraction categorised requirements better than Word2Vec: with TF-IDF, SVM and RF both reached an accuracy of 91.20%, compared with 87.36% for SVM with Word2Vec, a difference of 3.84 percentage points on the publicly available PURE dataset. We believe these results will help developers build tools that support requirements engineering.
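The pipeline summarised above (pre-processing, TF-IDF features, a supervised classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the six requirements and their F/NF labels are hypothetical examples rather than PURE data, the stop-word list is ad hoc, and a nearest-centroid classifier using cosine similarity stands in for the SVM and Random Forest models the study evaluated. The IDF uses scikit-learn's default smoothed form, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and each vector is L2-normalised.

```python
import math
import re
from collections import Counter

# Hypothetical toy requirements (F = functional, NF = non-functional);
# illustrative only, NOT taken from the PURE dataset.
SAMPLES = [
    ("The system shall allow users to reset their password.", "F"),
    ("The system shall export reports as PDF files.", "F"),
    ("The user shall be able to search products by name.", "F"),
    ("The system shall respond to queries within 2 seconds.", "NF"),
    ("The application shall be available 99.9% of the time.", "NF"),
    ("All stored passwords shall be encrypted.", "NF"),
]

# Ad hoc stop-word list chosen for this toy example (an assumption).
STOPWORDS = {"the", "shall", "be", "to", "by", "as", "of", "all",
             "within", "able", "their"}


def preprocess(text):
    """Lower-case, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalised; unseen terms are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(samples):
    """Fit the IDF table and one mean TF-IDF vector (centroid) per label."""
    docs = [preprocess(text) for text, _ in samples]
    idf = fit_idf(docs)
    sums, counts = {}, Counter()
    for doc, (_, label) in zip(docs, samples):
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    centroids = {lab: {t: v / counts[lab] for t, v in acc.items()}
                 for lab, acc in sums.items()}
    return idf, centroids


def predict(text, idf, centroids):
    """Assign the label whose centroid is most cosine-similar."""
    vec = tfidf(preprocess(text), idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


if __name__ == "__main__":
    idf, centroids = train(SAMPLES)
    for req in ["Users shall be able to export reports.",
                "The system shall encrypt stored passwords."]:
        print(predict(req, idf, centroids), "-", req)
```

Swapping the nearest-centroid step for an SVM or Random Forest would keep the same TF-IDF features. A Word2Vec variant would replace `tfidf` with, for example, the average of pre-trained word vectors for each requirement. That is one common approach and not necessarily the one used in the study.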

Downloads

Published

2023-11-29

How to Cite

P. Talele, S. Apte, R. Phalnikar, and H. Talele, “Semi-automated Software Requirements Categorisation using Machine Learning Algorithms”, IJECES, vol. 14, no. 10, pp. 1107-1114, Nov. 2023.

Section

Original Scientific Papers