Applying and Optimizing Machine Learning, Deep Learning and Data Science Techniques in Software Defect/Test Data to predict faults, locate faults, reduce testing resources, and automate processes.

Document Type

News Article


Machine learning and deep learning are now widely used across most domains to enable data-driven processes and solutions. In software engineering, however, the application of these techniques is still at a rudimentary level. Several machine learning-based prediction approaches have been designed for software engineering, targeting effort, reliability, security, quality, faults, cost, and reusability. Among these, predicting faulty software modules is an important goal. Software Fault Prediction (SFP) is the process of developing a model that software practitioners can use to detect faulty classes or modules before the testing phase. This enables efficient allocation of testing resources, reduces testing effort, and supports better-informed decisions about release quality. It also reduces maintenance cost, and early detection of faults contributes to the delivery of higher-quality software.

According to the existing literature, several factors affect the performance of fault prediction models: availability of datasets, data quality, class imbalance, model overfitting, feature selection/reduction, choice of software metrics, choice of performance measures, and other dataset issues. The solutions proposed in the literature for these issues are still at a rudimentary level, which leaves room for future research to address them: building robust fault prediction models using machine learning techniques, studying transfer learning and cross-project feature learning for cases where no dataset from the same kind of project is available, identifying better feature sets to represent module quality, and so on.
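To make the class imbalance and performance measure issues concrete, the following is a minimal sketch and not the project's method. It assumes hypothetical toy data (ten modules described by two made-up metrics, with one faulty module) and uses plain random oversampling as one common, simple rebalancing technique. The names `random_oversample` and `precision_recall_f1` are illustrative helpers, not part of any library.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement from the class to fill the gap to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (faulty) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module.
rows = [(120, 4), (80, 2), (200, 6), (50, 1), (95, 3),
        (140, 5), (60, 2), (110, 4), (75, 2), (900, 35)]
y_true = [0] * 9 + [1]  # 1 = faulty, 0 = non-faulty

# A trivial "model" that labels every module as non-faulty.
y_all_clean = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_all_clean)) / len(y_true)

print("accuracy:", accuracy)
print("precision, recall, F1:", precision_recall_f1(y_true, y_all_clean))

balanced_rows, balanced_labels = random_oversample(rows, y_true)
print("class counts after oversampling:", Counter(balanced_labels))
```

On this toy data the trivial predictor scores well on accuracy while finding no faulty module at all, which is why recall and F1 on the faulty class are usually reported alongside accuracy. Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.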

Research goals:

  1. Optimizing machine learning algorithms and designing new algorithms (graph theory-based)
  2. Identifying the set of software metrics (features) that represent faulty/non-faulty modules using NASA datasets (with the intention of introducing a new software metric to represent module quality)
  3. Visualizing defect datasets to gain meaningful insight into the metrics used to represent module quality
  4. Identifying a better methodology for solving the class imbalance problem in datasets used in the software fault prediction discipline
  5. Building a robust, novel software fault prediction model using machine learning techniques
  6. Studying the effect and performance of transfer learning in the software fault prediction discipline
  7. Studying the possibility of automatic feature learning from source code using deep learning techniques and its impact on fault prediction (fault prediction based on deep learning)
  8. Introducing a novel fault-driven software development life cycle using federated learning
  9. Predicting and analyzing flaky tests

Related links:

  1. https://onlinelibrary.wiley.com/doi/10.1002/cpe.6828
  2. https://link.springer.com/article/10.1007/s12652-021-03429-w
  3. https://www.sciencedirect.com/science/article/pii/S0263224118311400
  4. http://www.inass.org/2019/2019022805.pdf

Publication Date

Spring 10-1-2022