Abstract: Heart disease is a major health challenge worldwide, and timely diagnosis and intervention depend on efficient classification and prediction. Reliable prediction models require high-quality data, which in turn calls for thorough preprocessing, particularly imputation, outlier elimination, and feature selection. Current methods for handling missing data, detecting outliers, and selecting features often show limitations and biases on complex datasets. To address these issues, this paper introduces the Data Preprocessing algorithm, which uses ensemble-based approaches to improve dataset quality and raise feature significance. The algorithm merges multiple datasets into a single, consistent dataset. It uses the EMERALD algorithm to impute missing data and the RACE algorithm to eliminate outliers. Normalization methods such as Min-Max scaling bring the features onto a common scale, and the DYNAMIC algorithm identifies the features most important for predictive performance.
Keywords: Data preparation, Missing data handling, Outlier elimination, Variable selection, Ensemble methods.
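Of the preprocessing steps named in the abstract, Min-Max scaling is a standard technique that can be illustrated on its own: each value x is mapped to (x - min) / (max - min), so the feature ends up in the range [0, 1]. The following is a minimal sketch under stated assumptions, not the paper's implementation. The function name `min_max_scale` and the sample readings are hypothetical, and mapping a constant column to 0.0 is one possible convention.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        # (Assumed convention, not taken from the paper.)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical resting blood pressure readings (mm Hg), for illustration only.
resting_bp = [110.0, 120.0, 130.0, 150.0]
scaled = min_max_scale(resting_bp)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because the minimum and maximum drive the result, a single extreme value compresses every other value toward one end of the range, which is one reason to remove outliers before scaling, in the order the abstract lists the steps.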
DOI: 10.17148/IMRJR.2025.020706
[1] Merlin Sofia S, Dr. D. Ravindran, Dr. G. Arockia Sahaya Sheela, "Data Preparation Strategies for Improving the Performance of Machine Learning Models in Heart Disease Prediction," International Multidisciplinary Research Journal Reviews (IMRJR), 2025, DOI 10.17148/IMRJR.2025.020706.