Evaluating the Effect of Data Quality on Machine Learning Model Performance Using a Data-Centric AI Approach

Authors

  • Bisma Mahendra STMIK IKMI Cirebon
  • Martanto STMIK IKMI Cirebon
  • Denni Pratama STMIK IKMI Cirebon
  • Ahmad Faqih STMIK IKMI Cirebon
  • Rudi Kurniawan STMIK IKMI Cirebon

DOI:

https://doi.org/10.56995/sintek.v6i1.211

Keywords:

Data-Centric AI, Machine Learning, Data Quality, Random Forest, Support Vector Machine, Missing Values

Abstract

This study evaluates the effect of data quality on machine learning model performance using the Data-Centric Artificial Intelligence (DCAI) approach. Experiments were conducted on the Titanic Dataset, comparing Random Forest and Support Vector Machine (SVM) under three missing-value handling scenarios: Drop Missing, Mean Imputation, and No Imputation. Model performance was evaluated using Accuracy, F1 Score, and Area Under the Curve (AUC). The results show that data quality interventions have a significant impact on model performance. Random Forest achieved its best performance in the Drop Missing scenario, with an Accuracy of 0.813, an F1 Score of 0.758, and an AUC of 0.859, whereas SVM obtained its highest Accuracy, 0.822, in the Mean Imputation scenario. A paired t-test showed no statistically significant difference in performance between the two models (p-value > 0.05). These findings affirm that improving data quality affects model performance more than the choice of algorithm, supporting the Data-Centric AI paradigm.
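The two imputation strategies and the paired comparison named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the rows, the per-fold accuracies, and all function names below are hypothetical placeholders, model training is left out, and the No Imputation scenario is not shown. The sketch only computes the paired t statistic and compares it with the two-sided 5% critical value for 4 degrees of freedom (2.776), rather than computing an exact p-value.

```python
import math
import statistics

# Hypothetical toy rows (NOT the Titanic data): (age, fare, survived).
# None marks a missing value.
ROWS = [
    (22.0, 7.25, 0),
    (None, 71.28, 1),
    (26.0, None, 1),
    (35.0, 53.10, 1),
    (None, 8.05, 0),
    (54.0, 51.86, 0),
]

# Hypothetical per-fold accuracies for two models (placeholders, not results
# from the paper), used only to show how a paired comparison is computed.
RF_SCORES = [0.80, 0.82, 0.79, 0.83, 0.81]
SVM_SCORES = [0.82, 0.81, 0.80, 0.82, 0.83]


def drop_missing(rows):
    """Drop Missing: keep only rows in which no value is missing."""
    return [r for r in rows if None not in r]


def mean_impute(rows, n_features=2):
    """Mean Imputation: fill each missing feature with its column mean."""
    means = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(statistics.mean(observed))
    return [
        tuple(means[j] if r[j] is None else r[j] for j in range(n_features))
        + r[n_features:]
        for r in rows
    ]


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on two equal-length score lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err


dropped = drop_missing(ROWS)
imputed = mean_impute(ROWS)
t_stat = paired_t_statistic(RF_SCORES, SVM_SCORES)

print("rows after Drop Missing:", len(dropped))
print("rows after Mean Imputation:", len(imputed))
# Two-sided critical value at alpha = 0.05 with n - 1 = 4 degrees of freedom.
print("significant at 5%:", abs(t_stat) > 2.776)
```

Drop Missing shrinks the sample, while Mean Imputation keeps every row at the cost of replacing unknown values with a column average. That trade-off is what the scenarios in the abstract compare.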

Published

2026-01-21

How to Cite

Bisma Mahendra, Martanto, Denni Pratama, Ahmad Faqih, & Rudi Kurniawan. (2026). Evaluasi Pengaruh Kualitas Data Terhadap Performa Model Machine Learning Menggunakan Pendekatan Data-Centric AI. Jurnal Sistem Informasi Dan Teknologi (SINTEK), 6(1), 107–113. https://doi.org/10.56995/sintek.v6i1.211