Bias-Aware Machine Learning for Student Dropout Prediction: Balancing Accuracy and Fairness
DOI: https://doi.org/10.70112/ajeat-2025.14.2.4330
Keywords: Student Dropout, Machine Learning, Fairness, Bias Mitigation, Early Warning Systems
Abstract
Student dropout remains a persistent global challenge with serious social and economic consequences. Early identification of at-risk learners enables timely support, which can improve retention while promoting fairness in educational outcomes. This study presents a bias-aware machine learning framework for student dropout prediction that jointly evaluates predictive performance and fairness across demographic subgroups. Six machine learning models are benchmarked using academic, demographic, and socioeconomic features. Model performance is assessed using Accuracy, F1-score, Precision, and Matthews Correlation Coefficient, while fairness is evaluated across gender, marital status, and displacement groups. Initial results show that CatBoost achieves the strongest overall performance before class balancing; however, subgroup analysis reveals systematic disparities affecting vulnerable populations. To address these biases, the Synthetic Minority Oversampling Technique is applied. After rebalancing, XGBoost delivers the best performance, achieving substantial improvements in predictive accuracy alongside marked reductions in subgroup disparities. In particular, dropout detection for displaced students improves significantly, narrowing fairness gaps across all evaluated groups. The findings demonstrate that data-level bias mitigation can enhance both accuracy and equity in educational predictive systems. This work provides empirical evidence that fairness-aware machine learning can support more reliable and inclusive early warning systems for student retention.
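The abstract's two-stage pipeline, data-level rebalancing with SMOTE followed by per-subgroup fairness evaluation, can be sketched as follows. This is a minimal illustration, not the study's implementation: the toy data, the hand-rolled nearest-neighbour interpolation, and the `recall_gap` helper are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_oversample(X_min, n_new, k=5):
    """Minimal SMOTE sketch: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synth)

def recall_gap(y_true, y_pred, groups):
    """One simple subgroup-disparity measure: the spread in dropout
    recall (true-positive rate) across demographic groups."""
    recalls = []
    for g in np.unique(groups):
        m = (groups == g) & (y_true == 1)
        recalls.append((y_pred[m] == 1).mean())
    return max(recalls) - min(recalls)

# Toy minority class: 15 dropout records with 3 features;
# oversample by 85 synthetic points to reach 100 total.
X_min = rng.normal(loc=2.0, size=(15, 3))
X_new = smote_oversample(X_min, n_new=85)
print(X_new.shape)  # (85, 3)
```

Because each synthetic sample is a convex combination of two minority points, it stays inside the minority class's feature range, which is why SMOTE can reduce imbalance without inventing out-of-distribution dropouts; a smaller `recall_gap` after rebalancing is one way the narrowing of subgroup disparities reported above could be quantified.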
License
Copyright (c) 2025 Centre for Research and Innovation

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

