Prediction of University-Level Academic Performance through Machine Learning Mechanisms and Supervised Methods

Leonardo Emiro Contreras Bravo; Nayibe Nieves-Pimiento; Karolina Gonzalez-Guerrero

doi:10.14483/23448393.19514

Authors

Leonardo Emiro Contreras Bravo Universidad Distrital Francisco José de Caldas
Nayibe Nieves-Pimiento Universidad ECCI
Karolina Gonzalez-Guerrero Universidad Militar Nueva Granada (Bogotá, Colombia)

Keywords:

análisis de datos educativos, Machine Learning, educación superior (es).

Keywords:

educational data analysis, Machine Learning, higher education (en).

Author Biographies

Nayibe Nieves-Pimiento, Universidad ECCI

Nayibe Nieves Pimiento, Mechanical Engineer, Master in Environmental Sciences, Faculty of Engineering, Environmental Engineering, research group Gestión ambiental y desarrollo sostenible, Universidad ECCI, Bogotá, Colombia. Email: nnievesp@ecci.edu.co

Karolina Gonzalez-Guerrero, Universidad Militar Nueva Granada (Bogotá, Colombia)

Karolina Gonzalez Guerrero, Bachelor of Education, Master in Education, PhD in Education, research group PYDE, Gestión ambiental y desarrollo sostenible. Faculty member at Universidad Militar Nueva Granada. Email: karolina.gonzalez@unimilitar.edu.co

References

M. Ferreyra, J. Botero, P. Haimovich, and S. Urzúa, “Momento decisivo La educación superior en América Latina y el Caribe,” Washington, 2017. [Online]. Available: https://openknowledge.worldbank.org/bitstream/handle/10986/26489/211014ovSP.pdf

E. J. de La Hoz, E. J. de La Hoz, and T. J. Fontalvo, “Methodology of Machine Learning for the classification and prediction of users in virtual education environments,” Inf. Tecnol., vol. 30, no. 1, pp. 247-254, Feb. 2019. https://doi.org/10.4067/S0718-07642019000100247

Ministerio de Educación, “Sistema nacional de información de la educación superior,” 2019. [Online]. Available: https://snies.mineducacion.gov.co/portal/

I. A. Khan and J. T. Choi, “An application of educational data mining (EDM) technique for scholarship prediction,” Int. J. Softw. Eng. Its Appl., vol. 8, no. 12, pp. 31-42, 2014. https://doi.org/10.14257/ijseia.2014.8.12.03

H. Lamas, “Sobre el rendimiento escolar,” Propósitos y Represent. Rev. Psicol. Educ., vol. 3, no. 1, pp. 313-386, 2015. https://doi.org/10.20511/pyr2015.v3n1.74

J. Espinosa, J. Hernández, J. Rodríguez, M. Chacín, and V. Bermúdez, “Influencia del estrés sobre el rendimiento académico,” AVFT-Archivos Venez. Farmacol. y Ter., vol. 39, no. 1, 2020. https://doi.org/10.5281/zenodo.4065032

M. G. Jiménez et al., “La predicción del rendimiento académico: regresión lineal versus regresión logística,” Psicothema, vol. 12, pp. 222-248, 2000. https://www.psicothema.com/pdf/558.pdf

G. M. Garbanzo, “Factores asociados al rendimiento académico en estudiantes universitarios, una reflexión desde la calidad de la educación superior pública,” Rev. Educ., vol. 31, no. 1, pp. 43-63, 2007. https://www.redalyc.org/articulo.oa?id=44031103 DOI: https://doi.org/10.15517/revedu.v31i1.1252

L. Rojas, “Validez predictiva de los componentes del promedio de Admisión a la universidad de costa rica utilizando el Género y el tipo de colegio como variables control,” Rev. Elec. Actual. Investig. en Educ., vol. 13, no. 1, pp. 17-25, Jan. 2013. https://revistas.ucr.ac.cr/index.php/aie/article/view/11707/18183

D. García, J. Manuel, and M. Pichardo, “Learning analytics as an analysis factor of university academic performance,” in CEUR Workshop Proceedings, 2019, pp. 42-50. http://ceur-ws.org/Vol-2231/LALA_2018_paper_14.pdf

J. Huamán, “Evaluación del rendimiento académico estudiantil de la cohorte 2011-2015, según áreas de la carrera de estomatología Universidad Peruana Cayetano Heredia”. Título de Cirujano Dentista, Departamento Académico de Odontología Social, Universidad Peruana Cayetano Heredia, 2018. [Online]. Available: https://repositorio.upch.edu.pe/handle/20.500.12866/1429

D. A. Montoya-Arenas, E. M. Bustamante-Zapata, C. M. Díaz-Soto, and D. Pineda, “Factores de la capacidad intelectual y de la función ejecutiva relacionados con el rendimiento académico en estudiantes universitarios,” Rev. la Esc. Cienc. Salud Univ. Pontif. Boliv., vol. 40, no. 1, pp. 10-18, 2021. https://doi.org/10.18566/medupb.v40n1.a03

L. Contreras, J. Rodríguez, and H. Fuentes, “Analítica académica: nuevas herramientas aplicadas a la educación,” Rev. Boletín Redipe, vol. 10, no. 3, pp. 137-158, 2021.

P. Murnion and M. Helfert, “Academic analytics in quality assurance using organisational analytical capabilities,” in Annual Conf. UK Acad. Info. Sys. (UKAIS), 2013. [Online]. Available: https://doi.org/10.13140/2.1.3368.1600

G. Hackeling, Mastering machine learning with scikit-learn: Learn to implement and evaluate machine learning solutions with scikit-learn, 2nd ed., vol. 1, Birmingham, UK: Packt Publishing Ltd., 2014.

L. Contreras, H. Fuentes, and J. Rodríguez, “Predicción del rendimiento académico como indicador de éxito/fracaso de los estudiantes de ingeniería, mediante aprendizaje automático,” Form. Univ., vol. 13, no. 5, pp. 233-246, 2020. https://doi.org/10.4067/S0718-50062020000500233

T. C. Hakyemez and S. Mardikyan, “The interplay between institutional integration and self-efficacy in the academic performance of first-year university students: A multigroup approach,” Int. J. Manag. Educ., vol. 19, no. 1, 2021. https://doi.org/10.1016/j.ijme.2020.100430

G. Guizado, M. Valenzuela, and P. Vallejo, “Desempeño docente y el rendimiento académico de los estudiantes de la Facultad de Tecnología en la Universidad Nacional de Educación de Perú,” Rev. Conrado, vol. 16, no. 72, pp. 200-203, 2020. https://conrado.ucf.edu.cu/index.php/conrado/article/view/1231

E. Zárate, B. Lavado, and W. Pomahuacre, “Competencia comunicativa intercultural y rendimiento académico en lenguas extranjeras,” Rev. Conrado, vol. 16, no. 74, pp. 30-37, 2020. https://conrado.ucf.edu.cu/index.php/conrado/article/view/1330

T. Icekson, O. Kaplan, and O. Slobodin, “Does optimism predict academic performance? Exploring the moderating roles of conscientiousness and gender,” Stud. High. Educ., vol. 45, no. 3, pp. 635-647, 2020. https://doi.org/10.1080/03075079.2018.1564257

A. M. Pavelea and O. Moldovan, “Why some fail and others succeed: Explaining the academic performance of PA undergraduate students,” NISPAcee J. Public Adm. Policy, vol. 13, no. 1, pp. 109-132, 2020. https://doi.org/10.2478/nispa-2020-0005

H. Vargas, L. Solórzano, and W. Chanini, “Modelo matemático entre el puntaje de examen de ingreso y el rendimiento académico de los estudiantes ingresantes a la Universidad Nacional Jorge Basadre Grohmann, año académico 2018,” Ciencias, vol. 3, no. 3, pp. 45-51, 2019. https://doi.org/10.33326/27066320.2019.3.949

A. Lenskiy, R. Shariat, and S. Seol, “The effect of academic breaks on undergraduate academic performance,” Int. J. Electr. Eng. Educ., 2020. [Online]. Available: https://doi.org/10.1177/0020720920922518

M. Oladejo, “A path-analytic study of socio-psychological variables and academic performance of distance learners in Nigerian universities,” Doctoral thesis, Univ. Lagos, 2010. [Online]. Available: https://doi.org/10.13140/RG.2.2.19443.73762

M. Kotzé and R. Niemann, “Psychological resources as predictors of academic performance of first-year students in higher education,” Acta Académica, vol. 45, no. 2, pp. 85-121, 2013. https://journals.ufs.ac.za/index.php/aa/article/view/1399

E. Alyahyan and D. Düştegör, “Predicting academic success in higher education: Literature review and best practices,” Int. J. Educ. Technol. High. Educ., vol. 17, no. 1, pp. 1-21, Dec. 2020. https://doi.org/10.1186/s41239-020-0177-7

G. Tarazona, L. Contreras, and H. Fuentes, “Machine Learning variables and algorithms that influence academic performance: A review,” Int. J. Mech. Prod. Eng. Res. Dev., vol. 10, no. 3, pp. 16011-16028, 2020. http://www.tjprc.org/view_paper.php?id=14467

L. Contreras, H. Fuentes, and J. Rodríguez, “Academic Interruption Model using Automatic Learning Algorithms,” Sylwan J., vol. 10, no. 3, pp. 16075-16086, 2020. http://www.tjprc.org/view_paper.php?id=14480

L. Contreras, H. Fuentes, and J. Molano, “Analítica académica: nuevas herramientas aplicadas a la educación,” Rev. Bol. Redipe, vol. 10, no. 3, pp. 137-158, 2021. https://doi.org/10.36260/rbr.v10i3.1225

A. Rico, N. Gaytán, and D. Sánchez, “Construcción e implementación de un modelo para predecir el rendimiento académico de estudiantes universitarios mediante el algoritmo Naïve Bayes,” Diálogos sobre Educ., vol. 19, art. 509, 2019. https://doi.org/10.32870/dse.v0i19.509

Y. Widyaningsih, N. Fitriani, and D. Sarwinda, “A semi-supervised learning approach for predicting student's performance: First-year,” 2019 12th International Conference on Information & Communication Technology and System (ICTS), pp. 291-295, 2019. https://doi.org/10.1109/ICTS.2019.8850950

F. Otálora, “Modelo para la identificación de patrones de desempeño académico estudiantil para fortalecer el acompañamiento académico en la Universidad Nacional de Colombia,” MSc. dissertation, Dept. Elect. Eng., Universidad Nacional de Colombia, 2019. [Online]. Available: https://repositorio.unal.edu.co/handle/unal/77758.

R. Istvan and V. Lasagna, “Sistema informático para la detección temprana de deserción estudiantil universitaria,” Innovación y Desarro. Tecnológico y Soc., vol. 1, no. 2, pp. 1-15, 2019. https://doi.org/10.24215/26838559e006

S. S. M. Ajibade, N. Bahiah Binti Ahmad, and S. Mariyam Shamsuddin, “Educational data mining: Enhancement of student performance model using ensemble methods,” IOP Conf. Ser. Mater. Sci. Eng., vol. 551, no. 1, art. 012061, 2019. https://doi.org/10.1088/1757-899X/551/1/012061

C. Jalota and R. Agrawal, “Analysis of educational data mining using classification,” in Proc. Int. Conf. Mach. Learn. Big Data, Cloud Parallel Comput. Trends, Prespectives Prospect. Com. 2019, 2019, pp. 243-247. https://doi.org/10.1109/COMITCon.2019.8862214

O. Castrillón, W. Sarache, and S. Ruiz, “Predicción del rendimiento académico por medio de técnicas de inteligencia artificial,” Rev. Form. Univ., vol. 13, no. 1, pp. 93-102, 2020. https://doi.org/10.4067/S0718-50062020000100093

A. Das and E. Rodríguez, “A predictive analytics system for forecasting student academic performance: Insights from a pilot project at eastern Washington university,” 2019 Jt. 8th Int. Conf. Informatics, Electron. Vision, ICIEV, 2019, pp. 255-262. https://doi.org/10.1109/ICIEV.2019.8858523

I. Burman and S. Som, “Predicting Students Academic Performance Using Support Vector Machine,” in Proc. 2019 Amity Int. Conf. Artif. Int., AICAI 2019, Apr. 2019, pp. 756-759. https://doi.org/10.1109/AICAI.2019.8701260

M. V. Amazona and A. A. Hernández, “Modelling student performance using data mining techniques,” in Proc. 2019 5th Int. Conf. Comp. Data Eng., ICCDE’ 19, May 2019, pp. 36-40. https://doi.org/10.1145/3330530.3330544

A. I. Adekitan and E. Noma-Osaghae, “Data mining approach to predicting the performance of first year student in a university using the admission requirements,” Educ. Inf. Technol., vol. 24, no. 2, pp. 1527-1543, 2019. https://doi.org/10.1007/s10639-018-9839-7

M. Hussain, W. Zhu, W. Zhang, S. M. R. Abidi, and S. Ali, “Using machine learning to predict student difficulties from learning session data,” Artif. Intell. Rev., vol. 52, no. 1, pp. 381-407, 2019. https://doi.org/10.1007/s10462-018-9620-8

X. Xu, J. Wang, H. Peng, and R. Wu, “Prediction of academic performance associated with internet usage behaviors using machine learning algorithms,” Comput. Human Behav., vol. 98, pp. 166-173, Apr. 2019. https://doi.org/10.1016/j.chb.2019.04.015

Bendangnuksung, “Students’ performance prediction using deep neural network,” Int. J. Appl. Eng. Res., vol. 13, no. 02, pp. 1171-1176, 2018. https://www.ripublication.com/ijaer18/ijaerv13n2_46.pdf

Y. Nieto, V. García-Díaz, C. Montenegro, and R. G. Crespo, “Supporting academic decision making at higher educational institutions using machine learning-based algorithms,” Soft Comput., vol. 23, no. 12, pp. 4145-4153, 2018. https://doi.org/10.1007/s00500-018-3064-6

L. Wang and Y. Yuan, “A prediction strategy for academic records based on classification algorithm in online learning environment,” Proc. - IEEE 19th Int. Conf. Adv. Learn. Technol. ICALT 2019, vol. 2161-377X, pp. 1-5, 2019. https://doi.org/10.1109/ICALT.2019.00007

Y. K. Salal, S. M. Abdullaev, and M. Kumar, “Educational data mining: Student performance prediction in academic,” Int. J. Eng. Adv. Technol., vol. 8, no. 4C, pp. 54-59, 2019. https://www.semanticscholar.org/paper/Educational-Data-Mining-%3A-Student-Performance-in-Salal-Abdullaev/b21fa7245581c3baad2d468cb9d706940de7e010

S. Hirokawa, “Key attribute for predicting student academic performance,” in ICETC '18: 10th Int. Conf. Ed. Tech. Comp, 2018, pp. 308-313. https://doi.org/10.1145/3290511.3290576

A. B. Nassif, I. Shahin, I. Attili, M. Azzeh, and K. Shaalan, “Speech recognition using deep neural networks: A systematic review,” IEEE Access, vol. 7, pp. 19143-19165, 2019. https://doi.org/10.1109/ACCESS.2019.2896880

J. Sotomonte, C. Rodríguez, C. Montenegro, P. Gaona, and J. Castellanos, “Hacia la construcción de un modelo predictivo de deserción académica basado en técnicas de minería de datos,” Rev. Científica, vol. 3, no. 26, p. 35, 2016. https://doi.org/10.14483/23448350.11089

M. Alloghani, D. Al-Jumeily, A. Hussain, A. J. Aljaaf, J. Mustafina, and E. Petrov, “Application of machine learning on student data for the appraisal of academic performance,” Proc. - Int. Conf. Dev. eSystems Eng. DeSE, vol. 2018, pp. 157-162, Sep. 2019. https://doi.org/10.1109/DeSE.2018.00038

M. Mohammadi, M. Dawodi, W. Tomohisa, and N. Ahmadi, “Comparative study of supervised learning algorithms for student performance prediction,” in 1st Int. Conf. Artif. Intell. Inf. Commun. ICAIIC 2019, 2019, pp. 124-127. https://doi.org/10.1109/ICAIIC.2019.8669085

H. Anderson, B. Afshan, and R. Baker, “Predicting Graduation at a Public R1 University,” 2019. [Online]. Available: https://learninganalytics.upenn.edu/ryanbaker/paper323.pdf

J. Hou and Y. Wen, “Prediction of learners’ academic performance using factorization machine and decision tree,” in 2019 IEEE Int. Congr. Cybermatics, 2019, pp. 1-8. https://doi.org/10.1109/iThings/GreenCom/CPSCom/SmartData.2019.00024

Y. S. Alsalman, N. Khamees Abu Halemah, E. S. Alnagi, and W. Salameh, “Using decision tree and artificial neural network to predict students academic performance,” in 2019 10th Int. Conf. Inf. Commun. Syst. ICICS 2019, 2019, pp. 104-109. https://doi.org/10.1109/IACS.2019.8809106

T. Icekson, O. Kaplan, and O. Slobodin, “Does optimism predict academic performance? Exploring the moderating roles of conscientiousness and gender,” Stud. High. Educ., vol. 45, no. 3, pp. 635-647, Mar. 2020. https://doi.org/10.1080/03075079.2018.1564257

R. C. Céspedes, A. Vara-Horna, D. López-Odar, I. Santi-Huaranca, A. Díaz-Rosillo, and Z. Asencios-González, “Ausentismo, presentismo y rendimiento académico en estudiantes de universidades peruanas,” Rev. Psicol. Educ., vol. 6, no. 1, pp. 83-133, Jan. 2018. https://doi.org/10.20511/pyr2018.v6n1.177

P. Luján, L. Trelles, and M. Mogollón, “Asertividad y rendimiento académico en estudiantes de la facultad de ciencias administrativas de la Universidad Nacional de Piura,” UCV - Sci., vol. 11, no. 1, pp. 13-20, 2019. https://revistas.ucv.edu.pe/index.php/ucv-scientia/article/view/1170 DOI: https://doi.org/10.18050/ucv-scientia.v11i1.2397

Y.-W. Liang, D. Jones, and R. A. Robles-Pina, “Ethnic and gender stereotypes on college students’ academic performance,” Res. High. Educ. J., vol. 35, 2018. https://www.aabri.com/manuscripts/182858.pdf

C. Durán and A. Rosado, “La comprensión lectora y el rendimiento académico en estudiantes de ingeniería,” Rev. Colomb. Tecnol. Av., vol. 1, no. 33, pp. 9-15, Mar. 2019. https://doi.org/10.24054/16927257.v33.n33.2019.3317

B. Kitchenham, O. Pearl Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, “Systematic literature reviews in software engineering – A systematic literature review,” Inf. Softw. Technol., vol. 51, no. 1, pp. 7-15, Jan. 2009. https://doi.org/10.1016/j.infsof.2008.09.009

K. Gonzalez, J. Rodríguez, and L. Contreras, “Academic performance and alternatives with prediction-oriented machine learning: A review of the state of the art,” Int. J. Mech. Prod. Eng. Res. Dev., vol. 10, no. 3, pp. 16329-16340, 2020. http://www.tjprc.org/view_paper.php?id=14520

K. C. Santosh, “AI-driven tools for coronavirus outbreak: Need of active learning and cross-population train/test models on multitudinal/multimodal data,” J. Med. Syst., vol. 44, no. 5, pp. 1-5, May 2020. https://doi.org/10.1007/s10916-020-01562-1

J. García, P. Sánchez, M. Orozco, and S. Obredor, “Extracción de conocimiento para la predicción y análisis de los resultados de la prueba de calidad de la educación superior en Colombia,” Rev. Form. Univ., vol. 12, no. 4, pp. 55-62, 2019. https://doi.org/10.4067/S0718-50062019000400055

M. Zaffar, M. A. Hashmani, K. S. Savita, and S. S. H. Rizvi, “A study of feature selection algorithms for predicting students' academic performance,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 5, pp. 541-549, 2018. https://doi.org/10.14569/IJACSA.2018.090569

A. K. Das and E. Rodriguez-Marek, “A Predictive Analytics System for Forecasting Student Academic Performance: Insights from a Pilot Project at Eastern Washington University,” in 2019 Joint 8th Int. Conf. Informatics Elec. Vision (ICIEV) and 2019 3rd Int. Conf. Imaging, 2019, pp. 255-262. https://doi.org/10.1109/ICIEV.2019.8858523

V. L. Uskov, J. P. Bakken, A. Byerly, and A. Shah, “Machine Learning-based predictive analytics of student academic performance in STEM education,” in 2019 IEEE Global Eng. Educ. Conf. (EDUCON), 2019, pp. 1370-1376. https://doi.org/10.1109/EDUCON.2019.8725237

R. Asif, A. Merceron, S. A. Ali, and N. G. Haider, “Analyzing undergraduate students’ performance using educational data mining,” Comput. Educ., vol. 113, pp. 177-194, 2017. https://doi.org/10.1016/j.compedu.2017.05.007

J. Horak, J. Vrbka, and P. Suler, “Support vector machine methods and artificial neural networks used for the development of bankruptcy prediction models and their comparison,” J. Risk Financ. Manag., vol. 13, no. 3, p. 80, Mar. 2020. https://doi.org/10.3390/jrfm13030060

F. Ofori, E. Maina, and R. Gitonga, “Using machine learning algorithms to predict students’ performance and improve learning outcome: A literature based review,” J. Inf. Technol., vol. 4, no. 1, pp. 33-55, 2020. https://ir-library.ku.ac.ke/handle/123456789/20243?show=full

J. Brownlee, “Machine Learning Mastery,” 2020. https://machinelearningmastery.com/ (accessed Dec. 21, 2020).

F. J. Kaunang and R. Rotikan, “Students’ academic performance prediction using data mining,” in 3rd Int. Conf. Informatics Comput. ICIC 2018, 2018, pp. 1-5. https://doi.org/10.1109/IAC.2018.8780547

Pandas.org, “pandas.DataFrame.transform,” 2021. https://pandas.pydata.org/

R. M. Aguilar, J. M. Torres, and C. A. Martín, “Automatic learning for the system identification. A case study in the prediction of power generation in a wind farm,” RIAI - Rev. Iberoam. Autom. e Inform. Ind., vol. 16, no. 1, pp. 114-127, 2019. https://doi.org/10.4995/riai.2018.9421

L. E. Contreras, H. J. Fuentes, and J. I. Rodríguez, “Predicción del rendimiento académico como indicador de éxito/fracaso de los estudiantes de ingeniería, mediante aprendizaje automático,” Form. Univ., vol. 13, no. 5, pp. 233-246, 2020. https://doi.org/10.4067/S0718-50062020000500233

H. Almarabeh, “Analysis of students’ performance by using different data mining classifiers,” Int. J. Mod. Educ. Comput. Sci., vol. 8, pp. 9-15, 2017. https://doi.org/10.5815/ijmecs.2017.08.02

X. J. Lin et al., “Stress and its association with academic performance among dental undergraduate students in Fujian, China: A cross-sectional online questionnaire survey,” BMC Med. Educ., vol. 20, art. 181, 2020. https://doi.org/10.1186/s12909-020-02095-4

T. Deliens, P. Clarys, I. de Bourdeaudhuij, and B. Deforche, “Weight, socio-demographics, and health behaviour related correlates of academic performance in first year university students,” Nutr. J., vol. 12, art. 162, 2013. https://doi.org/10.1186/1475-2891-12-162

E. T. Ortlieb and E. H. Cheek, “How geographic location plays a role within instruction: Venturing into both rural and urban elementary schools,” Educ. Res. Q., vol. 31, no. 2, pp. 48-64, 2008. https://www.proquest.com/docview/215932925

J. Cresswell and C. Underwood, “Location, location, location: Implications of geographic situation on Australian student performance in PISA 2000,” 2004. https://research.acer.edu.au/acer_monographs/2

A. Porto and L. Di Gresia, “Performance of University students and their determinants,” 2005. [Online]. Available: http://sedici.unlp.edu.ar/bitstream/handle/10915/54674/Documento_completo__.pdf-PDFA.pdf?sequence=1

R. Garzón, M. O. Rojas, L. Del Riesgo, M. Pinzón, and A. L. Salamanca, “Factores que pueden influir en el rendimiento académico de estudiantes de bioquímica que ingresan en el programa de medicina de la Universidad del Rosario-Colombia,” Educ. Médica, vol. 13, no. 2, pp. 85-96, 2010. https://scielo.isciii.es/scielo.php?script=sci_abstract&pid=S1575-18132010000200005 DOI: https://doi.org/10.33588/fem.132.561

E. Fernandes, M. Holanda, M. Victorino, V. Borges, R. Carvalho, and G. van Erven, “Educational data mining: Predictive analysis of academic performance of public school students in the capital of Brazil,” J. Bus. Res., vol. 94, no. 2018, pp. 335-343, Feb. 2019. https://doi.org/10.1016/j.jbusres.2018.02.012

A. Rico and D. Sánchez, “Diseño de un modelo para automatizar la predicción del rendimiento académico en estudiantes del IPN/Design of a model to automate the prediction of academic performance in students of IPN,” RIDE Rev. Iberoam. para la Investig. y el Desarro. Educ., vol. 8, no. 16, pp. 246-266, 2018. https://doi.org/10.23913/ride.v8i16.340

S. Bhutto, I. F. Siddiqui, Q. A. Arain, and M. Anwar, “Predicting students’ academic performance through supervised Machine Learning,” in ICISCT 2020 - 2nd Int. Conf. Inf. Sci. Commun. Technol., Feb. 2020. [Online]. Available: https://doi.org/10.1109/ICISCT49550.2020.9080033

How to Cite

APA

Contreras Bravo, L. E., Nieves-Pimiento, N., and Gonzalez-Guerrero, K. (2022). Prediction of University-Level Academic Performance through Machine Learning Mechanisms and Supervised Methods. Ingeniería, 28(1), e19514. https://doi.org/10.14483/23448393.19514

Vancouver

1.

Contreras Bravo LE, Nieves-Pimiento N, Gonzalez-Guerrero K. Prediction of University-Level Academic Performance through Machine Learning Mechanisms and Supervised Methods. Ing. [Internet]. 2022 Nov. 20 [cited 2025 Apr. 28];28(1):e19514. Available from: https://revistas.udistrital.edu.co/index.php/reving/article/view/19514

Received: January 29, 2022; Revised: July 19, 2022; Accepted: August 5, 2022

Abstract

Context:

In the education sector, variables have been identified which considerably affect students’ academic performance. In the last decade, research has been carried out from various fields such as psychology, statistics, and data analytics in order to predict academic performance.

Method:

Data analytics, especially through Machine Learning tools, makes it possible to predict academic performance using supervised learning algorithms based on academic, demographic, and sociodemographic variables. In this work, the most influential variables over the course of students’ academic life are selected through wrapper, embedded, filter, and ensemble methods, and the most important features are identified semester by semester using Machine Learning algorithms (Decision Trees, KNN, SVC, Naive Bayes, LDA) implemented in Python.

Results:

The results of the study show that KNN is the model that best predicts academic performance in each semester, followed by Decision Trees, with precision values of around 80% and 78.5% in some semesters.

Conclusions:

Regarding the variables, it cannot be said that a student’s per-semester academic average necessarily influences the prediction of academic performance for the next semester. The analysis of these results indicates that predicting academic performance with Machine Learning tools is a promising approach that can help improve students’ academic life and allow institutions and teachers to take actions that contribute to the teaching-learning process.

Keywords:

educational data analysis, Machine Learning, higher education.

Introduction

Education is one of the areas with the greatest impact on society, as it strongly influences the reduction of poverty and unemployment, as well as the improvement of the community’s living conditions ¹. In the education sector, metrics such as the annual dropout rate, the dropout rate per cohort, the graduation rate, and the inter-monthly absence rate ² have been identified, which allow measuring students’ academic performance ³. Academic performance is a multidimensional concept that depends on multiple aspects, such as the objectives of the teacher, the institution, and the student. Its prediction also requires integrating different techniques and methodologies ⁴.

Academic performance involves each of the actors in the teaching-learning process and has been approached from different fields of knowledge (psychology, education, medicine, statistics, among others), which have issued various definitions ⁵, ⁶. This concept is considered to represent a level of knowledge demonstrated in an area or subject while considering age and academic level ⁷. In other words, academic performance is measurable from an assessment of the student; it is the sum of different and complex factors that have an impact on him/her ⁸. Similarly, for ⁹, there is a series of factors that revolve around effort and indicate the success or failure of the student ¹⁰. Currently, with the incursion of the web and ICTs applied to education, the sector has undergone a series of changes, among them the emergence of a large volume of data from the interaction between students, teachers, and institutions ¹¹, ¹². These data are stored, but little of this information is used to improve the academic performance and orientation of the student ¹³. Therefore, it is necessary to investigate a decision-making model that contributes to the improvement of academic performance.

Decision-making models in the education sector have undergone a certain evolution in terms of the type of data analytics used, as suggested by ¹⁴: descriptive analytics (performance of all the activities studied) carried out with spreadsheets; diagnostic analytics (past performance to analyze information) conducted by means of computer science; and predictive analytics (anticipating behaviors based on historical relationships between variables) performed using data mining and machine learning techniques.

Related works

Machine learning is a subdiscipline of artificial intelligence that is based on addressing and solving problems from numerical disciplines such as probabilistic reasoning, research based on statistics, information retrieval, and pattern recognition. In this way, machines, through the execution of algorithms, become capable of performing tasks commonly performed by humans ¹⁵. This field is subdivided into several branches, as shown in Fig. 1. Supervised learning takes place when each of the observations of the data set has a related variable or information that indicates what happened (i.e., when entries are labeled). Machine learning (ML) has begun to permeate the educational field, allowing for the collection, cleaning, analysis, and visualization of data on educational actors, in order to optimize related aspects of the teaching-learning process ¹⁵, which is why it is currently regarded as one of the techniques that will help decision-making in these contexts ¹⁶.

In the last decade, multiple studies have sought to establish the variables that specifically affect academic performance. Research has been carried out in areas such as psychology, where, apart from demographic data, the influence on academic performance of variables related to interest, motivation, attendance, integration, self-regulation, commitment, participation, anxiety, and communication has been considered ¹⁷⁻²¹. From the field of statistics, contributions have been made such as those reflected in ²¹⁻²³, which apply statistical models that seek to examine the variables involved in university admission (admission and pre-university exams), proposing a model that involves various interrelated variables in an attempt to predict academic performance. Some early studies grouped the variables into economic, demographic, and psychological factors ²⁴, ²⁵. Others have expanded the number of factors, grouping them into demographic, socioeconomic, institutional, sociocultural, pedagogical, academic, psychological, intellectual, and technological factors, and, due to the rise of ICTs, they have included the learning analytics factor (online interactions) ¹⁰.

Recent works have made it possible to group the variables into fewer factors, such as previous academic performance, demographics, e-learning activity, and psychological and environmental factors ²⁶, considering their influence on the variable under study. Table I shows some previous works that have used supervised algorithms as prediction models of academic performance. The variables associated with these studies were grouped into the factors of the classification proposed in ²⁷. This classification is obtained considering previous research and our reference research ²⁷⁻²⁹, grouping the variables that are easy to identify, of a controllable nature, that are supported by theory, and that can be grouped into previously defined factors. It can be seen that most variables are grouped mainly within the academic and sociodemographic factors (place of residence, number of family members, level of education of the parents, distance traveled to the educational center), followed by psychosocial factors and academic management.

Table I: Previous work on predicting academic performance using supervised algorithms

Factor	Variables	Previous work with supervised machine learning algorithms
Academic	Government test score, grade point average from the last year of high school, admission test result, academic average or GPA (Grade point average), grades by subject, behavior in seminars, conferences and extracurricular activities	(30-40)
Sociodemographic	Age, gender, language, marital status, nationality, socioeconomic variables such as stratum, family income, place of residence, parental education level, occupation, number of family members, distance traveled per journey to school	(32-36,41-45)
Online learning	Number of times of entry to the platform, number of tasks assigned by the teacher, number of exams taken, participation in the discussion forum, amount of material viewed, hours online, number of attendances or absences.	(2, 46-48)
Academic management	Year of admission to the university, number of credits, scholarships obtained, credits taken, credits approved, credits lost, final grade for each subject, number of subjects taken, number of subjects passed, number of subjects missed, number of subjects repeated, and number of times the student has missed a subject.	(27,37,39,40,45-47,49)
Psychosocial	Interest, motivation, attendance, integration, teamwork, self-regulation, commitment, participation, stress, anxiety	(30,34,38,46,47,50-53)
Academic environment	Type of class / course, duration of the semester, type of program, duration of classes, faculty, course preparation, material, assignments, available resources.	(46,48,49,53,54)

Contributions and organization

This work explores three concepts that converge in the models: academic performance and the possible ways of evaluating it; the factors that affect it; and supervised machine learning algorithms. In the literature review in ²⁷⁻²⁹, which was previously published by the authors, there are related works that propose models with several variables that influence performance, but these are usually applied to studying academic performance in an exam, in a specific course, in a year, or to obtain an academic degree. In this sense, this research addresses the problem of determining academic performance throughout the student’s academic life (ten academic semesters) by using data transformation tools, feature selection methods, and supervised ML algorithms.

The fields or areas of knowledge that have studied the multidimensional variable of academic performance are diverse. This has been approached from the field of psychology ¹⁷⁻¹⁹, ⁵⁵⁻⁵⁷, which has applied tools related to questionnaires on students’ perceptions regarding academic performance, followed mainly by statistical tools that have a much more marked focus on demographic data and their influence on the variable of interest ²¹, ²², ⁵⁸, ⁵⁹. Likewise, research related to data science is important, especially studies that use data mining algorithms and ML applied to the field of education.

Therefore, a significant contribution is to propose a methodology and a model to establish university academic performance. Approximately 324 variables are analyzed in this work (50 variables analyzed for each academic semester). The authors provide the essential steps to be followed in order to correctly apply ML algorithms to the field of education (in this case, for a 10-semester engineering program). The results show that, with a good dataset, it is possible to analyze situations of academic life or indicators of educational quality that lead to an improvement of the educational process at the university and secondary and primary education levels. This is an interesting contribution for teachers and researchers in the field of education and engineering who wish to investigate issues of education and ML, since engineering articles generally do not provide a clear and easy-to-learn methodology.

Using ML algorithms (Decision Trees, KNN, SVC, Naive Bayes, LDA), various models have been proposed in order to predict the academic performance of engineering students in each of their 10 academic semesters. The number of records used to analyze the 50 variables on average in each of the 10 semesters ranges between 2,300 and 2,100 for the first four semesters studied, and between 2,100 and 1,800 for the other semesters. These proposed models and their relevant variables allow for decision-making regarding both students and teachers, even though not all of the variables present in the consulted literature are used.

The rest of the article is organized as follows: Section 2 describes the research methodology; Section 3 details the tests and their results; Section 4 presents a discussion of the results obtained; and Section 5 outlines the conclusions.

Materials and methods

The methodology employed in this research is presented in the following eight steps: 1) referential information; 2) data source; 3) data cleaning and conditioning; 4) statistics; 5) data transformation; 6) selection of characteristics; 7) prediction algorithms; and 8) performance metrics.

Reference information

Initially, a review was carried out in databases such as SpringerLink, ProQuest, IEEE Xplore, and ScienceDirect, using combinations of keywords, i.e., “academic performance + machine learning, supervised learning + academic performance, academic performance + EDM, data mining + academic performance, improving educational + Machine Learning”. The aim was to identify the supervised learning ML algorithms for evaluating academic performance in higher education along with their relevant variables. This referential research was carried out for a period of five years using the method for systematic literature reviews (SLR) proposed by ⁶⁰, whose initial phase has already been published ⁶¹.

Data source

Universidad Distrital Francisco José de Caldas (Bogotá DC, Colombia) provided a database with a total of 1,614,472 data points from 4,738 students of the Industrial and Electrical Engineering programs between 2008 and 2018. These data from both teachers and students are summarized in 324 variables and grouped into five factors defined in Table I: pre-university academic, socio-demographic, socio-economic, academic management, and academic environment. Based on this information, a methodology was proposed, as well as supervised algorithms that allow predicting university academic performance.

Data cleaning and conditioning

This process initially consisted of eliminating unwanted observations, correcting structural errors, managing anomalous values, and handling missing data, as these issues would probably be reflected as abnormal data and cause poor prediction in the final models. Likewise, information from students who had inconsistent records was discarded, and new variables were created from the information provided (e.g., distance traveled per journey to school, per-semester average, number of subjects taken). Thus, the information was organized considering the aforementioned factors and the vast majority of the variables grouped under each factor, which resulted in 4,500 records of undergraduate students.
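
The cleaning steps described above can be illustrated with a minimal pandas sketch. The column names, the toy records, and the choice of median imputation are assumptions made for illustration only; the article does not specify how each step was implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    "student_id": [1, 1, 2, 3, 4],
    "semester":   [1, 1, 1, 1, 1],
    "average":    [38.0, 38.0, 72.0, np.nan, 41.0],  # 72 lies outside the 0-50 scale
    "credits":    [16, 16, 18, 15, np.nan],
})

# 1) Eliminate unwanted observations (here: exact duplicate rows).
clean = raw.drop_duplicates()

# 2) Discard records without the target value.
clean = clean.dropna(subset=["average"])

# 3) Discard inconsistent records (averages outside the valid 0-50 range).
clean = clean[clean["average"].between(0, 50)]

# 4) Handle remaining missing data (assumption: median imputation).
clean = clean.assign(credits=clean["credits"].fillna(clean["credits"].median()))
```

In this toy example only the records of students 1 and 4 survive, and the missing number of credits is filled with the median of the remaining values.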

Data statistics

The supplied datasets (.CSV files) were merged, thus obtaining input data. Descriptive statistics were carried out through Python libraries in order to learn more about the data framework ⁶².
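
As a rough illustration of this step, the sketch below merges two small CSV sources and computes descriptive statistics with pandas. The file contents, column names, and join key are hypothetical; the article only states that the supplied .CSV files were merged and described using Python libraries.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the supplied files.
grades_csv = io.StringIO(
    "student_id,semester,average\n1,1,38\n1,2,41\n2,1,29\n3,1,46\n"
)
profile_csv = io.StringIO("student_id,age,stratum\n1,18,2\n2,19,3\n3,17,2\n")

grades = pd.read_csv(grades_csv)
profile = pd.read_csv(profile_csv)

# Merge the sources on the (assumed) shared student identifier.
merged = grades.merge(profile, on="student_id", how="left")

# Descriptive statistics (count, mean, std, quartiles) for every numeric column.
summary = merged.describe()
```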

Data transformation

Since an independent variable may exert a greater influence on the dependent variable (in this case, academic performance) merely because its numerical scale is greater than that of the other variables, it was necessary to carry out different types of transformations (Rescale, Standardize, Normalize, Yeo-Johnson, Box-Cox) in order to bring the variables of the dataset closer to a Gaussian distribution. These transformations sought to eliminate such scale effects, since they are mainly syntactic modifications carried out on the data without changing the algorithm ⁶³.
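
The five transformations named above have direct counterparts in scikit-learn, so a minimal sketch might look as follows. The mapping to these particular classes is an assumption (the article does not name the library used), and the data are synthetic.

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler,
    Normalizer,
    PowerTransformer,
    StandardScaler,
)

rng = np.random.default_rng(0)
# Synthetic, strictly positive and right-skewed features (illustrative only).
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 3))

transformers = {
    "rescale": MinMaxScaler(),            # maps each column to [0, 1]
    "standardize": StandardScaler(),      # zero mean, unit variance per column
    "normalize": Normalizer(),            # scales each ROW to unit L2 norm
    "yeo-johnson": PowerTransformer(method="yeo-johnson"),
    "box-cox": PowerTransformer(method="box-cox"),  # needs strictly positive data
}
transformed = {name: t.fit_transform(X) for name, t in transformers.items()}


def skewness(a):
    """Sample skewness per column; values near 0 suggest a symmetric shape."""
    centered = a - a.mean(axis=0)
    return (centered ** 3).mean(axis=0) / (centered ** 2).mean(axis=0) ** 1.5


skew_before = skewness(X)
skew_after = skewness(transformed["box-cox"])
```

Comparing `skew_before` with `skew_after` gives a quick check of whether a power transformation has brought a variable closer to a symmetric, quasi-Gaussian shape.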

Feature selection

In order to take advantage of the information provided, a good selection must be made of the characteristics most relevant to the output variable ⁶⁴. The literature presents two options: feature selection methods (which include or exclude features according to their relevance to the problem without changing them, and which are generally divided into filter, wrapper, embedded, and ensemble methods); and dimensionality reduction methods (which create new combinations of attributes from the base ones).
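
To make the distinction between method families concrete, the following sketch applies one filter, one embedded, and one ensemble-based selector to synthetic data using scikit-learn. The specific estimators, the ANOVA F-score, and the number of retained features `K` are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data; with shuffle=False the 5 informative features
# occupy columns 0-4 and the remaining 15 columns are noise.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, shuffle=False, random_state=0,
)
K = 5  # number of features to keep (arbitrary choice)

# Filter method: rank features by a univariate statistic (ANOVA F-score).
filter_selector = SelectKBest(score_func=f_classif, k=K).fit(X, y)
filter_idx = filter_selector.get_support(indices=True)

# Embedded method: importances learned while fitting a single decision tree.
embedded_selector = SelectFromModel(
    DecisionTreeClassifier(random_state=0), threshold=-np.inf, max_features=K
).fit(X, y)
embedded_idx = embedded_selector.get_support(indices=True)

# Ensemble method: importances averaged over a random forest.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ensemble_idx = np.sort(np.argsort(forest.feature_importances_)[::-1][:K])
```

Each `*_idx` array holds the column indices retained by one family of methods; features selected by several families are natural candidates for the final model.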

Prediction algorithms

The supervised machine learning algorithms implemented in the dataset were KNN, Decision Trees, SVC, Naive Bayes, and LDA. It is worth mentioning that it was necessary to calculate the dependent variable of study (academic performance) semester by semester in accordance with the norms established by the University and the Colombian government, since its wide range of numerical values generated inconsistencies in the execution. The scale generated to define the variable is shown in Table II, which is based on the ranges established by the Colombian Ministry of National Education.

Table II: Performance variable conventions

Performance	Average	Number
Superior Performance	50 - 45	4
High performance	44 - 40	3
Basic Performance	39 - 30	2
Low performance	29 - 0	1
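
The conventions in Table II amount to a simple binning of the semester average. A minimal sketch is given below; the function name is hypothetical, and treating each lower bound as an inclusive threshold (so that a non-integer average such as 44.5 falls into the lower class) is an assumption, since the table only lists integer ranges.

```python
def performance_class(average: float) -> int:
    """Map a semester average on the 0-50 scale to the class number of Table II."""
    if not 0 <= average <= 50:
        raise ValueError("average must lie between 0 and 50")
    if average >= 45:
        return 4  # Superior performance
    if average >= 40:
        return 3  # High performance
    if average >= 30:
        return 2  # Basic performance
    return 1      # Low performance


labels = [performance_class(a) for a in (47, 42, 35, 12)]
```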

K-Nearest Neighbors (KNN) is a classification algorithm whose performance depends on the choice of the hyperparameter K and of the distance measure used between two data points (Euclidean, Manhattan, or Minkowski) ⁶⁵. Decision Trees are diagrams consisting of internal nodes, each corresponding to a logical test on an attribute, and connecting branches that illustrate the whole process and show the result ⁶⁶. The top node in a tree is the root node and represents the entire dataset ⁶⁷. To establish the best partition of a node, different splitting criteria have been suggested which seek to reduce impurity, e.g., information gain (based on entropy) and the Gini index. Support Vector Machines (SVM), whose classification variant is the SVC used here, search for a hyperplane in a high-dimensional space that separates the classes in a dataset; they are implemented using a kernel (linear or nonlinear) ⁶⁸. Naive Bayes is a classifier based on Bayes’ theorem with good classification precision; it is implemented by estimating a posterior probability ⁶⁹. Finally, Linear Discriminant Analysis (LDA) makes predictions by estimating the probability that a new set of inputs belongs to each class; the class with the highest probability is the output class ⁷⁰.
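
A compact way to compare the five classifiers is to cross-validate each of them on the same data. The sketch below does so with scikit-learn on a synthetic four-class dataset; the hyperparameters, the standardization step, and the 5-fold scheme are illustrative assumptions rather than the settings used by the authors.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one semester: 4 classes, as in Table II.
X, y = make_classification(
    n_samples=800, n_features=12, n_informative=6, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
    "Decision Tree": DecisionTreeClassifier(criterion="gini", random_state=0),
    "SVC": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}

# Mean 5-fold cross-validated accuracy per model; scaling is fitted inside
# each fold so that no information leaks from the validation split.
scores = {
    name: cross_val_score(
        make_pipeline(StandardScaler(), model), X, y, cv=5, scoring="accuracy"
    ).mean()
    for name, model in models.items()
}
best_model = max(scores, key=scores.get)
```

Repeating this comparison once per semester dataset would yield the kind of semester-by-semester ranking of models discussed in the article.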

Performance metrics

There are several ways to evaluate the results of an ML algorithm. According to ⁷¹, the quality of the classification should be evaluated by one of four performance metrics: accuracy, precision, recall (sensitivity), and the F1 score. These values are determined from the confusion matrix (Table III).

Table III: Confusion matrix

	Predicted Positive	Predicted Negative
Actual Positive	True Positive (TP)	False Negative (FN)
Actual Negative	False Positive (FP)	True Negative (TN)

Accuracy is defined as the number of correctly predicted instances over the total number of records; precision is the ratio of correctly predicted positive instances to the total number of predicted positive instances; sensitivity is the ratio of correctly predicted positive instances to the total number of actual positive instances; and the F1 score is the harmonic mean of precision and sensitivity.
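
These four metrics follow directly from the cells of the confusion matrix in Table III. The sketch below computes them for a binary case, with the F1 score taken as the harmonic mean of precision and recall (its standard definition). The function name and the example counts are hypothetical; for the four-class problem studied here, the same formulas would be applied per class in a one-vs-rest fashion.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only: 40 TP, 10 FP, 20 FN, 30 TN.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts, accuracy is 70/100, precision is 40/50, recall is 40/60, and F1 is 80/110.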

Results

By applying the methodology described above, various results were obtained for steps 4, 5, 6, 7, and 8.

Regarding statistics

The base dataset consists of 324 variables on average which influence students’ academic performance and were grouped by semester. It was necessary to create other variables mentioned in the literature that could influence performance, e.g., the number of subjects taken, missed, and repeated. Universidad Distrital Francisco José de Caldas constantly measures the variables of interest and commitment of the students during their time at the university, applying measurement mechanisms per semester (known as academic tests). Another variable created was distance. This variable is considered because the time it takes students to travel from their residence to the university can influence their academic performance. The distance between the student’s residence and the university was determined by means of approximations using the Google Maps tool, drawing a radial perimeter, and taking the centroid of each location on the map of Bogotá as a reference.
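
The authors obtained the distance variable through approximations in Google Maps. As an alternative illustration of the same idea (a straight-line distance from the centroid of each location to the campus), the sketch below uses the haversine great-circle formula. This is a swapped-in technique, not the authors’ procedure, and all coordinates and locality names are hypothetical placeholders.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical placeholder coordinates (not taken from the article).
campus = (4.628, -74.065)
locality_centroids = {
    "locality_a": (4.700, -74.100),
    "locality_b": (4.580, -74.080),
}

# Straight-line distance from each locality centroid to the campus.
distance_km = {
    name: haversine_km(lat, lon, *campus)
    for name, (lat, lon) in locality_centroids.items()
}
```

A straight-line distance of this kind ignores the road network and traffic, so it should be read as a coarse proxy for travel time.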