DOI: https://doi.org/10.14483/23448350.24617
Published: 06/19/2026
Issue: Vol. 53 No. 1 (2026): January-April 2026
Section: Articles

Analysis of Code Smells Detection Using Machine Learning and Deep Learning Approaches

Keywords (en): code smells, machine learning, deep learning, software metrics, software engineering.
Keywords (es): olores de código, aprendizaje automático, aprendizaje profundo, métricas de software, ingeniería de software.
Abstract
Detecting code smells (CS) is important for preventing future problems in software development; it also helps improve software quality and reduces the time spent on maintenance. This study contributes a systematic, replicable experiment for CS detection that integrates data leakage control, rigorous preprocessing, and a comparison of machine learning (ML) and deep learning (DL) models. To this end, an experiment was designed around CS analysis with artificial intelligence approaches, in which ML and DL models were applied to a dataset of method-level software metrics. Preprocessing covered variable cleaning and normalization, transformations, and feature reduction. In addition, data leakage was controlled to ensure the validity of the results. Multiple ML models (Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, Naive Bayes, and Logistic Regression) and a DL model based on a multilayer perceptron (MLP) were trained and evaluated. Most models performed well, achieving accuracy between 94% and 98% under 10-fold cross-validation. The MLP stood out with an accuracy close to 99%, making it the best-performing classifier for CS detection.
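The abstract does not describe how data leakage was controlled, so the following is only a minimal sketch of one common approach: inside each of the 10 cross-validation folds, the normalization statistics are computed from the training rows alone and then applied to the held-out rows. Everything here is an assumption made for illustration and is not the authors' pipeline: the synthetic two-feature data standing in for method-level metrics, the z-score standardization, the k-nearest-neighbors classifier with k = 3, and the helper names `make_folds`, `fit_scaler`, `apply_scaler`, `predict_knn`, and `cross_validate`.

```python
import random
import statistics


def make_folds(n_samples, k, seed=0):
    """Shuffle indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from the given rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features so the division is always defined.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize rows with previously fitted means and standard deviations."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def predict_knn(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_x, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def cross_validate(x, y, k_folds=10, seed=0):
    """Return per-fold accuracy, fitting the scaler on the training folds only."""
    scores = []
    for test_idx in make_folds(len(x), k_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Leakage control: the held-out rows never influence the statistics.
        scaler = fit_scaler(train_x)
        train_x = apply_scaler(train_x, scaler)
        test_x = apply_scaler([x[i] for i in test_idx], scaler)
        hits = sum(
            predict_knn(train_x, train_y, row) == y[i]
            for row, i in zip(test_x, test_idx)
        )
        scores.append(hits / len(test_idx))
    return scores


# Synthetic stand-in for method-level metrics (hypothetical: size, complexity).
rng = random.Random(42)
clean = [[rng.gauss(20, 5), rng.gauss(3, 1)] for _ in range(100)]
smelly = [[rng.gauss(120, 20), rng.gauss(15, 3)] for _ in range(100)]
features = clean + smelly
labels = [0] * 100 + [1] * 100

fold_scores = cross_validate(features, labels, k_folds=10)
mean_accuracy = statistics.fmean(fold_scores)
print(f"10-fold mean accuracy: {mean_accuracy:.3f}")
```

The point of the sketch is the placement of `fit_scaler`: calling it once on the full dataset before splitting would let test-fold statistics leak into training, which tends to inflate the reported accuracy. The same rule would apply to any fitted transformation or feature-reduction step.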
License
Copyright 2026 Enrique Alejandro Chim Mex, Antonio Armando Aguileta Güemez, Raúl Antonio Aguilar Vera

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
By submitting an article to Revista Científica, the author(s) certify that the manuscript has not been, and will not be, submitted to or published in any other scientific journal.
Under the editorial policies of Revista Científica, no fees are charged at any stage of the editorial process: article submission, editing, publication, and subsequent download of content are free of charge, since the journal is a non-profit academic publication.