DOI: https://doi.org/10.14483/2322939X.15550
Published: 2019-06-12
Issue: Vol. 16 No. 1 (2019)
Section: Research and Development
Normalización en desempeño de k-means sobre datos climáticos
Normalization in k-means performance on climate data
Keywords: clustering, K-Means, machine learning, normalization, short-time Fourier transform
Abstract
Cluster analysis of climatological data is used in a variety of studies because it can yield useful results for each proposed approach. This paper evaluates the performance of the K-Means clustering algorithm with and without normalization on a data set of four climatological variables (temperature, precipitation, relative humidity, and solar radiation) recorded at a station in the city of Manizales, Colombia. The aim is to determine how applying, or not applying, normalization affects cluster quality, and to evaluate the computational cost of the algorithm under each configuration. To this end, six execution scenarios are defined for 2, 3, and 5 clusters, with different numbers and groupings of variables, using Euclidean distance as the distance measure, the Davies-Bouldin index to evaluate cluster quality, and two normalization methods: Z-transformation and Range transformation. Based on these scenarios, together with a comparison against k-medoids and the application of the short-time Fourier transform (STFT), it is concluded that normalization improves the results and that Z-transformation yields the best clustering performance according to the Davies-Bouldin index.
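The evaluation pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`z_transform`, `range_transform`, `kmeans`, `davies_bouldin`, `demo`) and the synthetic readings are assumptions made for illustration only. Z-transformation is taken here to mean rescaling each variable to zero mean and unit standard deviation, and Range transformation to mean linear rescaling to [0, 1]. K-Means is written as plain Lloyd iterations with Euclidean distance.

```python
import math
import random


def z_transform(rows):
    """Z-transformation: rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; constant columns fall back to 1.0.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def range_transform(rows):
    """Range transformation: rescale each column linearly to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]


def kmeans(rows, k, iters=100, seed=0):
    """Plain Lloyd's K-Means with Euclidean distance; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as seeds
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def davies_bouldin(rows, labels, centroids):
    """Davies-Bouldin index; lower means more compact, better separated clusters.

    Assumes all centroids are distinct (otherwise a distance of 0 is divided by).
    """
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == j]
        total = sum(math.dist(r, centroids[j]) for r in members)
        scatter.append(total / max(len(members), 1))
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


def demo():
    rng = random.Random(42)
    # Hypothetical readings, NOT the paper's data: temperature (deg C),
    # precipitation (mm), relative humidity (%), solar radiation (W/m^2).
    rows = [[rng.gauss(22, 1), abs(rng.gauss(0.2, 0.1)),
             rng.gauss(60, 4), rng.gauss(700, 60)] for _ in range(30)]
    rows += [[rng.gauss(15, 1), abs(rng.gauss(6, 1.5)),
              rng.gauss(90, 3), rng.gauss(150, 40)] for _ in range(30)]
    variants = [("none", rows),
                ("Z-transformation", z_transform(rows)),
                ("Range transformation", range_transform(rows))]
    for name, data in variants:
        for k in (2, 3, 5):
            labels, centroids = kmeans(data, k)
            print(f"{name:22s} k={k}  DB={davies_bouldin(data, labels, centroids):.3f}")


if __name__ == "__main__":
    demo()
```

Without normalization, the variable with the largest numeric range (solar radiation in this toy data) dominates the Euclidean distance, which is why the comparison is run on raw and rescaled copies of the same rows. The values printed for the synthetic data say nothing about the paper's reported results.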