Detection of Outliers and Imputing of Missing Values for Water Quality UV-VIS Absorbance Time Series

Deteccion de Valores Extremos e Imputación de Valores Faltantes para la Calidad de Agua en Series de Tiempo de Absorbancia UV-VIS

  • Leonardo Plazas-Nossa, Permanent Faculty Member, Faculty of Engineering, Universidad Distrital FJC
  • Miguel Antonio Ávila Angulo, Universidad Distrital Francisco José de Caldas
  • Andres Torres, Pontificia Universidad Javeriana Bogotá
Keywords: Outlier Detection, Replace Missing Values, UV-Vis Absorbance, Water Quality, Winsorizing. (en_US)
Keywords: Absorbancia UV-Vis, calidad de agua, detección de valores extremos, enventaneo, imputación de valores faltantes. (es_ES)

Abstract (en_US)

Context: UV-Vis absorbance records collected by online optical sensors for water quality monitoring may contain outliers and/or missing values. Data pre-processing is therefore a prerequisite for processing monitoring data. The aim of this study is to propose a method that detects and removes outliers and fills gaps in time series.

Method: Outliers are detected using a Winsorising procedure, and the Discrete Fourier Transform (DFT) and the Inverse Fast Fourier Transform (IFFT) are applied to complete the time series. Together, these tools were used to analyse a case study comprising three sites in Colombia, monitored via UV-Vis (Ultraviolet and Visible) spectra: (i) Bogotá D.C., Salitre-WWTP (Waste Water Treatment Plant), influent; (ii) Bogotá D.C., Gibraltar Pumping Station (GPS); and (iii) Itagüí, San Fernando-WWTP, influent (Medellín metropolitan area).
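
As an illustration of the outlier step, the following Python sketch clips values that fall outside a moving-window band. The abstract does not give the exact Winsorising rule, so the band used here (window median plus or minus k times the scaled median absolute deviation) and the names winsorize_outliers, half_window and k are assumptions made for this example, not the procedure used in the study.

```python
import numpy as np

def winsorize_outliers(series, half_window=5, k=3.0):
    """Clip values lying outside a moving-window band (illustrative sketch).

    The band is median +/- k * scaled MAD of the surrounding window.
    half_window and k are hypothetical parameters, not values from the study.
    Returns (cleaned_series, outlier_mask).
    """
    x = np.asarray(series, dtype=float)
    cleaned = x.copy()
    mask = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        if np.isnan(x[i]):
            continue  # missing values are left for the gap-filling step
        lo = max(0, i - half_window)
        hi = min(x.size, i + half_window + 1)
        window = x[lo:hi]
        window = window[~np.isnan(window)]
        med = np.median(window)
        # 1.4826 scales the MAD to the standard deviation for Gaussian data
        mad = 1.4826 * np.median(np.abs(window - med))
        lower, upper = med - k * mad, med + k * mad
        if x[i] < lower or x[i] > upper:
            mask[i] = True
            cleaned[i] = min(max(x[i], lower), upper)  # winsorize to the band edge
    return cleaned, mask
```

A call such as winsorize_outliers(absorbance, half_window=5) returns the clipped series together with a boolean mask of the flagged positions, where absorbance stands for a one-dimensional array at a single wavelength.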

Results: Outlier detection with the proposed method obtained promising results when the window parameter values are small and self-similar, even though the three time series differed in size and behaviour. The DFT made it possible to process gaps of different lengths. To assess the validity of the proposed method, continuous subsets (sections) of the absorbance time series free of outliers and missing values were removed from the original time series and then reconstructed, yielding an average error rate of 12% across the three test time series.
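
The validation idea can be sketched in Python as follows: hide a clean section, impute it, and compare the estimate with the hidden values. The abstract does not define the error rate, so the mean absolute percentage error used here is an assumption, and validation_error and linear_impute are hypothetical helpers; linear interpolation only stands in for the Fourier-based imputer.

```python
import numpy as np

def validation_error(series, start, length, impute):
    """Hide series[start:start+length], impute it, return the error in percent.

    The metric is the mean absolute percentage error (an assumption);
    the hidden values must be non-zero for it to be defined.
    """
    x = np.asarray(series, dtype=float)
    hidden = x.copy()
    hidden[start:start + length] = np.nan
    estimate = impute(hidden)
    truth = x[start:start + length]
    guess = estimate[start:start + length]
    return 100.0 * np.mean(np.abs(guess - truth) / np.abs(truth))

def linear_impute(series):
    """Stand-in imputer: linear interpolation across missing values."""
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(x.size)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x
```

Any function that takes a series with NaN gaps and returns a completed series can be passed as impute, which keeps the error calculation independent of the imputation method being assessed.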

Conclusions: Applying the DFT and the IFFT with the 10% most important harmonics of the useful values can be useful for later use in different applications, specifically for time series of water quality and quantity in urban sewer systems. One potential application is the comparison of dry weather with rain events, achieved by detecting values that correspond to unusual behaviour in a time series. Additionally, the results hint at the method's potential for correcting other hydrologic time series.
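
A minimal Python sketch of gap filling with a truncated Fourier reconstruction is given here. It assumes that "most important" means the harmonics with the largest magnitude and that gaps are provisionally filled by linear interpolation before the transform; neither detail is stated in the abstract, and fill_gaps_fft and keep_fraction are hypothetical names.

```python
import numpy as np

def fill_gaps_fft(series, keep_fraction=0.10):
    """Fill NaN gaps using the strongest harmonics of the series (sketch).

    keep_fraction=0.10 mirrors the 10% of harmonics mentioned in the text;
    ranking harmonics by magnitude is an assumption of this example.
    """
    x = np.asarray(series, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x.copy()
    idx = np.arange(x.size)
    # Provisional fill so the transform sees a complete series (assumption)
    provisional = x.copy()
    provisional[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    spectrum = np.fft.rfft(provisional)
    n_keep = max(1, int(np.ceil(keep_fraction * spectrum.size)))
    strongest = np.argsort(np.abs(spectrum))[-n_keep:]
    filtered = np.zeros_like(spectrum)
    filtered[strongest] = spectrum[strongest]
    # Inverse transform of the retained harmonics only
    reconstruction = np.fft.irfft(filtered, n=x.size)
    filled = x.copy()
    filled[missing] = reconstruction[missing]  # observed values stay untouched
    return filled
```

Only the missing positions are overwritten, so the observed measurements are preserved and the reconstruction contributes just the values inside the gaps.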

Abstract (es_ES)

Contexto: En el registro de la absorbancia UV-Vis mediante captores ópticos en línea para la detección de la calidad del agua se pueden presentar valores atípicos o valores faltantes. Por lo tanto, el pre-procesamiento para corregir dichas anomalías es necesario para un mejor análisis de los datos de monitoreo. El objetivo de este estudio es proponer un método para detectar e imputar valores extremos, como también completar valores faltantes en series de tiempo.

Método: La detección de valores atípicos utiliza el procedimiento de enventaneo y la aplicación de la Transformada Discreta de Fourier (DFT, Discrete Fourier Transform) y la inversa de la Transformada Rápida de Fourier (IFFT, Inverse of Fast Fourier Transform) para completar las series de tiempo. Estas herramientas fueron utilizadas para un caso de estudio compuesto por tres sitios en Colombia ((i) PTAR-Salitre (Planta de Tratamiento de Aguas Residuales) Bogotá D.C., afluente; (ii) Estación Elevadora de Gibraltar Bogotá D.C.; y (iii) PTAR-San Fernando, área metropolitana de Medellín, afluente) analizados mediante espectros UV-Vis (Ultravioleta y Visible).

Resultados: La detección de valores atípicos con el método propuesto obtiene resultados prometedores cuando los valores de los parámetros de la ventana son pequeños y auto-similares, esto a pesar de que las tres series de tiempo utilizadas presentan diferentes tamaños y comportamientos. Para validar la metodología propuesta, sub-conjuntos continuos (una sección) de las series de tiempo de absorbancia sin valores ausentes o atípicos fueron removidos de las series originales, obteniéndose tasas de error de 12 % en promedio para los tres sitios de estudio.

Conclusiones: La aplicación de la DFT y la IFFT, utilizando el 10 % de los armónicos más importantes de los valores útiles, es crucial para su posterior uso en diferentes aplicaciones, específicamente para series de tiempo de calidad y cantidad de agua en sistemas de saneamiento urbano. Una posible aplicación podría ser la comparación de los efectos de clima seco respecto a temporadas de lluvia, mediante la detección de valores que corresponden a comportamiento inusual en una serie de tiempo. Además, los resultados indican potencial aplicación futura en la corrección de otras series de tiempo hidrológicas.

Author Biographies

Leonardo Plazas-Nossa, Permanent Faculty Member, Faculty of Engineering, Universidad Distrital FJC
Electronic Engineer, Master in Teleinformatics, Lecturer at the Faculty of Engineering, Universidad Distrital Francisco José de Caldas.
Miguel Antonio Ávila Angulo, Universidad Distrital Francisco José de Caldas
Cadastral and Geodetic Engineer, Master in Teleinformatics, Lecturer at the Faculty of Engineering, Universidad Distrital Francisco José de Caldas.
Andres Torres, Pontificia Universidad Javeriana Bogotá
Civil Engineer, Specialization in Engineering Management Systems, Master in Civil Engineering, Doctorate in Civil Engineering, Research Group Ciencia e Ingeniería del Agua y el Ambiente (Water and Environmental Science and Engineering), Faculty of Engineering, Pontificia Universidad Javeriana.

How to Cite
Plazas-Nossa, L., Ávila Angulo, M. A., & Torres, A. (2017). Deteccion de Valores Extremos e Imputación de Valores Faltantes para la Calidad de Agua en Series de Tiempo de Absorbancia UV-VIS. Ingeniería, 22(1), 09-22. https://doi.org/10.14483/udistrital.jour.reving.2017.1.a01
Published: 2017-01-30
Section
Computational Intelligence