Confidence level evaluation of LOD resources on CKAN instances

Evaluación del nivel de confianza de los recursos LOD en instancias CKAN

  • Jhon Francined Herrera-Cubides
  • Paulo Alonso Gaona-García
  • Carlos Enrique Montenegro-Marín
  • Álvaro Varón-Capera
Keywords: CKAN, Linked Open Data, Machine Learning, Open Data, TensorFlow, Visual Analytics (en_US)

Abstract (en_US)

Linked Open Data (LOD) is an initiative that offers principles for interconnecting data through machine-readable structures and knowledge representation schemes. Several platforms currently allow LOD resources to be consumed; CKAN is one of the most relevant, serving a large community of governmental organizations, NGOs, and others. However, the consumption of these resources lacks minimum criteria for determining their validity, such as the level of trust, quality, linkage, and usability of the data; assessing these aspects requires a prior systematic analysis of the published datasets. To support this analysis, this paper presents a method for analyzing the current state of the datasets obtained from the different instances published in CKAN, with the aim of evaluating the level of trust that their sources can offer. Finally, it presents results, conclusions, and future work derived from using the tool to consume datasets from selected instances of the CKAN platform.



How to cite
Herrera-Cubides, J. F., Gaona-García, P. A., Montenegro-Marín, C. E., & Varón-Capera, Á. (2019). Evaluación del nivel de confianza de los recursos LOD en instancias CKAN. Visión Electrónica, 13(2). https://doi.org/10.14483/22484728.15158
Published: 2019-07-26
Section
Visión Investigadora
