Generating and ranking candidate data models from background knowledge
Date: 2021-01-04
Author: Oliveira, Daniela
Abstract
Knowledge graphs have emerged as a core technology to publish, share, and integrate data on the Web.
Unlike traditional data storage solutions such as relational databases, knowledge graphs do not require data to adhere to strict, pre-defined data models.
Nonetheless, ontologies are commonly used as a key technology to integrate information in knowledge graphs at the semantic level. Creating an ontology to model a domain is not a trivial task and requires a significant investment of time and effort.
Therefore, data publishers are encouraged to reuse existing ontologies by extending or modifying concepts already described in the domain.
However, finding the right ontologies to model a dataset is a challenge, since several valid, relevant data models are likely to exist without clear agreement between them.
In this thesis, we developed a framework to ease the task of selecting the best data model for a dataset.
The framework produces a ranked list of candidate data models that fit the data and are interoperable with a knowledge graph of published RDF data sources.
This knowledge graph is obtained by aggregating freely available RDF datasets, extracting their underlying ontology graph, and then enriching its edges to produce a tightly connected graph.
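The ontology-extraction step described above can be sketched in plain Python. This is a minimal illustration, not the thesis's actual implementation: the string-based triple representation and the two lifting rules (instantiated classes and used properties) are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"

def extract_ontology_graph(triples):
    """Lift a set of data triples (s, p, o) to a schema-level graph:
    the classes that are instantiated and the properties that are used."""
    schema = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            # The object of an rdf:type assertion is a class the data instantiates.
            schema.add((o, RDF_TYPE, "owl:Class"))
        else:
            # Any other predicate is a property the data actually uses.
            schema.add((p, RDF_TYPE, "rdf:Property"))
    return schema
```

Running this over every aggregated dataset and merging the results yields one schema-level graph per source, which can then be connected by edge enrichment as described above.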
We exploit the content and graph structure of this knowledge graph to compute a score that considers the accuracy, interoperability, and consistency of the candidates.
The output of the framework is a correspondence between each input triple pattern (i.e., domain-property-range) and a ranked list of candidate triple patterns from the knowledge graph.
These rankings are obtained by combining the three scores into a single triple score.
This combination is weighted, and the user can choose the weight given to each score to best fit their use case or application.
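The abstract does not specify the exact combination function, so the sketch below assumes a linear weighted sum of the three scores; the function names and the candidate tuple layout are illustrative assumptions.

```python
def triple_score(accuracy, interoperability, consistency,
                 weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three per-candidate scores into a single triple score
    (assumed here to be a linear weighted sum)."""
    wa, wi, wc = weights
    return wa * accuracy + wi * interoperability + wc * consistency

def rank_candidates(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """candidates: list of (triple_pattern, accuracy, interoperability,
    consistency). Returns (triple_pattern, score) pairs, best first."""
    scored = [(tp, triple_score(a, i, c, weights)) for tp, a, i, c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With weights of (1, 0, 0) the ranking is driven purely by accuracy; shifting weight onto interoperability or consistency reorders the candidates accordingly, which is how a user would tune the framework to their use case.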
Our experiments show that the framework produces a meaningful set of candidates for different use cases.
In these experiments, we test the knowledge graph creation methodology and we present two domain use cases that demonstrate the usefulness of our approach.
The experiments show that the framework produces a set of reasonable candidate data models to present to the user, with the final choice of data model controlled by a set of parameters that the user can adjust to fit their use case and preferences.
Therefore, our framework assists users in finding a data model, making the task of annotating data less strenuous and supporting the (re)usability of ontologies when creating knowledge graphs on the Web.