Learning content patterns from linked data

Muñoz, Emir

Date

2014

Recommended Citation

Muñoz, Emir. (2014). Learning content patterns from linked data. Paper presented at the Proceedings of the Second International Conference on Linked Data for Information Extraction - Volume 1267, Riva del Garda, Italy.

Published Version

http://dl.acm.org/citation.cfm?id=2878575.2878579

Abstract

Linked Data (LD) datasets (e.g., DBpedia, Freebase) are used in many knowledge extraction tasks because of the wide variety of domains they cover. Unfortunately, many of these datasets do not describe their properties and classes, which limits users' ability to understand, reuse, or enrich them. This work addresses part of this gap by presenting an unsupervised approach to discovering syntactic patterns in the properties used in LD datasets. The approach produces a content patterns database, generated from the textual data (content) of properties, that describes the syntactic structures each property has. Our analysis enables (i) human understanding of the syntactic patterns of properties in an LD dataset, and (ii) a structural description of properties that facilitates their reuse or extension. Results over the DBpedia dataset also show that our approach enables (iii) the detection of data inconsistencies, and (iv) the validation and suggestion of new values for a property. We also outline how the resulting database can be exploited in several information extraction use cases.
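
To make the idea of a content pattern concrete, the following is a minimal sketch and not the paper's method: the abstract does not specify the pattern language, so the character-class scheme (`9` for digits, `A`/`a` for upper/lower-case letters, `_` for whitespace), the function names `content_pattern`, `learn_patterns`, and `is_outlier`, the `min_share` threshold, and the sample values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def content_pattern(value: str) -> str:
    """Abstract a literal into a coarse syntactic pattern (assumed scheme).

    Digits become '9', upper-case letters 'A', lower-case letters 'a',
    whitespace '_'; other characters are kept. Runs of the same symbol
    are collapsed to a single symbol.
    """
    symbols = []
    for ch in value:
        if ch.isdigit():
            sym = "9"
        elif ch.isalpha():
            sym = "A" if ch.isupper() else "a"
        elif ch.isspace():
            sym = "_"
        else:
            sym = ch
        if not symbols or symbols[-1] != sym:
            symbols.append(sym)
    return "".join(symbols)


def learn_patterns(pairs):
    """Build a pattern database: property -> Counter of observed patterns."""
    db = defaultdict(Counter)
    for prop, value in pairs:
        db[prop][content_pattern(value)] += 1
    return db


def is_outlier(db, prop, value, min_share=0.25):
    """Flag a value whose pattern is rare for the property (assumed heuristic)."""
    counts = db.get(prop, Counter())
    total = sum(counts.values())
    if total == 0:
        return False  # nothing learned for this property, so no judgement
    return counts[content_pattern(value)] / total < min_share


# Made-up (property, literal) pairs, for illustration only.
sample = [
    ("dbo:birthDate", "1987-05-12"),
    ("dbo:birthDate", "1990-11-03"),
    ("dbo:birthDate", "1975-01-30"),
    ("dbo:birthDate", "2001-07-19"),
    ("dbo:birthDate", "May 1987"),
]
patterns = learn_patterns(sample)
print(patterns["dbo:birthDate"].most_common())
print(is_outlier(patterns, "dbo:birthDate", "May 1987"))
```

Under these assumptions, the dominant pattern of a property serves as a human-readable description of its values, and values with rare patterns are candidates for the inconsistency detection and validation uses mentioned in the abstract.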

URI

http://hdl.handle.net/10379/6022

Collections

Data Science Institute (Workshop Papers)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland