SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the texts

Aldarra, Suad; Muñoz, Emir; Vandenbussche, Pierre-Yves; Nováček, Vít


Date

2014

Recommended Citation

Suad Aldarra, Emir Muñoz, Pierre-Yves Vandenbussche, and Vít Nováček. 2014. SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the texts. In Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272 (ISWC-PD'14), Matthew Horridge, Marco Rospocher, and Jacco Van Ossenbruggen (Eds.), Vol. 1272. CEUR-WS.org, Aachen, Germany, 345-348.

Published Version

http://dl.acm.org/citation.cfm?id=2878453.2878540

Abstract

Despite advances in digital document processing, exploring implicit relationships within large amounts of textual resources can still be daunting. This is partly due to the ‘black-box’ nature of most current methods for computing links (i.e., similarities) between documents (cf. [1] and [2]). These methods are mostly based on numeric computational models such as vector spaces or probabilistic classifiers. Such models may perform well according to standard IR evaluation methodologies, but can be sub-optimal in applications aimed at end users because the results and their provenance are difficult to interpret [3, 1]. Our Semantic Text Exploration prototype (abbreviated as SemanTex) aims at finding implicit links within a corpus of textual resources (such as articles or web pages) and exposing them to users in an intuitive front-end. We discover the links by: (1) finding concepts that are important in the corpus; (2) computing relationships between the concepts; (3) using the relationships to find links between the texts. The links are annotated with the concepts from which the particular connection was computed. Apart from presenting the links to human users for manual exploration in the SemanTex interfaces, we are working on representing the semantically annotated links between textual documents in RDF and exposing the resulting datasets for particular domains (such as PubMed or New York Times articles) as part of the Linked Open Data cloud.
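The three steps named in the abstract can be illustrated with a small Python sketch. This is not the SemanTex implementation, which the abstract does not specify: document frequency stands in for concept importance, within-document co-occurrence stands in for concept relationships, and the toy corpus, stopword list, and all function names are assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only (not from the paper).
DOCS = {
    "d1": "aspirin eases fever",
    "d2": "heart disease risk grows with age",
    "d3": "aspirin protects the heart",
    "d4": "stock markets fell on friday",
}
STOPWORDS = {"the", "with", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def important_concepts(docs, min_df=2):
    """Step 1 (stand-in): terms that occur in at least min_df documents."""
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term for term, n in df.items() if n >= min_df}


def concept_relations(docs, concepts):
    """Step 2 (stand-in): count concept pairs co-occurring in a document."""
    cooc = Counter()
    for text in docs.values():
        present = sorted(set(tokenize(text)) & concepts)
        cooc.update(combinations(present, 2))  # keys are sorted pairs
    return cooc


def document_links(docs, concepts, cooc):
    """Step 3: link documents and keep the concepts that justify each link."""
    doc_concepts = {d: set(tokenize(t)) & concepts for d, t in docs.items()}
    links = {}
    for a, b in combinations(sorted(docs), 2):
        shared = sorted(doc_concepts[a] & doc_concepts[b])
        related = sorted({
            tuple(sorted((x, y)))
            for x in doc_concepts[a]
            for y in doc_concepts[b]
            if x != y and cooc[tuple(sorted((x, y)))] > 0
        })
        if shared or related:
            links[(a, b)] = {"shared": shared, "related": related}
    return links


concepts = important_concepts(DOCS)
cooc = concept_relations(DOCS, concepts)
links = document_links(DOCS, concepts, cooc)
for pair, evidence in links.items():
    print(pair, evidence)
```

In this toy corpus, d1 and d2 share no concept, yet they are linked because "aspirin" and "heart" co-occur in d3. Each link carries the concepts it was derived from, which is the kind of annotation the abstract describes as making links interpretable.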

URI

http://hdl.handle.net/10379/6017

Collections

Data Science Institute (Conference Papers)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland