SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the texts
Date
2014
Author
Aldarra, Suad
Muñoz, Emir
Vandenbussche, Pierre-Yves
Nováček, Vít
Recommended Citation
Suad Aldarra, Emir Muñoz, Pierre-Yves Vandenbussche, and Vít Nováček. 2014. SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the text. In Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272 (ISWC-PD'14), Matthew Horridge, Marco Rospocher, and Jacco Van Ossenbruggen (Eds.), Vol. 1272. CEUR-WS.org, Aachen, Germany, 345-348.
Abstract
Despite advances in digital document processing, exploration of implicit relationships
within large amounts of textual resources can still be daunting. This
is partly due to the ‘black-box’ nature of most current methods for computing
links (i.e., similarities) between documents (cf. [1] and [2]). The methods are
mostly based on numeric computational models like vector spaces or probabilistic
classifiers. Such models may perform well according to standard IR evaluation
methodologies, but can be sub-optimal in applications aimed at end users due
to the difficulties in interpreting the results and their provenance [3, 1].
Our Semantic Text Exploration prototype (abbreviated as SemanTex) aims
at finding implicit links within a corpus of textual resources (such as articles or
web pages) and exposing them to users in an intuitive front-end. We discover the
links by: (1) finding concepts that are important in the corpus; (2) computing
relationships between the concepts; (3) using the relationships for finding links
between the texts. The links are annotated with the concepts from which the
particular connection was computed. Apart from presenting the links to human users
for manual exploration in the SemanTex interfaces, we are working on representing
the semantically annotated links between textual documents in RDF
and exposing the resulting datasets for particular domains (such as PubMed or
New York Times articles) as a part of the Linked Open Data cloud.
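
The three steps above, together with the RDF export, can be illustrated with a minimal Python sketch. It is not the SemanTex implementation: the abstract does not specify how concepts or relationships are computed, so the sketch assumes document frequency as the importance measure, co-occurrence within a document as the relationship, and a hypothetical `http://example.org/semantex/` namespace with made-up property names (`source`, `target`, `viaConcept`) for the N-Triples output. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations

BASE = "http://example.org/semantex/"  # hypothetical namespace, not from the paper


def tokens(text):
    """Lower-cased whitespace tokens of a document, as a set."""
    return set(text.lower().split())


def important_concepts(docs, min_docs=2):
    """Step 1 (assumed heuristic): terms that occur in at least min_docs documents."""
    df = Counter()
    for text in docs.values():
        df.update(tokens(text))
    return {term for term, n in df.items() if n >= min_docs}


def concept_relations(docs, concepts):
    """Step 2 (assumed heuristic): two concepts are related if they co-occur in a document."""
    related = set()
    for text in docs.values():
        present = sorted(concepts & tokens(text))
        related.update(combinations(present, 2))
    return related


def document_links(docs, concepts, related):
    """Step 3: link two documents that share a concept or contain a related
    concept pair; each link is annotated with the concepts that produced it."""
    links = {}
    for a, b in combinations(sorted(docs), 2):
        ca, cb = concepts & tokens(docs[a]), concepts & tokens(docs[b])
        evidence = set(ca & cb)
        for x, y in related:
            if (x in ca and y in cb) or (y in ca and x in cb):
                evidence.update((x, y))
        if evidence:
            links[(a, b)] = sorted(evidence)
    return links


def to_ntriples(links):
    """Serialize annotated links as N-Triples (assumes concepts need no escaping)."""
    lines = []
    for (a, b), concepts in sorted(links.items()):
        link = f"<{BASE}link/{a}-{b}>"
        lines.append(f"{link} <{BASE}source> <{BASE}doc/{a}> .")
        lines.append(f"{link} <{BASE}target> <{BASE}doc/{b}> .")
        for c in concepts:
            lines.append(f'{link} <{BASE}viaConcept> "{c}" .')
    return "\n".join(lines)


docs = {
    "d1": "aspirin dosage guidelines",
    "d2": "chronic inflammation markers",
    "d3": "aspirin reduces inflammation",
    "d4": "weather forecast today",
}
concepts = important_concepts(docs)
related = concept_relations(docs, concepts)
links = document_links(docs, concepts, related)
print(to_ntriples(links))
```

In this toy corpus, d1 and d2 share no term, yet they end up linked because their concepts co-occur in d3. The link carries both concepts as its annotation, which is the kind of interpretable provenance the abstract contrasts with purely numeric similarity scores.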