Browsing Data Science Institute by Title

Combining lexical and spatial knowledge to predict spatial relations between objects in images

Hürlimann, Manuela; Bos, Johan (ACL Anthology, 2016-08-11)

Explicit representations of images are useful for linguistic applications related to images. We design a representation based on first-order models that capture the objects present in an image as well as their spatial ...

Community topic usage in social networks

Wood, Ian D. (ACM, 2015-10)

When studying large social media data sets, it is useful to reduce the dimensionality of both the network (e.g. by finding communities) and user-generated data such as text (e.g. using topic models). Algorithms exist for ...

A comparative study of different state-of-the-art hate speech detection methods in Hindi-English code-mixed data

Rani, Priya; Suryawanshi, Shardul; Goswami, Koustava; Chakravarthi, Bharathi Raja; Fransen, Theodorus; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)

Hate speech detection in social media communication has become one of the primary concerns to avoid conflicts and curb undesired activities. In an environment where multilingual speakers switch among multiple languages, ...

A comparison of emotion annotation approaches for text

Wood, Ian D.; McCrae, John P.; Andryushechkin, Vladimir; Buitelaar, Paul (MDPI, 2018-05-11)

While the recognition of positive/negative sentiment in text is an established task with many standard data sets and well developed methodologies, the recognition of a more nuanced affect has received less attention: there ...

A comparison of emotion annotation schemes and a new annotated data set

Wood, Ian D.; McCrae, John P.; Andryushechkin, Vladimir; Buitelaar, Paul (European Language Resources Association (ELRA), 2018-05-07)

While the recognition of positive/negative sentiment in text is an established task with many standard data sets and well developed methodologies, the recognition of more nuanced affect has received less attention, and ...

A comparison of statistical and neural machine translation for Slovene, Serbian and Croatian

Arcan, Mihael (Language Technologies and Digital Humanities 2018, 2018-09-20)

In this paper we present a comparison of the translation quality of Statistical Machine Translation (SMT) and Neural Machine Translation (NMT), considering translation directions between English, Slovene, Serbian and ...

Constructing Twitter Datasets using Signals for Event Detection Evaluation

Hromic, Hugo; Hayes, Conor (22nd International Conference on Case-Based Reasoning, 2014-09-29)

Twitter is a very attractive real-time platform for research on event detection. However, despite the great amount of interest, datasets suitable for evaluating such methods are not easily available. The two most important ...

A Content Analysis: How Wikipedia Talk Pages Are Used

Schneider, Jodi; Passant, Alexandre; Breslin, John G. (2010)

A Context Lifecycle For Web-Based Context Management Services

Hynes, Gearoid; Reynolds, Vinny; Hauswirth, Manfred (2009)

During the development of context aware applications a context management component must traditionally be created. This task requires specialist context lifecycle management expertise and hence can be a significant ...

Converging Web and Desktop Data with Konduit

Dragan, Laura; Möller, Knud; Handschuh, Siegfried; Ambrus, Oszkar (2009)

In this paper we present Konduit, a desktop-based platform for visual scripting with RDF data. Based on the idea of the semantic desktop, non-technical users can create, manipulate and mash-up RDF data with Konduit, and ...

A Conversation-oriented language for B2B integration based on Semantic Web Services

Gomez, Juan Miguel; Haller, Armin; Bussler, Christoph (2005)

Establishing conversations in a B2B environment has become significantly easier since the advent of standards such as RosettaNet and ebXML. These standardisation efforts have maintained some flexibility in defining interactions ...

CORAAL - Towards Deep Exploitation of Textual Resources in Life Sciences

Nováček, Vít; Groza, Tudor; Handschuh, Siegfried (Springer Verlag, 2009)

Prominent biomedical literature search tools like ScienceDirect, PubMed Central or MEDLINE allow for efficient retrieval of resources based on key words. Due to vast amounts of data available in life sciences, key word ...

Corpus creation for sentiment analysis in code-mixed Tamil-English text

Chakravarthi, Bharathi Raja; Muralidaran, Vigneshwaran; Priyadharshini, Ruba; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)

Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to ...

A corpus of the Sorani Kurdish folkloric lyrics

Ahmadi, Sina; Hassani, Hossein; Abedi, Kamaladdin (National University of Ireland Galway, 2020-05-16)

Kurdish poetry and prose narratives were historically transmitted orally and less often in written form. Being an essential medium of oral narration and literature, Kurdish lyrics have had a unique attribute in becoming a ...

Cost-Aware Processing of Similarity Queries in Structured Overlays

Karnstedt, Marcel; Hauswirth, Manfred (2006)

Large-scale distributed data management with P2P systems requires the existence of similarity operators for queries as we cannot assume that all users will agree on exactly the same schema and value representations and ...

Creating a fine-grained corpus for a less-resourced language: the case of Kurdish

Omer Abdulrahman, Roshna; Hassani, Hossein; Ahmadi, Sina (NUI Galway, 2019-07-28)

Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles ...

Creating a multilingual terminological resource using linked data: the case of archaeological domain in the Italian language

Carlino, Carola; Ahmadi, Sina; Speranza, Giulia (CEUR Workshop Proceedings, 2019-11-13)

The lack of multilingual terminological resources in specialized domains constitutes an obstacle to the access and reuse of information. In the technical domain of cultural heritage and, in particular, archaeology, such ...