Recent Submissions

  • CoFiF: A corpus of financial reports in French language 

    Ahmadi, Sina; Daudert, Tobias (NUI Galway, 2019-08-12)
    In an era when machine learning and artificial intelligence have huge momentum, the demand for data to train and test models is steadily growing. We introduce CoFiF, the first corpus comprising company reports in the French ...
  • Creating a fine-grained corpus for a less-resourced language: the case of Kurdish 

    Omer Abdulrahman, Roshna; Hassani, Hossein; Ahmadi, Sina (NUI Galway, 2019-07-28)
    Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles ...
  • NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French 

    Daudert, Tobias; Ahmadi, Sina (NUI Galway, 2019-08-12)
    Portable Document Format (PDF) has become the industry-standard document format because it is independent of the software, hardware or operating system. Publicly listed companies annually publish a variety of reports and to take ...
  • Passive diagnosis incorporating the PHQ-4 for depression and anxiety 

    Delahunty, Fionn; Johansson, Robert; Arcan, Mihael (NUI Galway, 2019)
    Depression and anxiety are the two most prevalent mental health disorders worldwide, impacting the lives of millions of people each year. In this work, we develop and evaluate a multilabel, multidimensional deep neural ...
  • Leveraging rule-based machine translation knowledge for under-resourced neural machine translation models 

    Torregrosa, Daniel; Pasricha, Nivranshu; Chakravarthi, Bharathi Raja; Masoud, Maraim; Alonso, Juan; Casas, Noe; Arcan, Mihael (NUI Galway, 2019-08-19)
    Rule-based machine translation is a machine translation paradigm where linguistic knowledge is encoded by an expert in the form of rules that translate from source to target language. While this approach grants total ...
  • Lexical sense alignment using weighted bipartite b-matching 

    Ahmadi, Sina; Arcan, Mihael; McCrae, John (NUI Galway, 2019-05-20)
    In this study, we present a similarity-based approach for lexical sense alignment in WordNet and Wiktionary with a focus on polysemous items. Our approach relies on semantic textual similarity using features such as ...
  • On lexicographical networks 

    Ahmadi, Sina; Arcan, Mihael; McCrae, John (NUI Galway, 2018-12-06)
    In this study, we analyze various aspects of lexicographical networks. We aim to answer the research question: what are the characteristics of lexicographical networks? In addition to the existing notions of ...
  • Inferring translation candidates for multilingual dictionary generation with multi-way neural machine translation 

    Arcan, Mihael; Torregrosa, Daniel; Ahmadi, Sina; McCrae, John P. (National University of Ireland, Galway, 2019-05-20)
    In the widely connected digital world, multilingual lexical resources are among the most important resources for natural language processing applications, including information retrieval, question answering or knowledge ...
  • TIAD 2019 Shared Task: Leveraging knowledge graphs with neural machine translation for automatic multilingual dictionary generation 

    Torregrosa, Daniel; Arcan, Mihael; Ahmadi, Sina; McCrae, John P. (National University of Ireland, Galway, 2019-04-20)
    This paper describes the approaches we proposed for the TIAD 2019 Shared Task, which consisted of the automatic discovery and generation of dictionaries leveraging multilingual knowledge bases. We present three methods ...
  • An evaluation of SPARQL federation engines over multiple endpoints 

    Saleem, Muhammad; Khan, Yasar; Hasnain, Ali; Ermilov, Ivan; Ngonga Ngomo, Axel-Cyrille (NUI Galway, 2018-04-23)
    Due to the decentralized and linked architecture underlying Linked Data, running complex queries often requires collecting data from multiple RDF datasets. The optimization of the runtime of such queries, called federated ...
  • Drug target discovery using knowledge graph embeddings 

    Mohamed, Sameh K.; Nováček, Vít; Nounu, Aayah (Association for Computing Machinery, 2019-04-08)
    The field of drug discovery has recently entered a plateau stage. Introducing new drugs to the market is becoming increasingly expensive and time-consuming. One of the main reasons is the slow progress in finding novel ...
  • Link prediction using multi part embeddings 

    Mohamed, Sameh K.; Nováček, Vít (NUI Galway, 2019-06-02)
    Knowledge graph embedding models are widely used to provide scalable and efficient link prediction for knowledge graphs. They use different techniques to model embedding interactions, where their tensor factorisation ...
  • Knowledge base completion using distinct subgraph paths 

    Mohamed, Sameh K.; Nováček, Vít; Vandenbussche, Pierre-Yves (ACM, 2018-04-09)
    Graph feature models facilitate efficient and interpretable predictions of missing links in knowledge bases with network structure (i.e. knowledge graphs). However, existing graph feature models, e.g. Subgraph Feature ...
  • Extending largeRDFBench for multi-source data at scale for SPARQL endpoint federation 

    Hasnain, Ali; Saleem, Muhammad; Ngonga Ngomo, Axel-Cyrille; Rebholz-Schuhmann, Dietrich (IOS Press, 2018)
    Querying the Web of Data strongly motivates the use of federation approaches, mainly SPARQL query federation, when the data is available through endpoints. Different benchmarks have been proposed to exploit the full ...
  • Automatic extraction of word combinations from a corpus using the SSJ lexicon (Avtomatsko pridobivanje besednih zvez iz korpusa z uporabo leksikona SSJ) 

    Arhar Holdt, Špela; Arcan, Mihael (Centre for Slovene as a Second and Foreign Language, University of Ljubljana, 2011-11-17)
    Computational lexicography is an interdisciplinary field, primarily focusing on the automation of lexicographic procedures and the building of lexical databases of various kinds. In this paper we describe ...
  • Deep convolution neural network model to predict relapse in breast cancer 

    Jha, Alokkumar; Verma, Ghanshyam; Khan, Yasar; Mehmood, Qaiser; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh (IEEE, 2018-12-17)
    A mishap in anti-cancer drug distribution is critical for breast cancer patients because of poor prediction models for identifying the treatment regime in estrogen receptor-positive (ER+ve) and estrogen receptor-negative (ER-ve) patients. The traditional method for ...
  • Linked data based multi-omics integration and visualization for cancer decision networks 

    Jha, Alokkumar; Khan, Yasar; Mehmood, Qaiser; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh (Springer Verlag, 2018-12-30)
    Visualization of Gene Expression (GE) is a challenging task, since the number of genes and their associations are difficult to predict across various biological studies. GE could be used to understand tissue-gene-protein ...
  • Engineering an aligned gold-standard corpus of human to machine oriented Controlled Natural Language 

    Safwat, Hazem; Davis, Brian; Zarrouk, Manel (IEEE, 2018-12-03)
    Knowledge base creation and population form an essential formal backbone for a variety of intelligent applications, decision support and expert systems, and intelligent search. While the abundance of unstructured text helps ...
  • SemR-11: a multi-lingual gold-standard for semantic similarity and relatedness for eleven languages 

    Barzegar, Siamak; Davis, Brian; Zarrouk, Manel; Handschuh, Siegfried; Freitas, André (European Language Resources Association, 2018-05-07)
    This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness for 11 languages (German, French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). ...
  • WWW'18 open challenge: financial opinion mining and question answering 

    Maia, Macedo; Handschuh, Siegfried; Freitas, André; Davis, Brian; McDermott, Ross; Zarrouk, Manel; Balahur, Alexandra (Association for Computing Machinery, 2018-04-23)
    The growing maturity of Natural Language Processing (NLP) techniques and resources is dramatically changing the landscape of many application domains which are dependent on the analysis of unstructured data at scale. The ...
