    • Automatic morphological analysis and interlinking of historical Irish cognate verb forms 

      Fransen, Theodorus (De Gruyter Mouton, 2020)
      The main aim of the author’s research project is to use computational approaches to gain more insight into the historical development of Irish verbs. One of the objectives is to investigate how a link between the electronic ...
    • A comparative study of different state-of-the-art hate speech detection methods in Hindi-English code-mixed data 

      Rani, Priya; Suryawanshi, Shardul; Goswami, Koustava; Chakravarthi, Bharathi Raja; Fransen, Theodorus; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)
      Hate speech detection in social media communication has become one of the primary concerns to avoid conflicts and curb undesired activities. In an environment where multilingual speakers switch among multiple languages, ...
    • Corpus creation for sentiment analysis in code-mixed Tamil-English text 

      Chakravarthi, Bharathi Raja; Muralidaran, Vigneshwaran; Priyadharshini, Ruba; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)
      Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to ...
    • Cross-lingual sentence embedding using multi-task learning 

      Goswami, Koustava; Dutta, Sourav; Assem, Haytham; Fransen, Theodorus; McCrae, John P. (Association for Computational Linguistics, 2021-11-07)
      Multilingual sentence embeddings capture rich semantic information not only for measuring similarity between texts but also for catering to a broad range of downstream cross-lingual NLP tasks. State-of-the-art multilingual ...
    • Findings of the LoResMT 2021 shared task on COVID and sign language for low-resource languages 

      Ojha, Atul Kr.; Liu, Chao-Hong; Kann, Katharina; Ortega, John; Shatam, Sheetal; Fransen, Theodorus (Association for Machine Translation in the Americas, 2021-08)
      We present the findings of the LoResMT 2021 shared task which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The organization of this task was conducted as part of the ...
    • Historical data preservation and interpretation pipeline for Irish civil registration records 

      Beyan, Oya; Mealy, P. J.; Grant, Dolores; Grant, Rebecca; Harrower, Natalie; Breathnach, Ciara; Collins, Sandra; Decker, Stefan (Springer Verlag, 2015-10-28)
      Semantic Web technologies give us the opportunity to understand today's data-rich society and provide novel means to explore our past. Civil registration records such as birth, death, and marriage registers contain a vast ...
    • A multilingual evaluation dataset for monolingual word sense alignment 

      Ahmadi, Sina; McCrae, John P.; Nimb, Sanni; Khan, Fahad; Monachini, Monica; Pedersen, Bolette S.; Declerck, Thierry; Wissik, Tanja; Bellandi, Andrea; Pisani, Irene; Troelsgård, Thomas; Olsen, Sussi; Krek, Simon; Lipp, Veronika; Váradi, Tamás; Simon, László; Gyorffy, Andras; Tiberius, Carole; Schoonheim, Tanneke; Moshe, Yifat Ben; Rudich, Maya; Ahmad, Raya Abu; Lonke, Dorielle; Kovalenko, Kira; Langemets, Margit; Kallas, Jelena; Dereza, Oksana; Fransen, Theodorus; Cillessen, David; Lindemann, David; Alonso, Mikel; Salgado, Ana; Sancho, Jose Luis; Urena-Ruiz, Rafael-J.; Zamorano, Jordi Porta; Simov, Kiril; Osenova, Petya; Kancheva, Zara; Radev, Ivaylo; Stankovic, Ranka; Perdih, Andrej; Gabrovsek, Dejan (National University of Ireland Galway, 2020-05-16)
      Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually ...
    • NUIG-Panlingua-KMI Hindi-Marathi MT Systems for Similar Language Translation Task @ WMT 2020 

      Ojha, Atul Kr.; Rani, Priya; Bansal, Akanksha; Chakravarthi, Bharathi Raja; Kumar, Ritesh; McCrae, John P. (Association for Computational Linguistics, 2020-11-19)
      NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state-of-the-art in the Similar language translation task for the Hindi ↔ Marathi language pair. As part of these efforts, we conducted a series of experiments to ...
    • A sentiment analysis dataset for code-mixed Malayalam-English 

      Chakravarthi, Bharathi Raja; Jose, Navya; Suryawanshi, Shardul; Sherly, Elizabeth; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)
      There is an increasing demand for sentiment analysis of text from social media, which is mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels ...
    • Towards an integrative approach for making sense distinctions 

      McCrae, John P.; Fransen, Theodorus; Ahmadi, Sina; Buitelaar, Paul; Goswami, Koustava (Frontiers Media, 2022-02-07)
      Word senses are the fundamental unit of description in lexicography, yet it is rarely the case that different dictionaries reach any agreement on the number and definition of senses in a language. With the recent rise in ...
    • ULD@NUIG at SemEval-2020 Task 9: Generative morphemes with an attention model for sentiment analysis in code-mixed text 

      Goswami, Koustava; Rani, Priya; Chakravarthi, Bharathi Raja; Fransen, Theodorus; McCrae, John P. (International Committee on Computational Linguistics, 2020)
      Code mixing is a common phenomenon in multilingual societies where people switch from one language to another for various reasons. Recent advances in public communication over different social media sites have led to an ...
    • Unsupervised deep language and dialect identification for short texts 

      Goswami, Koustava; Sarkar, Rajdeep; Chakravarthi, Bharathi Raja; Fransen, Theodorus; McCrae, John P. (International Committee on Computational Linguistics, 2020-12)
      Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines. Language identification ...