    • The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics 

      QasemiZadeh, Behrang; Handschuh, Siegfried (2014)
      This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational ...
    • Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction 

      QasemiZadeh, Behrang; Handschuh, Siegfried (2014)
      Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in the distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an ...
    • A survey of current datasets for code-switching research 

      Jose, Navya; Chakravarthi, Bharathi Raja; Suryawanshi, Shardul; Sherly, Elizabeth; McCrae, John P. (IEEE, 2020-03-06)
      Code switching is a prevalent phenomenon in the multilingual community and social media interaction. In the past ten years, we have witnessed an explosion of code-switched data on social media that brings together ...
    • A term extraction approach to survey analysis in health care 

      Robin, Cécile; Isazad Mashinchi, Mona; Ahmadi Zeleti, Fatemeh; Ojo, Adegboyega; Buitelaar, Paul (European Language Resources Association, 2020-05)
      The voice of the customer has for a long time been a key focus of businesses in all domains. It has received a lot of attention from the research community in Natural Language Processing (NLP) resulting in many approaches ...
    • Towards automatic linking of lexicographic data: the case of a historical and a modern Danish dictionary 

      Ahmadi, Sina; Nimb, Sanni; McCrae, John P.; Sørensen, Nicolai H. (European Association for Lexicography, 2020)
      Given the diversity of lexical-semantic resources, particularly dictionaries, integrating such resources by aligning various types of information is an important task, both in e-lexicography and natural language processing. ...
    • Unsupervised deep language and dialect identification for short texts 

      Goswami, Koustava; Sarkar, Rajdeep; Chakravarthi, Bharathi Raja; Fransen, Theodorus; McCrae, John P. (International Committee on Computational Linguistics, 2020-12)
      Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines. Language identification ...