
dc.contributor.advisor: Handschuh, Siegfried
dc.contributor.author: QasemiZadeh, Behrang
dc.description.abstract: Knowledge is assumed by cognitive science to consist of concepts that are organised and maintained by complex processes taking place in human minds. These processes are not yet directly accessible. Language is still the primary medium for communicating knowledge, and linguistic objects and structures are presumably expressions of knowledge and its organisation in the mind. Collecting terms (i.e., creating a specialised vocabulary) and capturing their relationships are thus important mechanisms for distilling knowledge from specialised texts and for formalising it for machines. The approach taken in this thesis is to analyse the co-hyponymy relationships between terms as an organisational mechanism. Co-hyponyms are sets of lexical units sharing a common hypernym; bank and building society, for example, are co-hyponyms of the hypernym financial organisation. Analysing the co-hyponymy relationships between terms is important because it bridges the semantic gap between (a) specialised lexical knowledge, (b) the quantitative interpretation of meanings in specialised discourse, and (c) machine-accessible conceptualisation of knowledge. This thesis proposes the use of a vector-based distributional representation of terms in order to construct a quantitative conceptual model of kinds-sorts in a given field of knowledge. Among empirical methods for analysing linguistic structures, distributional approaches to semantics encode language data into models that should correspond to the meanings of linguistic entities. The meaning of an entity, such as a word or a phrase, is assumed to be a function of its statistical distribution in contexts. In order to use these methods we thus need to define (a) the contexts, that is, which statistical information must be collected; and (b) the functions, that is, how this information must be used to correlate with a meaning.
This thesis is a study of corpus-based distributional methods for characterising co-hyponymy between terms. Terms are represented as vectors to form a so-called term-space model. To obviate the curse of dimensionality and to facilitate the construction of models, novel methods employing sparse random projections are proposed. Random Manhattan indexing is used to construct L1-normed spaces and random indexing for L2-normed spaces. Following these steps, a memory-based classifier exploits the distance between vectors to identify the presence of targeted co-hyponymy relationships. An evaluation is also performed to assess any reciprocal influences of the method's parameters on its performance. The advantages of this method are user-friendliness, flexibility in updating and maintenance, and an innate capacity to resemble conceptual structures in a domain of knowledge. [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Distributional semantic models [en_US]
dc.subject: Computational terminology [en_US]
dc.subject: Statistical natural language processing [en_US]
dc.subject: Random projections [en_US]
dc.subject: Computational linguistics [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Information extraction [en_US]
dc.subject: Data mining [en_US]
dc.subject: Insight Centre for Data Analytics [en_US]
dc.title: Investigating the use of distributional semantic models for co-hyponym identification in special corpora [en_US]
dc.contributor.funder: Science Foundation Ireland [en_US]
dc.local.note: Linguistic objects and structures are expressions of knowledge and its organisation in the mind. Collecting terms and capturing their relationships are thus important steps for distilling knowledge from specialised texts and for formalising it for machines. This thesis proposes a method for extracting terms and modelling their meanings. [en_US]

Attribution-NonCommercial-NoDerivs 3.0 Ireland
This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.
