dc.contributor.advisor | Davis, Brian | |
dc.contributor.author | Abdelaal, Hazem | |
dc.date.accessioned | 2019-10-08T12:34:42Z | |
dc.date.available | 2019-10-08T12:34:42Z | |
dc.date.issued | 2019-10-08 | |
dc.identifier.uri | http://hdl.handle.net/10379/15492 | |
dc.description.abstract | Knowledge base creation and population are an essential formal backbone
for a variety of intelligent applications, including decision support, expert systems
and intelligent search. Although knowledge extraction from unstructured
text offers a means of easing the knowledge acquisition process, the ambiguous
nature of language tends to reduce accuracy in
more complex semantic analysis.
Controlled Natural Languages (CNLs) are subsets of natural language
which are restricted grammatically in order to reduce or eliminate ambiguity
for the purposes of machine understanding, or unambiguous human
communication within a domain or industry context, such as Simplified
English. Moreover, CNLs help engage non-expert users with no background
in knowledge engineering, as these languages offer user-friendly
interfaces that are easier to understand and more readily accepted by users. The latter
type of human-oriented CNL is under-researched despite having found favor
in industry over many years.
Rewriting such human-oriented CNL content into a machine-oriented
CNL could potentially unlock significant silos of valuable, implicit general-purpose
domain knowledge. In this thesis, we have developed an approach
for a series of corpus-based rewriting rules for subsequent knowledge
capture. Our work confirms that a substantial amount of human-oriented
CNL content can be easily translated into a machine-processable CNL
for formal knowledge capture with little semantic loss. In addition, we
describe a novel dataset which aligns a representative sample of Simplified
English Wikipedia sentences with a well-known machine-oriented CNL.
This linguistic resource is both human-readable and semantically machine-interpretable,
and it can be used by the community as a gold-standard
dataset to benefit a variety of language processing and
knowledge-based applications. | en_IE |
dc.publisher | NUI Galway | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Ireland | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ | |
dc.subject | Natural Language Processing | en_IE |
dc.subject | Knowledge Extraction | en_IE |
dc.subject | Controlled Natural Language | en_IE |
dc.subject | Semantic Web | en_IE |
dc.subject | Engineering and Informatics | en_IE |
dc.title | Knowledge extraction from simplified natural language text | en_IE |
dc.type | Thesis | en |
dc.contributor.funder | Science Foundation Ireland | en_IE |
dc.local.final | Yes | en_IE |
dcterms.project | info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ | en_IE |
nui.item.downloads | 4552 | |