Neurosymbolic visual reasoning with scene graph enrichment

Khan, Muhammad Jaleed

dc.contributor.advisor	Curry, Edward
dc.contributor.advisor	Breslin, John
dc.contributor.author	Khan, Muhammad Jaleed
dc.date.accessioned	2024-02-26T11:35:00Z
dc.date.issued	2024-02-26
dc.identifier.uri	http://hdl.handle.net/10379/18065
dc.description.abstract	Visual reasoning is a critical component of artificial intelligence that aims to understand, interpret, and reason about complex visual content. It is interdisciplinary, drawing on visual feature extraction and image generation from computer vision, linguistic feature extraction and language generation from natural language processing, and graph-based representation and semantic enrichment from knowledge representation and reasoning. Data-centric visual reasoning techniques often struggle to interpret visual content intuitively because their scene representations have limited expressiveness and generalisability. We propose a knowledge-enhanced neurosymbolic visual reasoning framework based on scene graph enrichment. This framework employs deep learning techniques for object detection and relationship prediction in visual content to generate scene graph representations, which are then refined and semantically enriched using common sense knowledge extracted from a heterogeneous knowledge graph. The enriched scene graphs are used in downstream visual reasoning tasks, including image captioning, visual question answering and image generation. A comprehensive experimental analysis on standard datasets and evaluation benchmarks demonstrates considerable improvement over existing state-of-the-art methods in relationship recall rate, image captioning quality, question answering accuracy and image generation realism. These results validate the effectiveness of leveraging heterogeneous common sense knowledge for enhanced scene understanding and visual reasoning.	en_IE
dc.publisher	NUI Galway
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights	CC BY-NC-ND 3.0 IE
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject	Science and Engineering	en_IE
dc.subject	Computer Science	en_IE
dc.subject	Data Science	en_IE
dc.subject	Artificial Intelligence	en_IE
dc.subject	scene representation	en_IE
dc.subject	scene graph	en_IE
dc.subject	semantic enrichment	en_IE
dc.subject	common sense knowledge	en_IE
dc.subject	visual reasoning	en_IE
dc.subject	visual question answering	en_IE
dc.subject	image captioning	en_IE
dc.subject	neurosymbolic integration	en_IE
dc.subject	deep learning	en_IE
dc.subject	knowledge graphs	en_IE
dc.title	Neurosymbolic visual reasoning with scene graph enrichment	en_IE
dc.type	Thesis	en
dc.description.embargo	2025-02-28
dc.local.final	Yes	en_IE
nui.item.downloads	0

Files in this item

Name: license.txt
Size: 5.659Kb
Format: Text file

Name: PhD_Thesis_Corrected.pdf
Size: 19.43Mb
Format: PDF

This item appears in the following Collection(s)

University of Galway Theses (PhD Theses)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland