Neurosymbolic visual reasoning with scene graph enrichment

Khan, Muhammad Jaleed

Date

2024-02-26

Embargo Date

2025-02-28

Abstract

Visual reasoning is a critical component of artificial intelligence that aims to understand, interpret, and reason about complex visual content. It is interdisciplinary, combining visual feature extraction and image generation from computer vision, linguistic feature extraction and language generation from natural language processing, and graph-based representation and semantic enrichment from knowledge representation and reasoning. Data-centric visual reasoning techniques often struggle to interpret visual content intuitively because their scene representations have limited expressiveness and generalisability. We propose a knowledge-enhanced neurosymbolic visual reasoning framework based on scene graph enrichment. The framework uses deep learning for object detection and relationship prediction to generate scene graph representations of visual content, which are then refined and semantically enriched with common sense knowledge extracted from a heterogeneous knowledge graph. The enriched scene graphs are used in downstream visual reasoning tasks, including image captioning, visual question answering and image generation. A comprehensive experimental analysis on standard datasets and evaluation benchmarks demonstrates considerable improvement over existing state-of-the-art methods in relationship recall rate, image captioning quality, question answering accuracy and image generation realism. These results validate the effectiveness of leveraging heterogeneous common sense knowledge for enhanced scene understanding and visual reasoning.
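
As a rough illustration of the enrichment idea in the abstract, and not the thesis's actual implementation, the sketch below represents a scene graph as a set of (subject, predicate, object) triples and adds common sense facts for each detected object. The names `SceneGraph`, `enrich` and `TOY_KNOWLEDGE` are hypothetical, and the small dictionary is an invented stand-in for a heterogeneous knowledge graph; the detection and relationship prediction stages are assumed to have already produced the input triples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy knowledge source: invented facts for illustration only,
# standing in for a heterogeneous common sense knowledge graph.
TOY_KNOWLEDGE: Dict[str, List[Tuple[str, str]]] = {
    "dog": [("IsA", "animal"), ("CapableOf", "run")],
    "frisbee": [("UsedFor", "play")],
}


@dataclass
class SceneGraph:
    """Detected objects plus the relationships predicted between them."""
    objects: Set[str] = field(default_factory=set)
    triples: Set[Triple] = field(default_factory=set)

    def add(self, subj: str, pred: str, obj: str) -> None:
        # Record both endpoints as visual objects and store the relationship.
        self.objects.update((subj, obj))
        self.triples.add((subj, pred, obj))


def enrich(graph: SceneGraph,
           knowledge: Dict[str, List[Tuple[str, str]]]) -> SceneGraph:
    """Return a copy of `graph` with background facts attached to its objects.

    The input graph is left unchanged. Concepts introduced by the knowledge
    source are added as triples only, not as detected visual objects.
    """
    enriched = SceneGraph(set(graph.objects), set(graph.triples))
    for obj in graph.objects:
        for pred, target in knowledge.get(obj, []):
            enriched.triples.add((obj, pred, target))
    return enriched


# Assumed output of an upstream detector and relationship predictor.
scene = SceneGraph()
scene.add("dog", "catching", "frisbee")
enriched = enrich(scene, TOY_KNOWLEDGE)
```

In this sketch the enriched graph keeps the visual relationship and gains background facts such as `("dog", "IsA", "animal")`, which a downstream captioning or question answering component could draw on.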

URI

http://hdl.handle.net/10379/18065

Collections

University of Galway Theses (PhD Theses)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland