Multi-objective reinforcement learning and planning for the expected scalarised returns

Hayes, Conor F.

Date

2023-01-13

Abstract

Many real-world problems have multiple, often conflicting, objectives. Solving such problems requires a multi-objective approach to decision making. The multi-objective decision making (MODeM) literature follows the utility-based approach, in which a utility function models the preferences of a human decision maker (or user) over the objectives. If the utility function is known a priori, a single optimal solution can be computed. However, if the utility function is unknown or uncertain, a set of optimal solutions must be computed. Under the utility-based approach, multiple optimality criteria can arise. In scenarios where a user's utility is derived from multiple executions of a policy, the scalarised expected returns (SER) criterion must be optimised. In scenarios where a user's utility is derived from a single execution of a policy, the expected scalarised returns (ESR) criterion must be optimised. The SER criterion has been studied extensively in the MODeM literature, while the ESR criterion has largely been ignored. In the real world, however, a user may have only a single opportunity to make a decision; for example, in a medical setting, a patient may have only one chance to select a treatment. Therefore, to apply MODeM algorithms effectively to a range of practical applications, the ESR criterion must be investigated further. This thesis makes a number of important contributions. It is demonstrated by example that, in ESR settings where the utility function is known and nonlinear, multi-objective methods that compute policies must be explicitly designed for the ESR criterion. For settings where the utility function of a user is unknown, it is shown that expected value vectors are not sufficient to determine optimality under the ESR criterion. Therefore, new methods are proposed for determining a partial ordering over policies and for computing sets of optimal policies. Finally, this thesis proposes a number of new multi-objective algorithms that can compute sets of optimal policies for the ESR criterion in various MODeM settings.
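
The difference between the two criteria, and why expected value vectors alone cannot rank policies under ESR, can be illustrated with a minimal Python sketch. SER applies the utility to the expected return vector, u(E[R]), while ESR takes the expectation of the utility of each individual return, E[u(R)]. The two policies, their return distributions and the product utility u(r) = r1 * r2 are hypothetical choices made for illustration only; they are not taken from the thesis.

```python
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]
# A policy's outcome distribution: a list of (return vector, probability) pairs.
Distribution = List[Tuple[Vector, float]]


def expected_returns(dist: Distribution) -> Vector:
    """Expected return vector E[R] of a discrete return distribution."""
    n = len(dist[0][0])
    return tuple(sum(p * r[i] for r, p in dist) for i in range(n))


def ser(dist: Distribution, u: Callable[[Vector], float]) -> float:
    """Scalarised expected returns: utility of the expected returns, u(E[R])."""
    return u(expected_returns(dist))


def esr(dist: Distribution, u: Callable[[Vector], float]) -> float:
    """Expected scalarised returns: expected utility of each outcome, E[u(R)]."""
    return sum(p * u(r) for r, p in dist)


def product_utility(r: Vector) -> float:
    """Hypothetical nonlinear utility over two objectives."""
    return r[0] * r[1]


# Hypothetical policy A: a coin flip between two extreme outcomes.
policy_a: Distribution = [((10.0, 0.0), 0.5), ((0.0, 10.0), 0.5)]
# Hypothetical policy B: a deterministic, balanced outcome.
policy_b: Distribution = [((5.0, 5.0), 1.0)]

for name, dist in [("A", policy_a), ("B", policy_b)]:
    print(name, expected_returns(dist),
          ser(dist, product_utility), esr(dist, product_utility))
```

Both policies have the expected return vector (5.0, 5.0) and hence the same SER value of 25.0, but under ESR policy B scores 25.0 while policy A scores 0.0, because every single execution of A leaves one objective at zero. With a linear utility such as u(r) = r1 + r2, the two criteria coincide, which is why the distinction only matters for nonlinear utility functions.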

URI

http://hdl.handle.net/10379/17625

Collections

University of Galway Theses (PhD Theses)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland