Multi-objective reinforcement learning and planning for the expected scalarised returns
Date
2023-01-13
Author
Hayes, Conor F.
Abstract
Many real-world problems have multiple, often conflicting, objectives. To
solve such problems, a multi-objective approach to decision making must be taken.
In the multi-objective decision making (MODeM) literature, the utility-based
approach is followed where a utility function is used to model the preferences over
the objectives of a human decision maker (or user). If the utility function is known
a priori, a single optimal solution can be computed. However, if the utility function
is unknown or uncertain, a set of optimal solutions must be computed.
When following the utility-based approach, multiple optimality criteria can arise.
In scenarios where the utility of a user is derived from multiple executions
of a policy, the scalarised expected returns (SER) criterion must be optimised. In scenarios
where the utility of a user is derived from a single execution of a policy, the expected
scalarised returns (ESR) criterion must be optimised. In the MODeM literature,
the SER criterion has been studied extensively, while the ESR criterion has largely
been ignored. In the real world, a user may only have a single opportunity to make
a decision. For example, in a medical setting, a patient may only have one chance
to select a treatment. Therefore, in order to effectively apply MODeM algorithms
to a range of practical applications, the ESR criterion must be further investigated.
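
The difference between the two criteria can be illustrated with a minimal sketch. SER applies the utility function to the expected return vector, while ESR takes the expectation of the utility of each individual return. The two-objective policy, its outcome probabilities, and the product utility below are hypothetical values chosen for illustration only; they are not taken from the thesis.

```python
# Illustrative sketch only: the outcomes and the utility function are assumptions.

def utility(ret):
    # Hypothetical nonlinear utility: the user needs both objectives at once.
    return ret[0] * ret[1]

def ser(outcomes):
    # Scalarised expected returns: utility of the expected return vector.
    n = len(outcomes[0][1])
    expected = tuple(sum(p * r[i] for p, r in outcomes) for i in range(n))
    return utility(expected)

def esr(outcomes):
    # Expected scalarised returns: expectation of the utility of each return.
    return sum(p * utility(r) for p, r in outcomes)

# A policy described by (probability, return vector) pairs for one execution.
policy = [(0.5, (4.0, 0.0)), (0.5, (0.0, 4.0))]

ser_value = ser(policy)  # utility of the mean vector (2.0, 2.0)
esr_value = esr(policy)  # mean of the utilities of (4.0, 0.0) and (0.0, 4.0)
print(ser_value, esr_value)
```

Under these assumptions the policy looks attractive under SER, because the average over many executions is balanced, yet every single execution yields zero utility, which is what ESR measures.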
This thesis contains a number of important contributions. It is demonstrated by
example that for ESR settings where the utility function is known and nonlinear,
multi-objective methods that compute policies must be explicitly designed for the
ESR criterion. For settings where the utility function of a user is unknown, it
is shown that expected value vectors are not sufficient to determine optimality
under the ESR criterion. Therefore, to determine a partial ordering over policies,
new methods to compute sets of optimal policies are proposed. Finally, this thesis
proposes a number of new multi-objective algorithms that can compute sets of
optimal policies for the ESR criterion in various MODeM settings.
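
The claim that expected value vectors are not sufficient under the ESR criterion can be sketched with two hypothetical policies that share the same expected value vector but differ in their return distributions. The policies and both utility functions below are assumptions made for illustration and do not reproduce the examples or algorithms in the thesis.

```python
# Illustrative sketch only: policies and utility functions are assumptions.

def expected_vector(outcomes):
    # Expected return vector of a policy given (probability, return) pairs.
    n = len(outcomes[0][1])
    return tuple(sum(p * r[i] for p, r in outcomes) for i in range(n))

def expected_utility(outcomes, u):
    # ESR value: expectation of the utility over individual returns.
    return sum(p * u(r) for p, r in outcomes)

def product_utility(r):
    # Hypothetical user who needs both objectives together.
    return r[0] * r[1]

def max_utility(r):
    # Hypothetical user who only cares about the best objective.
    return max(r)

policy_a = [(1.0, (2.0, 2.0))]                      # deterministic return
policy_b = [(0.5, (4.0, 0.0)), (0.5, (0.0, 4.0))]   # stochastic return

same_mean = expected_vector(policy_a) == expected_vector(policy_b)
a_product = expected_utility(policy_a, product_utility)
b_product = expected_utility(policy_b, product_utility)
a_max = expected_utility(policy_a, max_utility)
b_max = expected_utility(policy_b, max_utility)
print(same_mean, a_product, b_product, a_max, b_max)
```

In this sketch the two policies are indistinguishable by their expected value vectors, yet the preferred policy changes with the utility function. When the utility function is unknown, policies therefore have to be compared through their return distributions rather than their expected value vectors.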