Towards inherently adaptive first person shooter agents using reinforcement learning

Glavin, Frank G.

View/Open

Glavin2015.pdf (9.124Mb)

Date

2015-09-30

Usage

This item's downloads: 6948

Abstract

Reinforcement learning (RL) is a paradigm in which an agent interacts with an environment. The agent carries out actions in the environment and, through a reward signal, receives positive reinforcement for actions that are deemed “good” and penalties for “bad” actions. The goal of the learning agent is to maximise the amount of reward it receives over time. This thesis presents several new behavioural architectures for controlling non-player characters (NPCs) in a modern first-person shooter (FPS) game using reinforcement learning. NPCs are computer-controlled players that are traditionally programmed with scripted, deterministic behaviours. We propose the use of reinforcement learning to enable the NPC to learn its own strategies and adapt them over time. We hypothesise that this will lead to greater variation in gameplay and produce less predictable NPCs. The first contribution of this thesis is the design, development and testing of two general-purpose Deathmatch behavioural architectures called Sarsa-Bot and DRE-Bot. These architectures use reinforcement learning to control and adapt their behaviour. We demonstrated that they could learn to play competently and achieve good performance against fixed-strategy scripted opponents. Our second contribution is the development of a reinforcement learning architecture, called RL-Shooter, specifically for the task of shooting. The opponent's movements are read in real time and the agent chooses shooting actions based on those that caused the most damage to the opponent in the past. We carried out extensive experimentation that showed that the RL-Shooter architecture could produce varied gameplay; however, there was no clear upward trend in performance over time.
This led to our third contribution: two extensions to the SARSA(λ) algorithm called Periodic Cluster-Weighted Rewarding and Persistent Action Selection. We designed these to improve the learning performance of RL-Shooter, and we demonstrated that their use resulted in a clear upward trend in the percentage hit accuracy achieved over time. Our final contribution is a skill-balancing mechanism, called Skilled Experience Catalogue, which is based on a by-product of the learning process. The agent systematically stores “snapshots” of what it has learned at different stages of the learning process. These can then be loaded during the game in an attempt to closely match the abilities of the current opponent. We showed that the technique could successfully match the skill level of five different scripted opponents with varying difficulty settings.
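
For readers unfamiliar with SARSA(λ), the algorithm that the abstract says was extended, the following is a minimal sketch of the standard tabular form with accumulating eligibility traces. It is not the thesis's implementation and does not include Periodic Cluster-Weighted Rewarding or Persistent Action Selection. The corridor environment, the function names and all parameter values are assumptions chosen purely for illustration.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if q[(state, a)] == best])


def sarsa_lambda_episode(q, step, start_state, actions, rng,
                         alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
    """Run one episode of tabular SARSA(lambda) with accumulating traces.

    `step(state, action)` is a hypothetical environment function that
    returns (next_state, reward, done).
    """
    traces = defaultdict(float)  # eligibility trace per (state, action)
    state = start_state
    action = epsilon_greedy(q, state, actions, epsilon, rng)
    done = False
    while not done:
        next_state, reward, done = step(state, action)
        next_action = epsilon_greedy(q, next_state, actions, epsilon, rng)
        # On-policy target: bootstrap from the action actually chosen next.
        target = reward if done else reward + gamma * q[(next_state, next_action)]
        delta = target - q[(state, action)]
        traces[(state, action)] += 1.0
        # Credit every recently visited pair in proportion to its trace.
        for key in list(traces):
            q[key] += alpha * delta * traces[key]
            traces[key] *= gamma * lam
        state, action = next_state, next_action
    return q


def corridor_step(state, action):
    """Toy 5-cell corridor (illustrative assumption): reward 1 on reaching cell 4."""
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


rng = random.Random(0)
q = defaultdict(float)
for _ in range(300):
    sarsa_lambda_episode(q, corridor_step, 0, ["left", "right"], rng)
```

After training, the learned value of moving right from the cell next to the goal should be close to 1 and higher than the value of moving left, which is the behaviour the reward signal encourages in this toy setting.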

URI

http://hdl.handle.net/10379/5500

Collections

University of Galway Theses (PhD Theses)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland