A Learning Architecture to Support Autonomous Resource Scheduling and Allocation in the Cloud
The advent of on-demand computing, facilitated by computational clouds, provides an almost unlimited resource supply to support the execution of applications and processes. Through a process known as virtualisation, large server machines are divided up into smaller units known as virtual machines. These virtual machines can then be procured on demand to support application deployments, workflow executions and service delivery via the cloud. However, optimally allocating these virtual resources to support a given application deployment or workflow execution on a cloud platform presents a number of significant research challenges. Virtualisation is enabled through a domain-level hypervisor which controls access to the shared hardware amongst the competing virtual machines. Switching between domains and attempting to distribute access to these shared mediums is non-trivial and causes performance interference effects amongst the virtual machines. This presents a challenge when attempting to plan a resource allocation to support a given application or workflow running in these environments. Removing these interference effects entirely is a very difficult problem and is one of the principal challenges facing virtualisation research in the coming years. However, from a resource planning perspective it is possible to reason over these variabilities to achieve a near-optimal resource allocation which satisfies the defined objective criteria. Markov Decision Processes provide a decision-theoretic framework which facilitates planning and scheduling under uncertainty. By modelling the allocation of resources under this framework and solving it using techniques such as reinforcement learning and dynamic programming, this thesis provides a learning architecture to allocate and schedule resources in adherence to defined optimisation criteria. Using data from real cloud deployments, we empirically evaluate our proposed solutions with respect to two different application types.
The first is a workflow application deployment where the requirement is to schedule tasks to resources to ensure that both cost and makespan constraints are satisfied. The second is an application scaling problem where the goal is to optimise application response time at a minimum cost for varying numbers of user requests. For both of these problems, the performance of the underlying resources is variable and changes over time. We present a number of novel advancements from both a learning and optimisation perspective.
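As a rough illustration of how an application scaling problem can be cast as a Markov Decision Process and solved by dynamic programming, the sketch below applies value iteration to a toy model. It is not the model or algorithm used in the thesis. The state is a pair (load level, number of virtual machines), the action adds or removes one virtual machine, and the reward trades off resource cost against an overload penalty. Every name and number in it (`REQUESTS`, `CAPACITY`, `VM_COST`, `SLA_PENALTY`, `LOAD_P`, the discount factor) is a hypothetical assumption chosen only for illustration.

```python
import itertools

# Hypothetical toy model: every constant below is an illustrative assumption.
LOADS = [0, 1, 2]               # low, medium, high request rate
REQUESTS = [10.0, 40.0, 80.0]   # requests per second at each load level
MAX_VMS = 5
ACTIONS = [-1, 0, 1]            # remove a VM, keep the allocation, add a VM
VM_COST = 1.0                   # cost per VM per time step
CAPACITY = 20.0                 # requests per second one VM can absorb
SLA_PENALTY = 10.0              # penalty per unit of relative overload
GAMMA = 0.9                     # discount factor

# LOAD_P[i][j] = probability that load level i is followed by load level j.
LOAD_P = [
    [0.7, 0.3, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
]


def reward(load, vms):
    """Negative of resource cost plus a penalty for exceeding capacity."""
    overload = max(0.0, REQUESTS[load] / (vms * CAPACITY) - 1.0)
    return -(VM_COST * vms + SLA_PENALTY * overload)


def step_vms(vms, action):
    """Apply a scaling action, keeping the VM count within [1, MAX_VMS]."""
    return min(MAX_VMS, max(1, vms + action))


def q_value(V, load, vms, action):
    """Expected return of taking `action` in state (load, vms)."""
    new_vms = step_vms(vms, action)
    expected_next = sum(LOAD_P[load][nxt] * V[(nxt, new_vms)] for nxt in LOADS)
    return reward(load, new_vms) + GAMMA * expected_next


def value_iteration(tol=1e-8):
    """Sweep the Bellman optimality update until the values stop changing."""
    states = list(itertools.product(LOADS, range(1, MAX_VMS + 1)))
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for load, vms in states:
            best = max(q_value(V, load, vms, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[(load, vms)]))
            V[(load, vms)] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {
        (load, vms): max(ACTIONS, key=lambda a: q_value(V, load, vms, a))
        for load, vms in states
    }
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    for state in sorted(policy):
        print(state, policy[state])
```

In this toy model the transition probabilities are known, so dynamic programming applies directly. When they are unknown, as with performance interference on real cloud resources, a reinforcement learning method would instead estimate the action values from observed transitions.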