HPC IO and seismic data performance optimization using ANNs prediction based auto-tuning
Date
2023-06-19
Author
Tipu, Abdul Jabbar Saeed
Abstract
HPC or supercomputing clusters are designed for executing computationally intensive
operations that typically involve large-scale IO operations. These operations are most
commonly performed using a standard MPI library implemented in C/C++.
MPI-IO performance in HPC clusters tends to vary significantly over a range
of configuration parameters that are generally not considered by the algorithm. It is
commonly left to individual practitioners to optimise IO on a case-by-case basis at code
level. This can often lead to a range of unforeseen outcomes, particularly when the
configuration parameters are tuned manually.
The ExSeisDat utility is built on top of the native MPI-IO library and comprises
Parallel IO and Workflow Libraries for processing seismic data stored in the SEG-Y file
format. The SEG-Y file structure is complex because trace headers and trace data
alternate throughout the file. Its size scales to petabytes, and the chances of IO
performance degradation are further increased by the ExSeisDat layer.
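As a rough illustration of the alternating header/data layout mentioned above, the following Python sketch computes where a given trace's header and samples would start in a file. It assumes the common SEG-Y layout of a 3200-byte textual header and a 400-byte binary header followed by traces with 240-byte headers, and further assumes fixed-length traces, 4-byte samples, and no extended textual headers. The function name `trace_offsets` is hypothetical and is not part of ExSeisDat.

```python
# Sizes from the common SEG-Y layout (assumed: no extended textual headers).
TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240


def trace_offsets(trace_index, samples_per_trace, bytes_per_sample=4):
    """Return (header_start, data_start) byte offsets for one trace.

    Assumes every trace has the same number of samples, so traces are
    laid out as header, data, header, data, ... at a constant stride.
    """
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    file_header_bytes = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES
    header_start = file_header_bytes + trace_index * trace_bytes
    data_start = header_start + TRACE_HEADER_BYTES
    return header_start, data_start
```

Because headers and samples are interleaved, reading only the headers (for example, when sorting traces) touches many small, strided regions of the file rather than one contiguous block, which is one plausible reason the IO pattern is sensitive to configuration.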
The aim of this research is to auto-tune the Parallel IO configurations which determine
the gain in bandwidth performance, without requiring the user programmer’s intervention.
This research thesis presents a novel study of the changing IO performance in terms
of bandwidth, with the use of parallel plots against various MPI-IO, Lustre (Parallel)
File System and SEG-Y File configuration parameters.
Another novel aspect of this research is the predictive modelling of parallel (MPI)
IO, and of ExSeisDat’s SEG-Y IO and file sorting bandwidth performance behaviour, using
Artificial Neural Networks (ANNs). Building on this, a parameter auto-tuning
strategy is designed that is based on the ANN models’ predictions. This is an innovative
approach to optimising Parallel IO bandwidth performance for MPI-IO, seismic (SEG-Y)
data IO and file sorting operations.
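The general idea of prediction-based auto-tuning can be sketched as follows: enumerate candidate configurations, ask a model for the predicted bandwidth of each, and select the best one. This is only a minimal sketch and not the strategy from the thesis. The parameter names (`stripe_count`, `stripe_size_mb`) are illustrative assumptions, and `toy_predictor` is a hypothetical stand-in for a trained ANN.

```python
from itertools import product


def autotune(search_space, predict_bandwidth):
    """Return the (config, predicted_bandwidth) pair with the highest prediction.

    search_space maps parameter names to lists of candidate values;
    predict_bandwidth is any callable taking a config dict (for example,
    a wrapper around a trained model).
    """
    names = list(search_space)
    best_config, best_bw = None, float("-inf")
    # Exhaustively evaluate every combination of candidate values.
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        bw = predict_bandwidth(config)
        if bw > best_bw:
            best_config, best_bw = config, bw
    return best_config, best_bw


def toy_predictor(config):
    # Hypothetical placeholder for a trained ANN: favours more stripes
    # and a stripe size close to 4 MB. The numbers carry no real meaning.
    return config["stripe_count"] * 100 - abs(config["stripe_size_mb"] - 4) * 10


# Illustrative search space (assumed values, not taken from the thesis).
search_space = {"stripe_count": [1, 4, 8], "stripe_size_mb": [1, 4, 16]}
best_config, best_bw = autotune(search_space, toy_predictor)
```

An exhaustive search is used here only because the toy search space is tiny; with many parameters, a sampled or guided search over the model's predictions would be the more practical choice.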
The results presented in the thesis show, through statistical analysis, a significant
performance gain in the Parallel IO bandwidth within the HPC cluster. Additionally,
this research has highlighted the most commonly useful configuration settings, which give
the highest probability of improving the IO performance in the event that ML models are
unavailable for bandwidth predictions.
Furthermore, the different ANN hidden layer node configurations which show maximum
average bandwidth performance are discussed with respect to the identified SEG-Y
operations, for separate and combined READ/WRITE executions.