A Weibull-count approach for handling under- and overdispersed longitudinal/clustered data structures
Ribeiro Jr, Eduardo E.
Demétrio, Clarice G. B.
Luyts, Martial, Molenberghs, Geert, Verbeke, Geert, Matthijs, Koen, Ribeiro Jr, Eduardo E., Demétrio, Clarice G. B., & Hinde, John. (2018). A Weibull-count approach for handling under- and overdispersed longitudinal/clustered data structures. Statistical Modelling, 19(5), 569-589. doi: 10.1177/1471082X18789992
A Weibull-model-based approach is examined to handle under- and overdispersed discrete data in a hierarchical framework. This methodology was first introduced by Nakagawa and Osaki (1975, IEEE Transactions on Reliability, 24, 300–301), and later examined for under- and overdispersion by Klakattawi et al. (2018, Entropy, 20, 142) in the univariate case. Extensions to hierarchical approaches with under- and overdispersion have gone unnoted, even though they can be obtained in a simple manner. This is of particular interest when analysing clustered/longitudinal data structures, where the underlying correlation structure is often more complex than in cross-sectional studies. In this article, a random-effects extension of the Weibull-count model is proposed and applied to two motivating case studies, originating from the clinical and sociological research fields. A goodness-of-fit evaluation of the model is provided through a comparison with several well-known count models, that is, the negative binomial, Conway–Maxwell–Poisson and double Poisson models. Empirical results show that the proposed extension fits the data flexibly, in particular heavy-tailed, zero-inflated, overdispersed and correlated count data. Discrete left-skewed time-to-event data structures are also flexibly modelled using the approach, with the ability to derive direct interpretations on the median scale, provided the complementary log–log link is used. Finally, a large simulated set of data is created to examine other characteristics such as computational ease and orthogonality properties of the model, with the conclusion that the approach behaves best for highly overdispersed cases.
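As a rough illustration of the building block named in the abstract, the sketch below implements the discrete Weibull distribution in the form usually attributed to Nakagawa and Osaki, P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..., and shows how a complementary log–log link on q leads to a closed-form median. The function names, the sign convention of the link, and the way a random intercept b enters the linear predictor are assumptions made for this example; the article's own parameterization may differ.

```python
import math

def dw_pmf(y, q, beta):
    """Discrete Weibull pmf: q**(y**beta) - q**((y + 1)**beta), with 0 < q < 1, beta > 0."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)

def dw_cdf(y, q, beta):
    """P(Y <= y) = 1 - q**((y + 1)**beta)."""
    return 1.0 - q ** ((y + 1) ** beta)

def q_from_cloglog(eta):
    """Invert the assumed link log(-log(q)) = eta, giving q = exp(-exp(eta))."""
    return math.exp(-math.exp(eta))

def dw_median(q, beta):
    """Smallest integer y >= 0 with P(Y <= y) >= 0.5."""
    # Closed form from solving q**((y + 1)**beta) <= 0.5 for y.
    m = (math.log(2.0) / -math.log(q)) ** (1.0 / beta) - 1.0
    y = max(0, math.ceil(m))
    # Guard against floating-point rounding at the boundary.
    while y > 0 and dw_cdf(y - 1, q, beta) >= 0.5:
        y -= 1
    while dw_cdf(y, q, beta) < 0.5:
        y += 1
    return y

# Hypothetical cluster-specific predictor: fixed part plus a random intercept b.
fixed_part = -1.5   # illustrative value, not taken from the article
b = 0.3             # illustrative draw of a cluster-level random effect
q_cluster = q_from_cloglog(fixed_part + b)
median_cluster = dw_median(q_cluster, beta=1.2)
```

Under this assumed link, -log(q) = exp(eta), so log(median + 1) is approximately (log(log 2) - eta) / beta before rounding to an integer, which is what makes covariate effects readable on the median scale.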