Integration of Genetic Biomarkers in Prognostic Models for Breast Cancer Survival
The main aim of my PhD is to create a prognostic model of disease recurrence and death for patients with invasive breast cancer. The data were collected retrospectively and comprise 647 invasive breast cancer patients, with patient characteristics and genetic markers measured. Missing data add a further complication: a complete case analysis using both clinical and pathological biomarkers reduces the sample to 103 patients. A major challenge is therefore how best to build a prognostic model for breast cancer in the presence of missing data.

The Kaplan-Meier estimate of the survival function is the most commonly used method for representing the distribution of survival times, and extensions to graphical comparisons of these survival estimates were developed. Classical approaches to modelling survival data using complete case analysis are examined. An empirical simulation study is then used to examine the effect of missing data on variable selection and to compare the performance of variable selection techniques on imputed data. The final model identified Bilateral, Lymph Node status, Mitotic Count, Metastasis and UICC staging as good predictors of Disease Free Survival, and a subset of these (Mitotic Count, Metastasis and UICC staging) as good predictors of Overall Survival. These models have good concordance and were calibrated both internally and externally.

Classification and Regression Trees (CART) are a non-parametric approach to regression modelling. The main feature of CART is that the data are recursively partitioned into groups and a simple prediction model is fitted to each partition. A novel approach is introduced that uses surrogate splits to create alternative, competing trees with comparable predictive power; this helps to identify underlying structure in the data.
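To make the idea of a surrogate split concrete, the sketch below works on a single tree node. It first finds the primary split `x <= t` on one variable, then searches another variable for the threshold and direction that best reproduce the primary split's left/right assignment, measured as the proportion of agreement. This is only an illustration of the standard CART surrogate idea under simplifying assumptions, not the method developed in the thesis. The function names (`best_split`, `surrogate_split`) and the toy data are hypothetical. For simplicity the splitting criterion is the regression sum of squared errors; survival trees normally use a different criterion.

```python
"""Sketch (hypothetical helpers): primary and surrogate split for one CART node."""
from typing import List, Optional, Tuple


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _midpoints(x: List[float]) -> List[float]:
    """Candidate thresholds: midpoints between consecutive distinct values."""
    values = sorted(set(x))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, cost) of the split 'x <= t' minimising child-node SSE.

    Assumption: a regression (squared-error) criterion is used for simplicity.
    """
    best: Optional[Tuple[float, float]] = None
    for t in _midpoints(x):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = _sse(left) + _sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


def surrogate_split(
    x_primary: List[float], t_primary: float, x_other: List[float]
) -> Optional[Tuple[float, str, float]]:
    """Find the split on x_other that best mimics the primary split.

    Returns (threshold, direction, agreement): direction '<=' sends
    x_other <= threshold to the left child, '>' sends x_other > threshold left.
    Agreement is the fraction of cases assigned to the same child as the
    primary split does.
    """
    goes_left = [xp <= t_primary for xp in x_primary]
    n = len(goes_left)
    best: Optional[Tuple[float, str, float]] = None
    for t in _midpoints(x_other):
        for direction in ("<=", ">"):
            predicted = [(xo <= t) if direction == "<=" else (xo > t) for xo in x_other]
            agreement = sum(p == g for p, g in zip(predicted, goes_left)) / n
            if best is None or agreement > best[2]:
                best = (t, direction, agreement)
    return best


if __name__ == "__main__":
    # Toy data (hypothetical, not taken from the thesis).
    x1 = [1, 2, 3, 4, 5, 6]
    x2 = [10, 20, 15, 40, 35, 50]
    y = [0, 0, 0, 1, 1, 1]
    t1, cost = best_split(x1, y)
    print(t1, cost)  # 3.5 0.0
    print(surrogate_split(x1, t1, x2))  # (27.5, '<=', 1.0)
```

In full CART implementations, a surrogate is usually retained only if it does better than simply sending every case to the majority child. Surrogates are ranked by agreement, and they are what allow a case with a missing primary variable to still be routed down the tree. Swapping a high-agreement surrogate in for the primary split is one way to obtain a competing tree whose partitions are similar.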