The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data
Ryder, Alan G.
O Connell, Marie-Louise
Madden, Michael G.
"The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data", Tom Howley, Michael G. Madden, Marie-Louise O Connell and Alan G. Ryder. Proceedings of AI-2005, 25th International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge, Dec 2005.
The classification of high dimensional data, such as images, gene-expression data and spectral data, poses an interesting challenge to machine learning, as the presence of large numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of Principal Component Analysis (PCA) to reduce the dimensionality of such data and to improve the predictive performance of some well-known machine learning methods. Experiments are carried out on a high dimensional spectral dataset, in which the task is to identify a target material within a mixture. These experiments employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, which has been used in the field of chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that the use of this PCA method can improve the performance of machine learning in the classification of high dimensional data.
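To illustrate the kind of computation the abstract refers to, the following is a minimal sketch of NIPALS PCA in Python with NumPy. It is not the authors' implementation; the function name `nipals_pca`, its parameters (`n_components`, `tol`, `max_iter`) and the choice of starting vector are assumptions made for this example. NIPALS extracts one component at a time by alternating between a loading estimate and a score estimate, then subtracts that component from the data before moving on, so only the leading components that are actually needed get computed.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=1000):
    """Illustrative NIPALS PCA (hypothetical helper, not from the paper).

    X is an (n_samples, n_attributes) array. Returns (scores, loadings),
    where loadings has one unit-length column per component.
    """
    X = np.asarray(X, dtype=float)
    residual = X - X.mean(axis=0)          # mean-centre each attribute
    n_samples, n_attributes = residual.shape
    scores = np.zeros((n_samples, n_components))
    loadings = np.zeros((n_attributes, n_components))

    for k in range(n_components):
        # Assumed starting point: the column with the largest variance.
        t = residual[:, np.argmax(residual.var(axis=0))].copy()
        for _ in range(max_iter):
            p = residual.T @ t / (t @ t)   # project data onto current scores
            p /= np.linalg.norm(p)         # normalise the loading vector
            t_new = residual @ p           # project data onto the loading
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores[:, k] = t
        loadings[:, k] = p
        residual = residual - np.outer(t, p)   # deflate: remove this component

    return scores, loadings
```

In a setting like the one described, the rows of `X` would be spectra, and the returned scores (for example `scores, loadings = nipals_pca(X, n_components=10)`, where 10 is an arbitrary illustrative value) would replace the original attributes as input to a classifier.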