Improving spectral library search by redefining similarity measures
Enright, Catherine G.
Madden, Michael G.
Ankita Garg, Catherine G. Enright, Michael G. Madden (2015) 'Improving Spectral Library Search by Redefining Similarity Measures'. Journal of Chemical Information and Modeling, 55 (5): 963-971.
Similarity plays a central role in spectral library search. The goal of spectral library search is to identify those spectra in a reference library of known materials that most closely match an unknown query spectrum, on the assumption that this will allow us to identify the main constituent(s) of the query spectrum. The similarity measures used for this task in software and the academic literature are almost exclusively metrics, meaning that the measures obey the three axioms of metrics: (1) minimality; (2) symmetry; (3) triangle inequality. Consequently, they implicitly assume that the query spectrum is drawn from the same distribution as that of the reference library.

In this paper, we demonstrate that this assumption is not necessary in practical spectral library search and that in fact it is often violated in practice. Although the reference library may be constructed carefully, it is generally impossible to guarantee that all future query spectra will be drawn from the same distribution as the reference library. Before evaluating different similarity measures, we need to understand how they define the relationship between spectra.

In spectral library search, we often aim to find the constituent(s) of a mixture. We propose that rather than asking which reference library spectra are similar to the mixture, we should ask which of the reference library spectra are contained in the given query mixture. This question is inherently asymmetric. Therefore, we should adopt a nonmetric measure. To evaluate our hypothesis, we apply a nonmetric measure formulated by Tversky known as the Contrast Model and compare its performance to the well-known Jaccard similarity index metric on spectroscopic data sets. Our results show that the Tversky similarity measure yields better results than the Jaccard index.
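The difference between the symmetric and the asymmetric question can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each spectrum is reduced to a set of peak positions, uses set size as the salience function in Tversky's Contrast Model, and picks hypothetical weights (theta, alpha, beta) and toy peak values purely for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Symmetric Jaccard index: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 1.0  # convention for two empty sets (assumption)
    return len(a & b) / len(union)


def tversky_contrast(query: set, reference: set,
                     theta: float = 1.0,
                     alpha: float = 0.0,
                     beta: float = 1.0) -> float:
    """Tversky's Contrast Model with set size as the salience function.

    score = theta*|Q & R| - alpha*|Q - R| - beta*|R - Q|

    With alpha = 0 (an illustrative choice), peaks present only in the
    query mixture are not penalised, so the score asks "is the reference
    contained in the query?" rather than "are the two alike?".
    """
    common = len(query & reference)
    query_only = len(query - reference)
    reference_only = len(reference - query)
    return theta * common - alpha * query_only - beta * reference_only


# Hypothetical toy data: peak positions of a mixture and two references.
mixture = {100, 200, 300, 400, 500, 600}
component = {100, 200}                 # fully contained in the mixture
decoy = {100, 200, 300, 700, 800}      # overlaps more, but has foreign peaks

jaccard_component = jaccard(mixture, component)
jaccard_decoy = jaccard(mixture, decoy)
contrast_component = tversky_contrast(mixture, component)
contrast_decoy = tversky_contrast(mixture, decoy)

print("Jaccard :", jaccard_component, jaccard_decoy)
print("Contrast:", contrast_component, contrast_decoy)
```

On this toy data the Jaccard index ranks the decoy above the true component (0.375 against roughly 0.333), while the contrast score ranks the component first (2.0 against 1.0). Swapping the arguments of `tversky_contrast` changes the score, which is the asymmetry the abstract argues for.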
This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland license. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.