Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction
MetadataShow full item record
This item's downloads: 354 (view details)
QasemiZadeh, Behrang; Handschuh, Siegfried (2014) Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction Empirical Methods in Natural Language Processing (EMNLP) Doha, Qatar, 2014-10-25- 2014-10-29
Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in the distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an application, the similarity of vectors and thus the entities that they represent is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a novel technique called Random Manhattan Indexing (RMI) for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental and thus scalable two-step procedure. In order to attain its goal, RMI employs the sparse Cauchy random projections. We further introduce Random Manhattan Integer Indexing (RMII): a computationally enhanced version of RMI. As shown in the reported experiments, RMI and RMII can be used reliably to estimate the L1 distances between vectors in a vector space of low dimensionality.