Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction

QasemiZadeh, Behrang; Handschuh, Siegfried

View/Open

EMNLP2014178.pdf (287.9Kb)

Date

2014

Author

QasemiZadeh, Behrang

Handschuh, Siegfried

Metadata

Show full item record

Usage

This item's downloads: 400 (view details)

Recommended Citation

QasemiZadeh, Behrang; Handschuh, Siegfried (2014) Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction Empirical Methods in Natural Language Processing (EMNLP) Doha, Qatar, 2014-10-25- 2014-10-29

Published Version

http://emnlp2014.org/papers/pdf/EMNLP2014178.pdf

Abstract

Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in the distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an application, the similarity of vectors and thus the entities that they represent is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a novel technique called Random Manhattan Indexing (RMI) for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental and thus scalable two-step procedure. In order to attain its goal, RMI employs the sparse Cauchy random projections. We further introduce Random Manhattan Integer Indexing (RMII): a computationally enhanced version of RMI. As shown in the reported experiments, RMI and RMII can be used reliably to estimate the L1 distances between vectors in a vector space of low dimensionality.

URI

http://hdl.handle.net/10379/4650

Collections

Data Science Institute (Conference Papers)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland