Faster Binary Embeddings for Preserving Euclidean Distances

Authors: Jinjie Zhang, Rayan Saab

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To illustrate our results, we test the proposed method on natural images and show that it achieves strong performance." and "6 NUMERICAL EXPERIMENTS To illustrate the performance of our fast binary embedding (Algorithm 1) and ℓ2 distance recovery (Algorithm 2), we apply them to real-world datasets: Yelp open dataset, ImageNet (Deng et al., 2009), Flickr30k (Plummer et al., 2017), and CIFAR-10 (Krizhevsky et al., 2010). All images are converted to grayscale and resampled using bicubic interpolation to size 128 × 128 for images from Yelp, ImageNet, and Flickr30k and 32 × 32 for images from CIFAR-10. So, each can be represented by a 16384-dimensional or 1024-dimensional vector. The results are reported here and in Appendix A."
Researcher Affiliation | Academia | Jinjie Zhang & Rayan Saab, Department of Mathematics, Halıcıoğlu Data Science Institute, University of California San Diego, {jiz003, rsaab}@ucsd.edu
Pseudocode | Yes | "Algorithm 1: Fast Binary Embedding for Finite T" and "Algorithm 2: ℓ2 Norm Distance Recovery"
Open Source Code | Yes | The Python source code of our paper: https://github.com/jayzhang0727/Faster-Binary-Embeddings-for-Preserving-Euclidean-Distances.git
Open Datasets | Yes | "We apply them to real-world datasets: Yelp open dataset, ImageNet (Deng et al., 2009), Flickr30k (Plummer et al., 2017), and CIFAR-10 (Krizhevsky et al., 2010)." and "Yelp open dataset: https://www.yelp.com/dataset"
Dataset Splits | No | "To give a numerical illustration of the relation among the length m of the binary sequences, embedding dimension p, and order r, as compared to the upper bound in (15), we use both Method 1 and Method 2 on the Yelp dataset. We randomly sample k = 1000 images and scale them by the same constant so all data points are contained in the ℓ2 unit ball." The paper does not specify traditional train/validation/test splits for model training and evaluation.
Hardware Specification | No | The paper does not provide any specific hardware details, such as the GPU or CPU models used to run the experiments.
Software Dependencies | No | The paper mentions "The Python source code" but does not specify any software dependencies with version numbers (e.g., specific Python libraries or frameworks).
Experiment Setup | Yes | Based on Theorem 4.2, we set n = 16384 and s = 1650/n ≈ 0.1. For each fixed p, we apply Algorithm 1 and Algorithm 2 for various m. We present our experimental results for stable ΣΔ quantization schemes, given by (21), with r = 1 and r = 2 in Figure 1. For each dataset we randomly sample k = 1000 images and scale them such that all scaled data points are contained in the ℓ2 unit ball.
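
The sampling and scaling step quoted in the table can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_and_scale`, sampling without replacement, and the choice of the largest sample norm as the common scaling constant are all assumptions, and the random array stands in for flattened grayscale images.

```python
import numpy as np


def sample_and_scale(vectors, k=1000, seed=0):
    """Pick k rows at random and divide them all by one shared constant.

    Assumption: the shared constant is the largest l2 norm in the sample,
    which places every scaled point inside the l2 unit ball.
    """
    rng = np.random.default_rng(seed)
    # Assumption: sample without replacement.
    idx = rng.choice(vectors.shape[0], size=k, replace=False)
    sample = vectors[idx].astype(np.float64)
    max_norm = np.linalg.norm(sample, axis=1).max()
    if max_norm == 0:
        return sample  # all-zero sample is already inside the unit ball
    return sample / max_norm


# Dimension and sparsity values quoted in the Experiment Setup row.
n = 128 * 128   # 16384-dimensional vectors from 128 x 128 images
s = 1650 / n    # roughly 0.1

# Hypothetical stand-in data: 50 fake 32 x 32 grayscale images, flattened.
fake_images = np.random.default_rng(1).integers(0, 256, size=(50, 32 * 32))
scaled = sample_and_scale(fake_images, k=10)
```

Dividing by a single constant, rather than normalizing each vector separately, changes all pairwise Euclidean distances by the same factor, so the relative geometry of the sample is preserved.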