Faster Binary Embeddings for Preserving Euclidean Distances
Authors: Jinjie Zhang, Rayan Saab
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "To illustrate our results, we test the proposed method on natural images and show that it achieves strong performance." and "6 NUMERICAL EXPERIMENTS. To illustrate the performance of our fast binary embedding (Algorithm 1) and ℓ2 distance recovery (Algorithm 2), we apply them to real-world datasets: Yelp open dataset, ImageNet (Deng et al., 2009), Flickr30k (Plummer et al., 2017), and CIFAR-10 (Krizhevsky et al., 2010). All images are converted to grayscale and resampled using bicubic interpolation to size 128 × 128 for images from Yelp, ImageNet, and Flickr30k and 32 × 32 for images from CIFAR-10. So, each can be represented by a 16384-dimensional or 1024-dimensional vector. The results are reported here and in Appendix A." |
| Researcher Affiliation | Academia | Jinjie Zhang & Rayan Saab, Department of Mathematics, Halıcıoğlu Data Science Institute, University of California San Diego, {jiz003, rsaab}@ucsd.edu |
| Pseudocode | Yes | "Algorithm 1: Fast Binary Embedding for Finite T" and "Algorithm 2: ℓ2 Norm Distance Recovery" |
| Open Source Code | Yes | The Python source code of our paper: https://github.com/jayzhang0727/Faster-Binary-Embeddings-for-Preserving-Euclidean-Distances.git |
| Open Datasets | Yes | "We apply them to real-world datasets: Yelp open dataset, ImageNet (Deng et al., 2009), Flickr30k (Plummer et al., 2017), and CIFAR-10 (Krizhevsky et al., 2010)." and "Yelp open dataset: https://www.yelp.com/dataset" |
| Dataset Splits | No | "To give a numerical illustration of the relation among the length m of the binary sequences, embedding dimension p, and order r, as compared to the upper bound in (15), we use both Method 1 and Method 2 on the Yelp dataset. We randomly sample k = 1000 images and scale them by the same constant so all data points are contained in the ℓ2 unit ball." The paper does not specify traditional train/validation/test splits for model training and evaluation. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions 'The Python source code' but does not specify any software dependencies with version numbers (e.g., specific Python libraries or frameworks). |
| Experiment Setup | Yes | Based on Theorem 4.2, we set n = 16384 and s = 1650/n ≈ 0.1. For each fixed p, we apply Algorithm 1 and Algorithm 2 for various m. We present our experimental results for stable ΣΔ quantization schemes, given by (21), with r = 1 and r = 2 in Figure 1. For each dataset we randomly sample k = 1000 images and scale them such that all scaled data points are contained in the ℓ2 unit ball. |
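
The sampling and scaling step quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `l2_norm` and `sample_and_scale`, the synthetic input, and the small sizes are assumptions chosen so the example runs with only the standard library. It draws k points without replacement and divides all of them by the same constant (the largest ℓ2 norm in the sample), so every scaled point lies in the ℓ2 unit ball and pairwise distances are all shrunk by one common factor.

```python
import math
import random


def l2_norm(v):
    """Euclidean (l2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def sample_and_scale(points, k, rng):
    """Sample k points without replacement and rescale them jointly.

    Every sampled point is divided by the same constant, the largest
    l2 norm in the sample, so all scaled points lie in the unit ball.
    (Hypothetical helper; the paper only states that a common scaling
    constant is used, not how it is chosen.)
    """
    sample = rng.sample(points, k)
    max_norm = max(l2_norm(v) for v in sample)
    if max_norm == 0.0:
        # All-zero sample: nothing to rescale.
        return [list(v) for v in sample]
    return [[x / max_norm for x in v] for v in sample]


# Synthetic stand-in for flattened grayscale images (assumed sizes,
# far smaller than the n = 16384 and k = 1000 used in the paper).
rng = random.Random(0)
n = 64
points = [[rng.random() for _ in range(n)] for _ in range(200)]

scaled = sample_and_scale(points, 50, rng)
max_scaled_norm = max(l2_norm(v) for v in scaled)
```

With real data, `points` would hold the flattened 128 × 128 or 32 × 32 grayscale images, and k would be 1000 as in the quoted setup.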