Learning to Hash Robustly, Guaranteed
Authors: Alexandr Andoni, Daniel Beaglehole
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the practical side, we run experiments that show that our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets. |
| Researcher Affiliation | Academia | ¹Computer Science and Engineering, UCSD, La Jolla, California; ²Department of Computer Science, Columbia University, New York, New York. |
| Pseudocode | Yes | Algorithm 1 Build Tree, Algorithm 2 Query Tree, Algorithm 3 Min-Max Optimization (a hedged C++ sketch of one MWU round follows the table). |
| Open Source Code | Yes | A link to the experiment code repository can be found at (https://anonymous.4open.science/r/instance-optimal-lsh-51DF/README.md). |
| Open Datasets | Yes | We demonstrate the practicality and performance of our algorithm on the canonical ImageNet and MNIST datasets. In this section, we display results for the first 750 images of MNIST's training dataset (Chris Burges, 2021), and on the first 624 images of ImageNet's 3x8x8 validation subset (Deng et al., 2009). |
| Dataset Splits | No | No explicit details on specific train/validation/test dataset splits (e.g., percentages, sample counts, or explicit split files) were provided for their experiments. |
| Hardware Specification | Yes | The experiments were performed on an Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz with 65 GB of RAM (all 20 physical cores were used for the experiment). Query times for the small subsets were measured on a 2.3 GHz Dual-Core Intel Core i5 with 8GB of RAM. |
| Software Dependencies | Yes | The experiments were implemented in C++, and compiled with g++-5 using the -march=native and -O3 flags for improved runtime. In addition, our implementation was highly parallelized using OpenMP preprocessor directives. Efficient matrix/vector computation was done with the Eigen library for C++ (Guennebaud et al., 2010). (A minimal toolchain sketch follows the table.) |
| Experiment Setup | Yes | For both MNIST and ImageNet, the dataset was binarized using a threshold. In particular, all pixel values below a threshold pixel value were set to 0, and the complement is set to 1 (a threshold of 16 for ImageNet, and 1 for MNIST)... For the small subsets, we ran our algorithm with radius r = 5 for ImageNet and MNIST. Two additional parameters are listed for the experiments: the number of rounds T the game was executed for, and the base β ∈ (0, 1) used for MWU. (A binarization sketch follows the table.) |
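
The Pseudocode row names a Min-Max Optimization routine (Algorithm 3) that, per the Experiment Setup row, runs for T rounds with an MWU base β ∈ (0, 1). The sketch below shows what one such multiplicative-weights round could look like, assuming the algorithm maintains a weight per query and down-weights queries that are already served well; the function name, the use of per-query recall as the gain, and the exact update rule are illustrative assumptions, not the paper's definitions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of one MWU-style round of the min-max game.
// weights: one weight per query, maintained across the T rounds
// recall:  per-query recall achieved by the hash tree chosen this round
// beta:    MWU base in (0, 1), as listed in the experiment parameters
void mwu_round(std::vector<double>& weights,
               const std::vector<double>& recall,
               double beta) {
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        // Queries served well (high recall) are down-weighted, so later
        // rounds concentrate on the worst-performing queries.
        weights[i] *= std::pow(beta, recall[i]);
        total += weights[i];
    }
    // Renormalize so the weights again form a distribution over queries.
    for (double& w : weights) w /= total;
}
```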
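For reference, a minimal sketch of the reported software stack: Eigen for matrix/vector computation and OpenMP preprocessor directives for parallelism across cores. The build command in the comment mirrors the reported g++-5 flags; the added -fopenmp flag and the scoring loop itself are assumptions for illustration, not the paper's code.

```cpp
// Build command consistent with the reported toolchain (the -fopenmp flag
// is an assumption needed to enable the OpenMP directives):
//   g++-5 -march=native -O3 -fopenmp main.cpp -o experiment
#include <Eigen/Dense>
#include <omp.h>
#include <vector>

int main() {
    const int n = 1000, d = 64;
    Eigen::MatrixXd data = Eigen::MatrixXd::Random(n, d);
    Eigen::VectorXd query = Eigen::VectorXd::Random(d);

    std::vector<double> scores(n);
    // Each thread scores a disjoint slice of the dataset in parallel.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        scores[i] = data.row(i).dot(query);
    }
    return 0;
}
```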
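The binarization step in the Experiment Setup row maps pixel values below a threshold to 0 and all others to 1. A minimal sketch, assuming 8-bit pixel values stored in a flat vector (the function name and container choice are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Threshold binarization as described in the experiment setup:
// pixels below the threshold become 0, the complement becomes 1.
std::vector<uint8_t> binarize(const std::vector<uint8_t>& pixels,
                              uint8_t threshold) {
    std::vector<uint8_t> bits(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        bits[i] = (pixels[i] < threshold) ? 0 : 1;
    return bits;
}
```

With the reported thresholds, binarize(pixels, 1) on MNIST zeroes only the zero-valued pixels, while binarize(pixels, 16) on ImageNet maps values 0 through 15 to 0.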