Learning to Hash Robustly, Guaranteed

Authors: Alexandr Andoni, Daniel Beaglehole

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On the practical side, we run experiments that show that our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets."
Researcher Affiliation | Academia | "1Computer Science and Engineering, UCSD, La Jolla, California; 2Department of Computer Science, Columbia University, New York, New York."
Pseudocode | Yes | Algorithm 1 (Build Tree), Algorithm 2 (Query Tree), Algorithm 3 (Min-Max Optimization)
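To make the Build Tree / Query Tree structure concrete, here is a minimal, generic bit-sampling hash tree for binary vectors (standard Hamming-space LSH). This is a sketch only: the paper's Algorithm 3 *learns* the coordinate distribution via min-max optimization (MWU), whereas this sketch samples coordinates uniformly at random; the `HashTree` name and its members are illustrative, not taken from the paper's code.

```cpp
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Hedged sketch of a bit-sampling hash tree over binary vectors.
// Build: sample one coordinate per level, bucket points by the induced
// bit-string (analogue of Algorithm 1, Build Tree).
// Query: hash the query the same way and return its bucket
// (analogue of Algorithm 2, Query Tree).
struct HashTree {
    std::vector<std::size_t> coords;  // coordinate sampled at each level
    std::map<std::vector<uint8_t>, std::vector<std::size_t>> buckets;

    void build(const std::vector<std::vector<uint8_t>>& data,
               std::size_t depth, std::mt19937& rng) {
        std::uniform_int_distribution<std::size_t> pick(0, data[0].size() - 1);
        for (std::size_t d = 0; d < depth; ++d) coords.push_back(pick(rng));
        for (std::size_t i = 0; i < data.size(); ++i)
            buckets[key(data[i])].push_back(i);  // bucket by hashed bits
    }

    std::vector<std::size_t> query(const std::vector<uint8_t>& q) const {
        auto it = buckets.find(key(q));
        return it == buckets.end() ? std::vector<std::size_t>{} : it->second;
    }

    // Project a vector onto the sampled coordinates.
    std::vector<uint8_t> key(const std::vector<uint8_t>& x) const {
        std::vector<uint8_t> k;
        for (std::size_t c : coords) k.push_back(x[c]);
        return k;
    }
};
```

The paper's contribution replaces the uniform coordinate choice with a distribution optimized against the worst-case query; the tree mechanics above stay the same.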
Open Source Code | No | A link to the experiment code repository can be found at https://anonymous.4open.science/r/instance-optimal-lsh-51DF/README.md.
Open Datasets | Yes | "We demonstrate the practicality and performance of our algorithm on the canonical ImageNet and MNIST datasets. In this section, we display results for the first 750 images of MNIST's training dataset (Chris Burges, 2021), and on the first 624 images of ImageNet's 3x8x8 validation subset (Deng et al., 2009)."
Dataset Splits | No | No explicit train/validation/test split details (e.g., percentages, sample counts, or split files) were provided for the experiments.
Hardware Specification | Yes | "The experiments were performed on an Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz with 65 GB of RAM (all 20 physical cores were used for the experiment). Query times for the small subsets were measured on a 2.3 GHz Dual-Core Intel Core i5 with 8GB of RAM."
Software Dependencies | Yes | "The experiments were implemented in C++, and compiled with g++-5 using the -march=native and -O3 flags for improved runtime. In addition, our implementation was highly parallelized using OpenMP pre-processor directives. Efficient matrix/vector computation was done with the Eigen library for C++ (Guennebaud et al., 2010)."
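A build invocation matching the stated toolchain might look as follows. This is a sketch, not the authors' build line: the source file name, the `-fopenmp` flag (needed to activate the OpenMP directives they mention), the `-std=c++11` level, and the Eigen include path are all assumptions.

```shell
# Hedged sketch of a compile command consistent with the stated flags.
# "experiment.cpp" and the Eigen include path are placeholders.
g++-5 -std=c++11 -march=native -O3 -fopenmp \
    -I /path/to/eigen experiment.cpp -o experiment
```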
Experiment Setup | Yes | "For both MNIST and ImageNet, the dataset was binarized using a threshold. In particular, all pixel values below a threshold pixel value were set to 0, and the complement was set to 1 (a threshold of 16 for ImageNet, and 1 for MNIST)... For the small subsets, we ran our algorithm with radius r = 5 for ImageNet and MNIST. Two additional parameters are listed for the experiments: the number of rounds T the game was executed for, and the base β ∈ (0, 1) used for MWU."
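The thresholding step described above is straightforward to state in code. A minimal sketch, assuming flattened images as byte vectors; the function name and types are illustrative, not from the paper's implementation:

```cpp
#include <cstdint>
#include <vector>

// Hedged sketch of the described binarization: pixels below the threshold
// become 0, all others become 1 (threshold 16 for ImageNet, 1 for MNIST).
std::vector<uint8_t> binarize(const std::vector<uint8_t>& pixels,
                              uint8_t threshold) {
    std::vector<uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        out[i] = pixels[i] < threshold ? 0 : 1;
    return out;
}
```

Note that with the MNIST threshold of 1, every nonzero pixel maps to 1, i.e., the image becomes a plain foreground mask.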