Scalable and Efficient Comparison-based Search without Features

Authors: Daniyar Chumbalov, Lucas Maystre, Matthias Grossglauser

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show that the query complexity of our approach on two real-world datasets is on par with the non-blind setting, which is not achievable using any of the current state-of-the-art embedding methods. Finally, we demonstrate the efficacy of our framework by conducting an experiment with users searching for movie actors. |
| Researcher Affiliation | Collaboration | ¹School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland. ²Spotify. |
| Pseudocode | Yes | Algorithm 1 SAMPLEMIRROR |
| Open Source Code | No | The source code for our methods as well as the data collected in the face search experiment will be available at https://indy.epfl.ch/. |
| Open Datasets | Yes | on two real-world datasets: the red wine dataset (Cortez et al., 2009) and the music dataset (Zhou et al., 2014). |
| Dataset Splits | No | The paper describes experimental setups and data sampling for runs, but does not specify explicit training, validation, or test dataset splits in terms of percentages or counts, nor does it refer to standard predefined splits for the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments, offering only general mentions of computational efficiency. |
| Software Dependencies | No | The paper does not list specific software components with their version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | During the search, comparison outcomes are sampled from the probit model (1), using a value of σ_ε² chosen such that approximately 10% of the query outcomes are flipped. In the case when the true d is not known, we estimated it by first conservatively setting D = 100, and then, after collecting around 10000 triplets, picking the smallest value of D with 98% of the energy in the eigenvalue decomposition of the covariance matrix of the mean vectors X̂_μ. This number for both datasets was 20, and D = 20 was used as the approximation of the actual d. For each search, we first sampled T ∈ {1000, 3000, 5000, 10k, 20k, 40k} triplets, uniformly from the source 40k+ triplets, and then generated the embedding as described in Section 5. (See the sketch below the table.) |
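As a rough illustration of the procedure quoted in the Experiment Setup row, the sketch below shows one way to (a) pick the noise scale σ_ε so that roughly 10% of probit comparison outcomes are flipped, (b) estimate the effective dimension D from the 98%-energy criterion on the covariance of the mean vectors, and (c) subsample T triplets uniformly from the source pool. This is a minimal sketch under stated assumptions, not the authors' code: the array names (`mu_hat`, `margins`, `triplets`), the bisection routine, and the NumPy/SciPy helpers are choices made here for illustration.

```python
# Illustrative sketch of the experiment-setup choices described above.
# Assumptions: `mu_hat` is an (n_items x D) array of estimated mean vectors,
# `margins` holds the signed margins of the sampled comparison queries, and
# `triplets` is the source pool of 40k+ triplets. All names are hypothetical.
import numpy as np
from scipy.stats import norm


def effective_dimension(mu_hat, energy=0.98):
    """Smallest D capturing `energy` of the eigenvalue spectrum of the
    covariance matrix of the mean vectors (the 98%-energy rule quoted above)."""
    cov = np.cov(mu_hat, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, energy) + 1)


def sigma_for_flip_rate(margins, target_flip=0.10):
    """Noise scale sigma_eps such that roughly `target_flip` of probit
    comparison outcomes are flipped, found by bisection (flip probability
    for a query with margin m is Phi(-|m| / sigma))."""
    lo, hi = 1e-6, 1e6
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        flip = np.mean(norm.cdf(-np.abs(margins) / mid))
        if flip < target_flip:
            lo = mid  # noise too small: too few flips
        else:
            hi = mid  # noise too large: too many flips
    return 0.5 * (lo + hi)


def subsample_triplets(triplets, T, rng=None):
    """Draw T triplets uniformly at random, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(triplets), size=T, replace=False)
    return [triplets[i] for i in idx]
```

Under this reading, one would fit a conservative D = 100 embedding on roughly 10000 collected triplets, call `effective_dimension(mu_hat)` to recover the smallest D meeting the 98%-energy criterion (reported as 20 for both datasets in the quote above), and then repeat searches over embeddings built from T ∈ {1000, ..., 40k} subsampled triplets.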