Scalable and Efficient Comparison-based Search without Features

Authors: Daniyar Chumbalov, Lucas Maystre, Matthias Grossglauser

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show that the query complexity of our approach on two real-world datasets is on par with the non-blind setting, which is not achievable using any of the current state-of-the-art embedding methods. Finally, we demonstrate the efficacy of our framework by conducting an experiment with users searching for movie actors. |
| Researcher Affiliation | Collaboration | ¹School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland. ²Spotify. |
| Pseudocode | Yes | Algorithm 1 SAMPLEMIRROR |
| Open Source Code | No | The source code for our methods as well as the data collected in the face search experiment will be available at https://indy.epfl.ch/. |
| Open Datasets | Yes | on two real-world datasets: the red wine dataset (Cortez et al., 2009) and the music dataset (Zhou et al., 2014). |
| Dataset Splits | No | The paper describes experimental setups and data sampling for runs, but does not specify explicit training, validation, or test dataset splits in terms of percentages or counts, nor does it refer to standard predefined splits for the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments, offering only general mentions of computational efficiency. |
| Software Dependencies | No | The paper does not list specific software components with their version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | During the search, comparison outcomes are sampled from the probit model (1), using a value of σ_ε² chosen such that approximately 10% of the query outcomes are flipped. In the case when the true d is not known, we estimated it by first conservatively setting D = 100, and then, after collecting around 10000 triplets, picking the smallest value of D with 98% of the energy in the eigenvalue decomposition of the covariance matrix of the mean vectors X̂_μ. This number for both datasets was 20, and D = 20 was used as the approximation of the actual d. For each search, we first sampled T ∈ {1000, 3000, 5000, 10k, 20k, 40k} triplets, uniformly from the source 40k+ triplets, and then generated the embedding as described in Section 5. (See the sketch below the table.) |
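As a rough illustration of the procedure quoted in the Experiment Setup row, the sketch below shows one way to (a) pick the noise scale σ_ε so that roughly 10% of probit comparison outcomes are flipped, (b) estimate the effective dimension D from the 98%-energy criterion on the covariance of the mean vectors, and (c) subsample T triplets uniformly from the source pool. This is a minimal sketch under stated assumptions, not the authors' code: the array names (`mu_hat`, `margins`, `triplets`), the bisection routine, and the NumPy/SciPy helpers are choices made here for illustration.

```python
# Illustrative sketch of the experiment-setup choices described above.
# Assumptions: `mu_hat` is an (n_items x D) array of estimated mean vectors,
# `margins` holds the signed margins of the sampled comparison queries, and
# `triplets` is the source pool of 40k+ triplets. All names are hypothetical.
import numpy as np
from scipy.stats import norm


def effective_dimension(mu_hat, energy=0.98):
    """Smallest D capturing `energy` of the eigenvalue spectrum of the
    covariance matrix of the mean vectors (the 98%-energy rule quoted above)."""
    cov = np.cov(mu_hat, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, energy) + 1)


def sigma_for_flip_rate(margins, target_flip=0.10):
    """Noise scale sigma_eps such that roughly `target_flip` of probit
    comparison outcomes are flipped, found by bisection (flip probability
    for a query with margin m is Phi(-|m| / sigma))."""
    lo, hi = 1e-6, 1e6
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        flip = np.mean(norm.cdf(-np.abs(margins) / mid))
        if flip < target_flip:
            lo = mid  # noise too small: too few flips
        else:
            hi = mid  # noise too large: too many flips
    return 0.5 * (lo + hi)


def subsample_triplets(triplets, T, rng=None):
    """Draw T triplets uniformly at random, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(triplets), size=T, replace=False)
    return [triplets[i] for i in idx]
```

Under this reading, one would fit a conservative D = 100 embedding on roughly 10000 collected triplets, call `effective_dimension(mu_hat)` to recover the smallest D meeting the 98%-energy criterion (reported as 20 for both datasets in the quote above), and then repeat searches over embeddings built from T ∈ {1000, ..., 40k} subsampled triplets.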