Active preference learning for ordering items in- and out-of-sample

Authors: Herman Bergström, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate GURO (Algorithm 1) and GURO Hybrid (see Section 5.1) in four image ordering tasks, one with logistic (synthetic) preference feedback, and three tasks based on real-world feedback from human annotators. [...] Our results demonstrate superior sample efficiency and generalization compared to non-contextual ranking approaches and active preference learning baselines.
Researcher Affiliation | Collaboration | Herman Bergström (Chalmers University of Technology and University of Gothenburg, hermanb@chalmers.se); Emil Carlsson (Sleep Cycle AB; Chalmers University of Technology and University of Gothenburg); Devdatt Dubhashi (Chalmers University of Technology and University of Gothenburg); Fredrik D. Johansson (Chalmers University of Technology and University of Gothenburg)
Pseudocode | Yes | Algorithm 1 Greedy Uncertainty Reduction for Ordering (GURO), [BayesGURO] [...] Algorithm 2 Uniform sampling algorithm [...] Algorithm 3 BALD bandit
Open Source Code | Yes | Our code is available at: https://github.com/Healthy-AI/GURO
Open Datasets | Yes | Image Clarity: Data available at https://dbgroup.cs.tsinghua.edu.cn/ligl/crowdtopk. [...] WiscAdds: Data available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/0ZRGEE (license: CC0 1.0). [...] IMDB-WIKI-SbS: Data available at https://github.com/Toloka/IMDB-WIKI-SbS (license: CC BY). [...] X-ray Age Prediction Challenge (Felipe Kitamura, 2023)
Dataset Splits | Yes | Next, we split these into two sets, with one (I_D) containing the youngest 50% and the other (I_E) the oldest 50%. [...] For every seed, 10% of comparisons were used for the holdout set.
Hardware Specification | Yes | The longest trajectory (single seed) for any algorithm took less than 35hrs to complete on one core of an Intel Xeon Gold 6130 CPU and required at most 10 GB of memory.
Software Dependencies | No | GURO, CoLSTIM, and Uniform use Logistic Regression from Scikit-learn (Pedregosa et al., 2011) with default Ridge regularization (C = 1) and the lbfgs optimizer.
Experiment Setup | Yes | For BayesGURO and BALD, the posterior $p(\theta \mid D_t)$ is estimated using the Laplace approximation as described in Bishop and Nasrabadi (2006, Chapter 4). [...] For both methods, the priors $\theta_{B,0} = 0_d$ and $H^{-1}_{B,0} = I_d$ were used, and sequential updates were performed every iteration. [...] For BayesGURO, 50 posterior samples were used to estimate $\hat{V}_{\theta \mid D_t}[\sigma(\theta^\top z_{ij})]$ for every $z_{ij}$. [...] GURO, CoLSTIM, and Uniform use Logistic Regression from Scikit-learn (Pedregosa et al., 2011) with default Ridge regularization (C = 1) and the lbfgs optimizer.
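
The experiment setup row can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes that each comparison between items i and j is encoded as the feature difference z_ij = x_i - x_j, that the model has no intercept, and that the Laplace covariance is the inverse of the identity prior precision plus the logistic-loss Hessian at the fitted weights. The synthetic data, the helper names (`fit_preference_model`, `laplace_covariance`, `comparison_variance`), and the final "pick the highest-variance pair" step are hypothetical and only show how the quoted settings (C = 1, lbfgs, 50 posterior samples) might fit together.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic sigmoid
from sklearn.linear_model import LogisticRegression


def fit_preference_model(Z, y):
    """Fit a logistic preference model with the settings quoted above.

    fit_intercept=False is an assumption of this sketch, not stated in the source.
    """
    model = LogisticRegression(C=1.0, solver="lbfgs", fit_intercept=False)
    model.fit(Z, y)
    return model.coef_.ravel()


def laplace_covariance(Z, theta):
    """Gaussian (Laplace) covariance around theta, assuming prior precision I_d."""
    p = expit(Z @ theta)
    w = p * (1.0 - p)  # per-comparison curvature of the logistic loss
    H = np.eye(Z.shape[1]) + (Z * w[:, None]).T @ Z
    cov = np.linalg.inv(H)
    return (cov + cov.T) / 2.0  # symmetrize against round-off


def comparison_variance(Z_cand, theta, cov, n_samples=50, rng=None):
    """Monte Carlo estimate of Var[sigma(theta^T z)] for each candidate z."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = rng.multivariate_normal(theta, cov, size=n_samples)
    probs = expit(Z_cand @ thetas.T)  # shape: (n_candidates, n_samples)
    return probs.var(axis=1)


# Hypothetical synthetic data: 30 items with 5 features and logistic feedback.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
theta_true = rng.normal(size=5)
i, j = rng.integers(0, 30, size=(2, 200))
mask = i != j  # drop self-comparisons
Z = X[i[mask]] - X[j[mask]]
y = (rng.random(mask.sum()) < expit(Z @ theta_true)).astype(int)

theta_hat = fit_preference_model(Z, y)
cov = laplace_covariance(Z, theta_hat)

# Score every unordered pair of items by the sampled predictive variance.
a, b = np.triu_indices(30, k=1)
Z_cand = X[a] - X[b]
variances = comparison_variance(Z_cand, theta_hat, cov, n_samples=50, rng=rng)
best = int(np.argmax(variances))  # illustrative selection rule only
print("most uncertain pair:", (a[best], b[best]))
```

With scikit-learn's objective, C = 1 and an L2 penalty correspond to a standard normal prior on the weights, which is why the sketch adds the identity matrix to the Hessian. Whether the authors' implementation builds the covariance this way is not stated in the table.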