Model-Free Preference Elicitation

Authors: Carlos Martin, Craig Boutilier, Ofer Meshi, Tuomas Sandholm

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our approach offers significant improvement in recommendation quality over standard baselines on several PE tasks. [...] We describe our experimental setting and present our experimental results. [...] We empirically compare the performance of our model-free approach (§6) to the model-based approach (§5) in terms of the expected utility to the user of the recommended item at the end of an episode.
Researcher Affiliation | Collaboration | Carlos Martin (2), Craig Boutilier (1), Ofer Meshi (1), and Tuomas Sandholm (2,3,4,5); 1: Google Research, 2: Carnegie Mellon University, 3: Strategy Robot, Inc., 4: Optimized Markets, Inc., 5: Strategic Machine, Inc.
Pseudocode | No | The paper describes algorithms (e.g., the MCTS algorithm) in prose but does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm X".
Open Source Code | No | The paper mentions using an existing open-source implementation ("In our experiments, we use the open-source implementation found in DeepMind's JAX library mctx [Babuschkin et al., 2020]."), but does not provide concrete access to the source code for the novel methodology described in this paper.
Open Datasets | Yes | We use three datasets. MovieLens 1M. The MovieLens 1M dataset [Harper and Konstan, 2016] [...] MovieLens 25M. The MovieLens 25M dataset [Harper and Konstan, 2016] [...] Amazon Reviews. The 2018 Amazon review dataset [Ni et al., 2019]
Dataset Splits | Yes | We train using 100 epochs, a batch size of 32, 20 trials, and 10% of the dataset for validation.
Hardware Specification | Yes | We use 3000 episodes for each policy, and execute them on a single NVIDIA A100 SXM4 40GB GPU.
Software Dependencies | No | The paper mentions software like the RAdam optimizer, TensorFlow Probability, JAX, and mctx, but does not provide specific version numbers for these software components (e.g., 'TensorFlow Probability X.Y.Z', 'JAX A.B.C').
Experiment Setup | Yes | All of our models use a hidden layer of size 1024 where applicable, and four heads for the attention model. We use the RAdam optimizer [Liu et al., 2019] (based on Adam [Kingma and Ba, 2014]) with a learning rate of 10^-5 and weight decay 10^-3. We train using 100 epochs, a batch size of 32, 20 trials, and 10% of the dataset for validation.
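The Open Source Code row notes that the paper's search component relies on DeepMind's open-source mctx library rather than released code of its own. For orientation only, the sketch below shows the general shape of an mctx policy call in JAX; the action count, dynamics, rewards, and simulation budget are placeholders, not the authors' implementation.

```python
# Minimal sketch of calling DeepMind's mctx library (not the authors' code).
# All environment details below are hypothetical placeholders.
import jax
import jax.numpy as jnp
import mctx

num_actions = 8   # hypothetical number of candidate queries/items
batch_size = 1

def recurrent_fn(params, rng_key, action, embedding):
    # Placeholder dynamics: a real implementation would update the
    # belief/history embedding given the simulated response to `action`.
    next_embedding = embedding + 1.0
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, next_embedding

# Root of the search: prior over actions, value estimate, and state embedding.
root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros((batch_size, 1)),
)

policy_output = mctx.muzero_policy(
    params=(),
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=32,
)
print(policy_output.action)  # selected action for each batch element
```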
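The Experiment Setup row lists the reported hyperparameters. The snippet below simply collects them into a configuration and builds one plausible optax RAdam optimizer; since the paper does not state how weight decay is combined with RAdam, the composition shown here is an assumption.

```python
# Hedged sketch of the reported training configuration (JAX/optax).
# Hyperparameter values are taken from the quoted experiment setup; the
# weight-decay composition is an assumption, as the paper does not specify
# coupled vs. decoupled decay.
import optax

config = dict(
    hidden_size=1024,          # "hidden layer of size 1024 where applicable"
    attention_heads=4,         # "four heads for the attention model"
    learning_rate=1e-5,        # RAdam learning rate
    weight_decay=1e-3,
    epochs=100,
    batch_size=32,
    trials=20,
    validation_fraction=0.10,  # "10% of the dataset for validation"
)

# One plausible composition: add weight_decay * params to the gradients
# (L2-style regularization), then apply the RAdam update.
optimizer = optax.chain(
    optax.add_decayed_weights(config["weight_decay"]),
    optax.radam(config["learning_rate"]),
)
```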