Model-Free Preference Elicitation
Authors: Carlos Martin, Craig Boutilier, Ofer Meshi, Tuomas Sandholm
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach offers significant improvement in recommendation quality over standard baselines on several PE tasks. [...] We describe our experimental setting and present our experimental results. [...] We empirically compare the performance of our model-free approach (§6) to the model-based approach (§5) in terms of the expected utility to the user of the recommended item at the end of an episode. |
| Researcher Affiliation | Collaboration | Carlos Martin², Craig Boutilier¹, Ofer Meshi¹ and Tuomas Sandholm²,³,⁴,⁵; ¹Google Research, ²Carnegie Mellon University, ³Strategy Robot, Inc., ⁴Optimized Markets, Inc., ⁵Strategic Machine, Inc. |
| Pseudocode | No | The paper describes algorithms (e.g., the MCTS algorithm) in prose but does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm X". |
| Open Source Code | No | The paper mentions using an existing open-source implementation ("In our experiments, we use the open-source implementation found in DeepMind's JAX library mctx [Babuschkin et al., 2020]."), but does not provide concrete access to the source code for the novel methodology described in this paper. (A hedged sketch of typical mctx usage follows the table.) |
| Open Datasets | Yes | We use three datasets. MovieLens 1M. The MovieLens 1M dataset [Harper and Konstan, 2016] [...] MovieLens 25M. The MovieLens 25M dataset [Harper and Konstan, 2016] [...] Amazon Reviews. The 2018 Amazon review dataset [Ni et al., 2019] |
| Dataset Splits | Yes | We train using 100 epochs, a batch size of 32, 20 trials, and 10% of the dataset for validation. |
| Hardware Specification | Yes | We use 3000 episodes for each policy, and execute them on a single NVIDIA A100 SXM4 40GB GPU. |
| Software Dependencies | No | The paper mentions software such as the RAdam optimizer, TensorFlow Probability, JAX, and mctx, but does not provide specific version numbers for these components (e.g., 'TensorFlow Probability X.Y.Z', 'JAX A.B.C'). |
| Experiment Setup | Yes | All of our models use a hidden layer of size 1024 where applicable, and four heads for the attention model. We use the RAdam optimizer [Liu et al., 2019] (based on Adam [Kingma and Ba, 2014]) with a learning rate of 10⁻⁵ and weight decay 10⁻³. We train using 100 epochs, a batch size of 32, 20 trials, and 10% of the dataset for validation. (A hedged optax sketch of these optimizer settings follows the table.) |
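
The paper reports using DeepMind's open-source mctx library for its MCTS planner but releases no code of its own. As a point of reference only, the snippet below is a minimal sketch of how mctx's MuZero-style search is typically invoked; the batch size, action count, and toy dynamics are placeholder assumptions, not the paper's preference-elicitation environment.

```python
# Minimal sketch of invoking mctx's MuZero-style search (placeholder problem:
# the batch size, action count, and toy dynamics are assumptions, not the
# paper's preference-elicitation environment).
import jax
import jax.numpy as jnp
import mctx

batch_size, num_actions = 4, 3

def recurrent_fn(params, rng_key, action, embedding):
    # Toy dynamics: accumulate the chosen action index into the state embedding.
    new_embedding = embedding + action[:, None]
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, new_embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros((batch_size, 1)),
)

policy_output = mctx.muzero_policy(
    params=(),
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=16,
)
print(policy_output.action)  # one action chosen per batch element
```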
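
Similarly, the reported optimizer settings (RAdam with learning rate 10⁻⁵ and weight decay 10⁻³) could be expressed with optax in JAX; the paper does not say which optimizer library it used, so the pairing of `optax.scale_by_radam` with decoupled weight decay below is an assumption, not a reconstruction of the authors' code.

```python
# Hedged sketch of the reported optimizer settings in optax (assumed library;
# the paper only names RAdam, a learning rate of 1e-5, and weight decay 1e-3).
import jax.numpy as jnp
import optax

learning_rate = 1e-5  # reported learning rate
weight_decay = 1e-3   # reported weight decay

# AdamW-style decoupled decay: the decay term is applied after the RAdam
# gradient scaling; whether the paper decoupled its weight decay is not stated.
optimizer = optax.chain(
    optax.scale_by_radam(),
    optax.add_decayed_weights(weight_decay),
    optax.scale(-learning_rate),
)

# Toy usage on a single parameter vector.
params = {"w": jnp.ones(8)}
opt_state = optimizer.init(params)
grads = {"w": jnp.full(8, 0.1)}
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```

Placing `add_decayed_weights` before the RAdam scaling instead would give plain L2 regularization rather than decoupled decay; the paper does not specify which variant it used.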