Online Posterior Sampling with a Diffusion Prior

Authors: Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Anand Deshmukh, Rui Song

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our last contribution is an empirical evaluation on contextual bandits. ... Our experiments show that a score-based method fails to do so (Section 6.2).
Researcher Affiliation | Industry | Branislav Kveton (Adobe Research), Boris N. Oreshkin (Amazon), Youngsuk Park (AWS AI Labs), Aniket Deshmukh (AWS AI Labs), Rui Song (Amazon). The work was done at AWS AI Labs.
Pseudocode | Yes | Algorithm 1: IRLS, iteratively reweighted least squares. ... Algorithm 2: Laplace DPS, Laplace posterior sampling with a diffusion model prior. ... Algorithm 3: Contextual Thompson sampling. ... Algorithm 4: DPS of Chung et al. [12]. (A minimal sketch of the Thompson sampling loop appears after the table.)
Open Source Code | Yes | We include code to reproduce the synthetic results in Figures 2 and 4.
Open Datasets | Yes | The problem is simulated using the MovieLens 1M dataset [28], with one million ratings for 3,706 movies from 6,040 users. ... The next experiment is on the MNIST dataset [31].
Dataset Splits | No | The paper describes the generation of data samples for evaluation and how parameters are sampled, but it does not specify explicit training, validation, or test dataset splits with percentages or counts.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU models, GPU models, or memory specifications used for running the experiments. The NeurIPS checklist explicitly states 'No' for 'Experiments Compute Resources' with the justification 'Our experiments are not large scale.'
Software Dependencies | No | The paper mentions scikit-learn for fitting Gaussian mixtures, and Python is implied by the machine learning context, but it does not provide version numbers for any software dependencies, libraries, or frameworks.
Experiment Setup | Yes | We learn this distribution from 10,000 samples from it. In DiffTS and DPS, we follow Appendix B. The number of stages is T = 100 and the diffusion factor is α_t = 0.97. The regressor in Appendix B is a 2-layer neural network with ReLU activations. In TunedTS, we fit the mean and covariance using maximum likelihood estimation. In MixTS, we fit the Gaussian mixture using scikit-learn. All algorithms are evaluated on θ sampled from the true prior. The regret is computed as defined in (9). All error bars are standard errors of the estimates. In the MovieLens linear bandit, the mean reward of item j for user i is U_i^T V_j. The reward noise is σ = 0.75, and we estimate it from data. In the logistic bandit, the mean reward is g(U_i^T V_j), where g is the sigmoid. In the MNIST linear bandit, the mean reward for a digit with embedding x is x^T θ and the reward noise is σ = 1. (Hedged sketches of the Thompson sampling loop and the TunedTS/MixTS fits follow the table.)
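
To make the Algorithm 3 row concrete, below is a minimal Python sketch of a generic contextual Thompson sampling loop with linear mean rewards. This is a sketch under assumptions, not the authors' released code: the `sample_posterior` callback stands in for whatever posterior sampler is plugged in (the paper plugs in Laplace DPS at this step), and all names and shapes are illustrative.

```python
import numpy as np

def contextual_ts(contexts, theta_star, sample_posterior, sigma=0.75, rng=None):
    """Generic contextual Thompson sampling loop (linear mean rewards assumed).

    contexts: list of (n_arms, d) feature matrices, one per round.
    sample_posterior(history): returns one posterior draw of theta; this is
    the slot where a Laplace DPS-style sampler would plug in.
    All names here are illustrative, not the paper's code.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    history, regret = [], []
    for X in contexts:
        theta_t = sample_posterior(history)        # one posterior sample
        arm = int(np.argmax(X @ theta_t))          # act greedily on the sample
        reward = float(X[arm] @ theta_star) + sigma * rng.standard_normal()
        history.append((X[arm], reward))           # feed back into the posterior
        regret.append(float(np.max(X @ theta_star) - X[arm] @ theta_star))
    return np.cumsum(regret)                       # cumulative regret curve
```

With a conjugate Gaussian posterior in `sample_posterior`, this reduces to standard linear Thompson sampling; the paper's contribution is replacing that sampler with one driven by a diffusion prior.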
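Likewise, the TunedTS and MixTS baselines in the setup row are standard density fits. Assuming the 10,000 prior draws are available as a NumPy array, they might look like the following sketch; the dimension, component count, and placeholder data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder for the 10,000 draws from the true prior used in the paper;
# the dimension d = 2 and the data itself are assumptions for illustration.
samples = rng.standard_normal((10_000, 2))

# MixTS-style prior: a Gaussian mixture fit with scikit-learn
# (the number of components is an assumption).
gm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
gm.fit(samples)

# TunedTS-style prior: a single Gaussian fit by maximum likelihood
# (bias=True gives the MLE covariance rather than the unbiased estimate).
mu_hat = samples.mean(axis=0)
cov_hat = np.cov(samples, rowvar=False, bias=True)
```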