Thompson Sampling with Diffusion Generative Prior
Authors: Yu-Guan Hsieh, Shiva Kasiviswanathan, Branislav Kveton, Patrick Blöbaum
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our extensive experiments clearly demonstrate the potential of the proposed approach. |
| Researcher Affiliation | Collaboration | Yu-Guan Hsieh*¹, Shiva Prasad Kasiviswanathan², Branislav Kveton³, Patrick Blöbaum² (¹Université Grenoble Alpes, ²Amazon, ³AWS AI Labs). |
| Pseudocode | Yes | Algorithm 1: Diffusion Model Variance Calibration; Algorithm 2: Posterior Sampling with Diffusion Prior; Algorithm 3: DiffTS, Thompson Sampling with Diffusion Prior; Algorithm 4: Diffusion Model Training from Imperfect (Incomplete and Noisy) Data; Algorithm 5: Meta-learning Bandits with Diffusion Models; Algorithm 6: Diffusion Model Variance Calibration from Imperfect (Incomplete and Noisy) Data (an illustrative Thompson sampling sketch follows the table). |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | For the latter, we use the corresponding empirical distributions of 1352 ad slots from the iPinYou bidding data set (each ad slot is a single bandit task). To form the tasks, we further group the bids according to the associated ad slots. By keeping only those ad slots with at least 1000 bids, we obtain a data set of 1352 ad slots. Then, the empirical distribution of the paying price (i.e., the highest bid from competitors) of each ad slot is used to compute the success rate of every potential bid b ∈ {0, . . . , 299} set by the learner. The reward is either 300 − b when the learner wins the auction or 0 otherwise. Finally, we divide everything by the largest reward that the learner can ever get in all the tasks to scale the rewards to the range [0, 1] (a construction sketch follows the table). |
| Dataset Splits | Yes | Training, calibration, and test sets are constructed for each of the considered problems. Their sizes are fixed at 5000, 1000, and 100 for the Popular and Niche, 2D Maze, and Labeled Arms problems, and at 1200, 100, and 52 for the iPinYou Bidding problem. |
| Hardware Specification | Yes | All the simulations are run on an Amazon p3.2xlarge instance equipped with 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and specifying some hyperparameters (e.g., "learning rate 5e-4", "exponential decay rates beta1 = 0.9 and beta2 = 0.99"), but it does not specify software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | The batch size and the epsilon constant in SURE-based regularization are respectively fixed at 128 and ϵ = 10⁻⁵. When training and validation data are incomplete and noisy, we follow the training procedure described in Algorithm 4 with default values S = 15000 warm-up steps, J = 3 repeats, and S = 3000 steps within each repeat (thus 24000 steps in total). The learning rate and the batch size are respectively fixed at 10⁻⁴ and 128. For the regularization term, we take λ = 0.2 for MNIST and λ = 0.1 for Fashion-MNIST (a hedged configuration sketch follows the table). |
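
As referenced in the Pseudocode row, Algorithm 3 of the paper runs Thompson sampling where each round's parameter draw comes from a learned generative (diffusion) prior. Below is a minimal, illustrative sketch only: `sample_prior`, `success_rates`, and the Bernoulli reward model are assumptions made for this example and do not reproduce the paper's diffusion-based posterior sampling or variance calibration.

```python
import numpy as np

def thompson_sampling_with_generative_prior(sample_prior, success_rates, horizon, seed=0):
    """Simplified Thompson sampling loop driven by a sample-based prior.

    sample_prior(history) is assumed to return one plausible vector of
    per-arm mean rewards conditioned on the (arm, reward) pairs observed
    so far; the paper approximates this with posterior sampling from a
    diffusion model, here it is just a black-box callable.
    """
    rng = np.random.default_rng(seed)
    history = []            # (arm, reward) pairs observed so far
    total_reward = 0.0
    for _ in range(horizon):
        sampled_means = sample_prior(history)               # one posterior sample
        arm = int(np.argmax(sampled_means))                 # act greedily on the sample
        reward = float(rng.random() < success_rates[arm])   # Bernoulli feedback
        history.append((arm, reward))
        total_reward += reward
    return total_reward

# Toy usage with a prior that ignores the history (for illustration only).
rates = np.array([0.2, 0.5, 0.8])
print(thompson_sampling_with_generative_prior(
    lambda hist: np.random.default_rng().uniform(size=rates.size),
    rates, horizon=100))
```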
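
The Open Datasets row describes how each iPinYou ad slot is turned into a bandit task. The following is a hedged sketch of that construction, assuming a win means the learner's bid strictly exceeds the observed paying price; the exact tie-breaking rule and the global normalization constant are not specified in the quoted text.

```python
import numpy as np

def build_bidding_task(paying_prices, num_bids=300):
    """Turn one ad slot's empirical paying prices into a bandit task.

    For each candidate bid b in {0, ..., num_bids - 1}, the success rate
    is the empirical probability that b beats the competitors' highest
    bid (strict inequality is an assumption here), and the unnormalized
    reward on a win is num_bids - b, i.e. 300 - b.
    """
    prices = np.asarray(paying_prices, dtype=float)
    bids = np.arange(num_bids)
    success_rates = np.array([(prices < b).mean() for b in bids])
    expected_rewards = success_rates * (num_bids - bids)
    return success_rates, expected_rewards

# The paper additionally rescales rewards to [0, 1] by dividing by the
# largest reward the learner can ever obtain across all 1352 tasks.
```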
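
Finally, the quoted Experiment Setup for imperfect (incomplete and noisy) data can be summarized as a small configuration sketch. The field names below are hypothetical; the values and the 24000-step total are taken directly from the quoted text.

```python
# Hypothetical field names; values are those quoted in the Experiment Setup row.
imperfect_data_training = {
    "warmup_steps": 15_000,        # warm-up phase of Algorithm 4
    "repeats": 3,                  # J = 3 repeats
    "steps_per_repeat": 3_000,
    "learning_rate": 1e-4,
    "batch_size": 128,
    "sure_epsilon": 1e-5,          # epsilon in the SURE-based regularization
    "lambda_reg": {"MNIST": 0.2, "Fashion-MNIST": 0.1},
}

# Sanity check of the quoted total step count: 15000 + 3 * 3000 = 24000.
total_steps = (imperfect_data_training["warmup_steps"]
               + imperfect_data_training["repeats"]
               * imperfect_data_training["steps_per_repeat"])
assert total_steps == 24_000
```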