Deciding What to Learn: A Rate-Distortion Approach
Authors: Dilip Arumugam, Benjamin Van Roy
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We establish a general bound on expected discounted regret for an agent that decides what to learn in this manner along with computational experiments that illustrate the expressiveness of designer preferences and even show improvements over Thompson sampling in identifying an optimal policy. |
| Researcher Affiliation | Academia | 1Stanford University, California, USA. Correspondence to: Dilip Arumugam <dilip@cs.stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Blahut-Arimoto Satisficing Thompson Sampling (BLASTS) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that code for the described methodology (BLASTS) has been open-sourced. |
| Open Datasets | No | The paper describes using independent Bernoulli and Gaussian bandit problems, with parameters generated (e.g., 'Ea ~ Uniform(0, 1)'). It does not refer to existing publicly available datasets with access information. |
| Dataset Splits | No | The paper does not specify explicit dataset splits (e.g., percentages or sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2014)' and 'linear hypermodels (Dwaracherla et al., 2020)', and leverages 'an existing implementation of the Blahut-Arimoto algorithm for all experiments (James et al., 2018)', but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use a noise variance of 0.1, a prior variance of 1.0, and a batch size of 1024 throughout all experiments while using Adam (Kingma & Ba, 2014) to optimize hypermodel parameters with a learning rate of 0.001. ... The number of posterior samples used was fixed to 64 and the maximum number of iterations was set to 100, stopping early if the average distortion between two consecutive iterations fell below a small threshold. |
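The experiment-setup row above quotes a stopping rule for the Blahut-Arimoto iterations: run at most 100 iterations and stop early once the average distortion changes by less than a small threshold between consecutive iterations. As a point of reference, the generic rate-distortion Blahut-Arimoto update with that stopping rule can be sketched as below. This is a minimal illustration, not the paper's BLASTS implementation; the distribution `p_x`, distortion matrix, and `beta` are stand-ins for the posterior samples over environments, per-action expected regret, and Lagrange multiplier the paper actually uses.

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, max_iters=100, tol=1e-6):
    """Generic Blahut-Arimoto iteration for a rate-distortion problem.

    p_x        : (n,) source distribution (stand-in for posterior sample weights)
    distortion : (n, m) distortion d(x, x_hat) (stand-in for expected regret)
    beta       : Lagrange multiplier trading off rate against distortion
    Stops early when the average distortion between two consecutive
    iterations changes by less than `tol`, mirroring the quoted setup.
    """
    n, m = distortion.shape
    q_xhat = np.full(m, 1.0 / m)  # marginal over reproductions (actions)
    prev_d = np.inf
    for _ in range(max_iters):
        # Update the channel: q(x_hat | x) proportional to q(x_hat) * exp(-beta * d)
        log_q = np.log(q_xhat)[None, :] - beta * distortion
        log_q -= log_q.max(axis=1, keepdims=True)  # numerical stability
        q_cond = np.exp(log_q)
        q_cond /= q_cond.sum(axis=1, keepdims=True)
        # Update the marginal: q(x_hat) = sum_x p(x) q(x_hat | x)
        q_xhat = p_x @ q_cond
        # Average distortion under the current channel
        avg_d = float(np.sum(p_x[:, None] * q_cond * distortion))
        if abs(prev_d - avg_d) < tol:
            break
        prev_d = avg_d
    return q_cond, q_xhat, avg_d
```

For large `beta` the channel concentrates on the minimum-distortion reproduction for each source symbol, which in the BLASTS setting corresponds to recovering near-greedy action selection from posterior samples.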