Deciding What to Learn: A Rate-Distortion Approach

Authors: Dilip Arumugam, Benjamin Van Roy

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We establish a general bound on expected discounted regret for an agent that decides what to learn in this manner along with computational experiments that illustrate the expressiveness of designer preferences and even show improvements over Thompson sampling in identifying an optimal policy."
Researcher Affiliation | Academia | "Stanford University, California, USA. Correspondence to: Dilip Arumugam <dilip@cs.stanford.edu>."
Pseudocode | Yes | "Algorithm 1: Blahut-Arimoto Satisficing Thompson Sampling (BLASTS)"
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code of the described methodology (BLASTS).
Open Datasets | No | The paper uses synthetic independent Bernoulli and Gaussian bandit problems with generated parameters (e.g., "Ea ~ Uniform(0, 1)"); it does not refer to existing publicly available datasets with access information.
Dataset Splits | No | The paper does not specify explicit dataset splits (e.g., percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions Adam (Kingma & Ba, 2014), linear hypermodels (Dwaracherla et al., 2020), and leveraging "an existing implementation of the Blahut Arimoto algorithm for all experiments (James et al., 2018)", but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We use a noise variance of 0.1, a prior variance of 1.0, and a batch size of 1024 throughout all experiments while using Adam (Kingma & Ba, 2014) to optimize hypermodel parameters with a learning rate of 0.001. ... The number of posterior samples used was fixed to 64 and the maximum number of iterations was set to 100, stopping early if the average distortion between two consecutive iterations fell below a small threshold."
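The experiment-setup row describes running the Blahut-Arimoto algorithm with a cap on iterations and early stopping when the average distortion stops changing. As background, here is a minimal sketch of a standard Blahut-Arimoto iteration for a rate-distortion problem with that stopping rule; the function name, the distortion matrix, and the inverse-temperature parameter `beta` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, max_iters=100, tol=1e-6):
    """Standard Blahut-Arimoto iteration for a rate-distortion problem.

    p_x:        (n,) source distribution (e.g., over sampled environments).
    distortion: (n, m) matrix d(x, y) between source x and output y (e.g., actions).
    beta:       inverse temperature trading off rate against distortion.
    Returns the channel q(y|x) as an (n, m) array and the final expected distortion.
    """
    n, m = distortion.shape
    q_y = np.full(m, 1.0 / m)  # output marginal, initialized uniform
    prev_d = np.inf
    for _ in range(max_iters):
        # Channel update: q(y|x) proportional to q(y) * exp(-beta * d(x, y)),
        # computed in log space for numerical stability.
        log_q = np.log(q_y)[None, :] - beta * distortion
        q_y_given_x = np.exp(log_q - log_q.max(axis=1, keepdims=True))
        q_y_given_x /= q_y_given_x.sum(axis=1, keepdims=True)
        # Marginal update: q(y) = sum_x p(x) q(y|x).
        q_y = p_x @ q_y_given_x
        # Early stopping when the average distortion between two
        # consecutive iterations falls below a small threshold.
        avg_d = float(np.sum(p_x[:, None] * q_y_given_x * distortion))
        if abs(prev_d - avg_d) < tol:
            break
        prev_d = avg_d
    return q_y_given_x, avg_d
```

With a large `beta`, the returned channel concentrates on the minimum-distortion output for each source symbol, recovering a near-deterministic mapping; smaller values of `beta` yield a lower-rate, higher-distortion channel.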