Deciding What to Learn: A Rate-Distortion Approach

Authors: Dilip Arumugam, Benjamin Van Roy

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We establish a general bound on expected discounted regret for an agent that decides what to learn in this manner along with computational experiments that illustrate the expressiveness of designer preferences and even show improvements over Thompson sampling in identifying an optimal policy."
Researcher Affiliation | Academia | "Stanford University, California, USA. Correspondence to: Dilip Arumugam <dilip@cs.stanford.edu>."
Pseudocode | Yes | "Algorithm 1: Blahut-Arimoto Satisficing Thompson Sampling (BLASTS)"
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code of the described methodology (BLASTS).
Open Datasets | No | The paper uses synthetic independent Bernoulli and Gaussian bandit problems with generated parameters (e.g., "Ea ~ Uniform(0, 1)"); it does not refer to existing publicly available datasets with access information.
Dataset Splits | No | The paper does not specify explicit dataset splits (e.g., percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions Adam (Kingma & Ba, 2014), linear hypermodels (Dwaracherla et al., 2020), and leveraging "an existing implementation of the Blahut Arimoto algorithm for all experiments (James et al., 2018)", but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We use a noise variance of 0.1, a prior variance of 1.0, and a batch size of 1024 throughout all experiments while using Adam (Kingma & Ba, 2014) to optimize hypermodel parameters with a learning rate of 0.001. ... The number of posterior samples used was fixed to 64 and the maximum number of iterations was set to 100, stopping early if the average distortion between two consecutive iterations fell below a small threshold."
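The experiment-setup row describes running the Blahut-Arimoto algorithm with a cap on iterations and early stopping when the average distortion stops changing. As background, here is a minimal sketch of a standard Blahut-Arimoto iteration for a rate-distortion problem with that stopping rule; the function name, the distortion matrix, and the inverse-temperature parameter `beta` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, max_iters=100, tol=1e-6):
    """Standard Blahut-Arimoto iteration for a rate-distortion problem.

    p_x:        (n,) source distribution (e.g., over sampled environments).
    distortion: (n, m) matrix d(x, y) between source x and output y (e.g., actions).
    beta:       inverse temperature trading off rate against distortion.
    Returns the channel q(y|x) as an (n, m) array and the final expected distortion.
    """
    n, m = distortion.shape
    q_y = np.full(m, 1.0 / m)  # output marginal, initialized uniform
    prev_d = np.inf
    for _ in range(max_iters):
        # Channel update: q(y|x) proportional to q(y) * exp(-beta * d(x, y)),
        # computed in log space for numerical stability.
        log_q = np.log(q_y)[None, :] - beta * distortion
        q_y_given_x = np.exp(log_q - log_q.max(axis=1, keepdims=True))
        q_y_given_x /= q_y_given_x.sum(axis=1, keepdims=True)
        # Marginal update: q(y) = sum_x p(x) q(y|x).
        q_y = p_x @ q_y_given_x
        # Early stopping when the average distortion between two
        # consecutive iterations falls below a small threshold.
        avg_d = float(np.sum(p_x[:, None] * q_y_given_x * distortion))
        if abs(prev_d - avg_d) < tol:
            break
        prev_d = avg_d
    return q_y_given_x, avg_d
```

With a large `beta`, the returned channel concentrates on the minimum-distortion output for each source symbol, recovering a near-deterministic mapping; smaller values of `beta` yield a lower-rate, higher-distortion channel.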