Optimizing Sequential Experimental Design with Deep Reinforcement Learning
Authors: Tom Blau, Edwin V. Bonilla, Iadine Chadès, Amir Dezfouli
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experimental Results: In this section we compare our approach to several baselines in order to empirically establish its theoretical benefits. We will examine experimental design problems with both continuous and discrete design spaces, as well as the case where the likelihood model is not differentiable. |
| Researcher Affiliation | Academia | 1 CSIRO's Data61, Australia; 2 CSIRO's Land and Water, Australia. Correspondence to: Tom Blau <tom.blau@data61.csiro.au>. |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | See https://github.com/csiro-mlai/RL-BOED for source code. |
| Open Datasets | No | The paper describes the experimental problems and references prior work (e.g., Foster et al., 2021; Foster et al., 2020; Moffat et al., 2020) where these problems were studied. However, it does not provide direct links, DOIs, or explicit citations to publicly available datasets used for these problems. |
| Dataset Splits | No | The paper does not explicitly specify exact dataset split percentages or sample counts for training, validation, and testing. It discusses problems but not how the data for these problems was partitioned into splits. |
| Hardware Specification | Yes | Except for SMC, all experiments were run on a Slurm cluster node with a single Nvidia Tesla P100 GPU and 4 cores of an Intel Xeon E5-2690 CPU. |
| Software Dependencies | No | The paper mentions software like Python, Pytorch, Pyro, Garage, REDQ, Soft Actor-Critic, and rpy2 but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Hyperparameter choices for the different SED problems are enumerated in the table below. Values were chosen by linear search over each hyperparameter independently, with the following search spaces: target update rate τ ∈ {1e-3, 5e-3, 1e-2}; policy learning rate LRπ ∈ {1e-5, 1e-4, 3e-4, 1e-3}; Q-function learning rate LRqf ∈ {1e-5, 1e-4, 3e-4, 1e-3}; replay buffer size \|buffer\| ∈ {1e5, 1e6, 1e7}; discount factor γ ∈ {0.95, 0.99, 1} (see the sketch below the table). |
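
To make the independent per-hyperparameter search concrete, here is a minimal Python sketch of that procedure using the search spaces quoted above. The default values and the `train_and_evaluate` callable are hypothetical placeholders, not taken from the paper; the authors' actual training code is in the linked RL-BOED repository.

```python
# Hedged sketch (not the authors' code): independent line search over the SAC
# hyperparameters listed in the paper's search spaces. `train_and_evaluate` is a
# hypothetical stand-in for a full training run that returns an evaluation score
# (e.g. a terminal EIG estimate); the DEFAULTS below are illustrative placeholders.

DEFAULTS = {
    "tau": 5e-3,              # target update rate (placeholder default)
    "lr_pi": 3e-4,            # policy learning rate (placeholder default)
    "lr_qf": 3e-4,            # Q-function learning rate (placeholder default)
    "buffer_size": int(1e6),  # replay buffer size (placeholder default)
    "gamma": 0.99,            # discount factor (placeholder default)
}

SEARCH_SPACES = {
    "tau": [1e-3, 5e-3, 1e-2],
    "lr_pi": [1e-5, 1e-4, 3e-4, 1e-3],
    "lr_qf": [1e-5, 1e-4, 3e-4, 1e-3],
    "buffer_size": [int(1e5), int(1e6), int(1e7)],
    "gamma": [0.95, 0.99, 1.0],
}


def line_search(train_and_evaluate):
    """Search each hyperparameter independently, holding the others at defaults."""
    best = dict(DEFAULTS)
    for name, candidates in SEARCH_SPACES.items():
        scores = {}
        for value in candidates:
            config = {**DEFAULTS, name: value}
            scores[value] = train_and_evaluate(config)
        # Keep the value with the highest evaluation score for this hyperparameter.
        best[name] = max(scores, key=scores.get)
    return best
```

In practice, `train_and_evaluate(config)` would wrap a full SAC/REDQ policy-training run for the given SED problem followed by evaluation of the learned design policy; this sketch only illustrates how the quoted search spaces are swept one hyperparameter at a time.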