Optimizing Sequential Experimental Design with Deep Reinforcement Learning
Authors: Tom Blau, Edwin V. Bonilla, Iadine Chadès, Amir Dezfouli
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experimental Results: In this section we compare our approach to several baselines in order to empirically establish its theoretical benefits. We will examine experimental design problems with both continuous and discrete design spaces, as well as the case where the likelihood model is not differentiable. |
| Researcher Affiliation | Academia | 1 CSIRO's Data61, Australia; 2 CSIRO's Land and Water, Australia. Correspondence to: Tom Blau <tom.blau@data61.csiro.au>. |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | See https://github.com/csiro-mlai/RL-BOED for source code. |
| Open Datasets | No | The paper describes the experimental problems and references prior work (e.g., Foster et al., 2021; Foster et al., 2020; Moffat et al., 2020) where these problems were studied. However, it does not provide direct links, DOIs, or explicit citations to publicly available datasets used for these problems. |
| Dataset Splits | No | The paper does not explicitly specify exact dataset split percentages or sample counts for training, validation, and testing. It discusses problems but not how the data for these problems was partitioned into splits. |
| Hardware Specification | Yes | Except for SMC, all experiments were run on a Slurm cluster node with a single Nvidia Tesla P100 GPU and 4 cores of an Intel Xeon E5-2690 CPU. |
| Software Dependencies | No | The paper mentions software like Python, Pytorch, Pyro, Garage, REDQ, Soft Actor-Critic, and rpy2 but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Hyperparameter choices for the different SED problems are enumerated in the table below. Values were chosen by linear search over each hyperparameter independently, with the following search spaces: target update rate τ ∈ {1e-3, 5e-3, 1e-2}; policy learning rate LRπ ∈ {1e-5, 1e-4, 3e-4, 1e-3}; Q-function learning rate LRqf ∈ {1e-5, 1e-4, 3e-4, 1e-3}; replay buffer size \|buffer\| ∈ {1e5, 1e6, 1e7}; discount factor γ ∈ {0.95, 0.99, 1} (see the sketch below the table). |
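
To make the independent per-hyperparameter search concrete, here is a minimal Python sketch of that procedure using the search spaces quoted above. The default values and the `train_and_evaluate` callable are hypothetical placeholders, not taken from the paper; the authors' actual training code is in the linked RL-BOED repository.

```python
# Hedged sketch (not the authors' code): independent line search over the SAC
# hyperparameters listed in the paper's search spaces. `train_and_evaluate` is a
# hypothetical stand-in for a full training run that returns an evaluation score
# (e.g. a terminal EIG estimate); the DEFAULTS below are illustrative placeholders.

DEFAULTS = {
    "tau": 5e-3,              # target update rate (placeholder default)
    "lr_pi": 3e-4,            # policy learning rate (placeholder default)
    "lr_qf": 3e-4,            # Q-function learning rate (placeholder default)
    "buffer_size": int(1e6),  # replay buffer size (placeholder default)
    "gamma": 0.99,            # discount factor (placeholder default)
}

SEARCH_SPACES = {
    "tau": [1e-3, 5e-3, 1e-2],
    "lr_pi": [1e-5, 1e-4, 3e-4, 1e-3],
    "lr_qf": [1e-5, 1e-4, 3e-4, 1e-3],
    "buffer_size": [int(1e5), int(1e6), int(1e7)],
    "gamma": [0.95, 0.99, 1.0],
}


def line_search(train_and_evaluate):
    """Search each hyperparameter independently, holding the others at defaults."""
    best = dict(DEFAULTS)
    for name, candidates in SEARCH_SPACES.items():
        scores = {}
        for value in candidates:
            config = {**DEFAULTS, name: value}
            scores[value] = train_and_evaluate(config)
        # Keep the value with the highest evaluation score for this hyperparameter.
        best[name] = max(scores, key=scores.get)
    return best
```

In practice, `train_and_evaluate(config)` would wrap a full SAC/REDQ policy-training run for the given SED problem followed by evaluation of the learned design policy; this sketch only illustrates how the quoted search spaces are swept one hyperparameter at a time.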