Minimum Description Length Control

Authors: Ted Moskovitz, Ta-Chu Kao, Maneesh Sahani, Matthew Botvinick

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We motivate MDL-C via formal connections between the MDL principle and Bayesian inference, derive theoretical performance guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete and high-dimensional continuous control tasks. (Section 5, Experiments) We tested MDL-C applied to discrete and continuous control in both the sequential and parallel task settings. To quantify performance, in addition to measuring per-task reward, we also report the cumulative regret for each method in each experimental setting in Section I.1.
Researcher Affiliation | Collaboration | Ted Moskovitz (1,*), Ta-Chu Kao (2,3), Maneesh Sahani (1), Matthew M. Botvinick (1,2); 1. Gatsby Unit, University College London; 2. DeepMind; 3. Facebook Reality Labs. *Correspondence: ted@gatsby.ucl.ac.uk
Pseudocode | Yes | Algorithm 1: MDL-C for Sequential Multitask Learning with Persistent Replay (page 4); Algorithm 2: Idealized MDL-C for Multitask Learning (page 13); Algorithm 3: Off-Policy MDL-C for Parallel Multitask Learning (page 14).
Open Source Code | No | The paper does not contain an explicit statement that the source code for the described methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | We first test MDL-C in the classic FOURROOMS environment (Fig. 5.1a; Sutton et al., 1999). We presented agents with multitask learning problems using environments from the DeepMind Control Suite (DMC; Tassa et al., 2018).
Dataset Splits | No | The paper describes training phases and evaluation on specific tasks and environments (e.g., 'In the first phase of training, a single goal location is randomly sampled...', 'Test performance was computed by averaging performance across all K tasks'), but it does not specify explicit train/validation/test splits with percentages or sample counts for reproduction, nor does it refer to predefined splits from cited benchmarks.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions software components such as 'advantage actor-critic (A2C)', 'soft actor-critic (SAC)', and 'Adam' for optimization, but it does not specify their version numbers or the versions of any other key software dependencies required for reproducibility (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | In all cases α = 0.1, β = 1.0, and learning rates for all agents were set to 0.0007; agents were optimized with Adam (Kingma and Ba, 2014) (Section H.1). Hyperparameters shared by all agents can be viewed in Table 2 (Section H.2), which lists detailed settings such as Network Hidden Layers 256:256, Learning Rate 3 × 10⁻⁴, Replay Buffer Size 1 × 10⁶, etc.
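
For easier scanning, the settings quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a minimal illustration assuming a Python training script; only the numeric values come from the paper (Section H.1 and Table 2 / Section H.2), while the variable names, keys, and grouping are hypothetical and do not reflect the authors' (unreleased) code.

```python
# Hypothetical grouping of the hyperparameters quoted above.
# Values are taken verbatim from the paper's appendix; everything else is illustrative.

HYPERPARAMS_SECTION_H1 = {
    "alpha": 0.1,            # α (Section H.1)
    "beta": 1.0,             # β (Section H.1)
    "learning_rate": 7e-4,   # "learning rates for all agents were set to 0.0007"
    "optimizer": "adam",     # Adam (Kingma and Ba, 2014)
}

HYPERPARAMS_TABLE_2 = {
    "hidden_layers": (256, 256),      # Network Hidden Layers 256:256
    "learning_rate": 3e-4,            # Learning Rate 3 × 10⁻⁴
    "replay_buffer_size": 1_000_000,  # Replay Buffer Size 1 × 10⁶
    "optimizer": "adam",              # shared optimizer, per Section H.1
}
```

Note that the two learning rates (0.0007 and 3 × 10⁻⁴) are quoted from different parts of the appendix and presumably apply to different experimental settings; the paper itself should be consulted for which value governs which set of experiments.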