Minimum Description Length Control

Authors: Ted Moskovitz, Ta-Chu Kao, Maneesh Sahani, Matthew Botvinick

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We motivate MDL-C via formal connections between the MDL principle and Bayesian inference, derive theoretical performance guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete and high-dimensional continuous control tasks. (Section 5, Experiments) We tested MDL-C applied to discrete and continuous control in both the sequential and parallel task settings. To quantify performance, in addition to measuring per-task reward, we also report the cumulative regret for each method in each experimental setting in Section I.1.
Researcher Affiliation | Collaboration | Ted Moskovitz (1,*), Ta-Chu Kao (2,3), Maneesh Sahani (1), Matthew M. Botvinick (1,2); 1. Gatsby Unit, University College London; 2. DeepMind; 3. Facebook Reality Labs. *Correspondence: ted@gatsby.ucl.ac.uk
Pseudocode | Yes | Algorithm 1: MDL-C for Sequential Multitask Learning with Persistent Replay (page 4); Algorithm 2: Idealized MDL-C for Multitask Learning (page 13); Algorithm 3: Off-Policy MDL-C for Parallel Multitask Learning (page 14).
Open Source Code | No | The paper does not contain an explicit statement that the source code for the described methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | We first test MDL-C in the classic FOURROOMS environment (Fig. 5.1a; Sutton et al., 1999). We presented agents with multitask learning problems using environments from the DeepMind Control Suite (DMC; Tassa et al., 2018).
Dataset Splits | No | The paper describes training phases and evaluation on specific tasks and environments (e.g., 'In the first phase of training, a single goal location is randomly sampled...', 'Test performance was computed by averaging performance across all K tasks'), but it does not specify explicit train/validation/test splits with percentages or sample counts for reproduction, nor does it refer to predefined splits from cited benchmarks.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions software components such as 'advantage actor-critic (A2C)', 'soft actor-critic (SAC)', and 'Adam' for optimization, but it does not specify their version numbers or the versions of any other key software dependencies required for reproducibility (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | In all cases α = 0.1, β = 1.0, and learning rates for all agents were set to 0.0007; agents were optimized with Adam (Kingma and Ba, 2014) (Section H.1). Hyperparameters shared by all agents can be viewed in Table 2 (Section H.2), which lists detailed settings such as Network Hidden Layers 256:256, Learning Rate 3 × 10⁻⁴, Replay Buffer Size 1 × 10⁶, etc.
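
For easier scanning, the settings quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a minimal illustration assuming a Python training script; only the numeric values come from the paper (Section H.1 and Table 2 / Section H.2), while the variable names, keys, and grouping are hypothetical and do not reflect the authors' (unreleased) code.

```python
# Hypothetical grouping of the hyperparameters quoted above.
# Values are taken verbatim from the paper's appendix; everything else is illustrative.

HYPERPARAMS_SECTION_H1 = {
    "alpha": 0.1,            # α (Section H.1)
    "beta": 1.0,             # β (Section H.1)
    "learning_rate": 7e-4,   # "learning rates for all agents were set to 0.0007"
    "optimizer": "adam",     # Adam (Kingma and Ba, 2014)
}

HYPERPARAMS_TABLE_2 = {
    "hidden_layers": (256, 256),      # Network Hidden Layers 256:256
    "learning_rate": 3e-4,            # Learning Rate 3 × 10⁻⁴
    "replay_buffer_size": 1_000_000,  # Replay Buffer Size 1 × 10⁶
    "optimizer": "adam",              # shared optimizer, per Section H.1
}
```

Note that the two learning rates (0.0007 and 3 × 10⁻⁴) are quoted from different parts of the appendix and presumably apply to different experimental settings; the paper itself should be consulted for which value governs which set of experiments.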