Dichotomy of Control: Separating What You Can Control from What You Cannot
Authors: Sherry Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted an empirical evaluation to ascertain the effectiveness of DoC. For this evaluation, we considered three settings: (1) a Bernoulli bandit problem with stochastic rewards, based on a canonical worst-case scenario for RCSL (Brandfonbrener et al., 2022); (2) the Frozen Lake domain from Brockman et al. (2016), where the future VAE approach proves ineffective; and (3) a modified set of OpenAI Gym (Brockman et al., 2016) environments into which we introduced environment stochasticity. In these studies, we found that DoC exhibits a significant advantage over RCSL/DT, and outperforms the future VAE when the analogue of one-step RL is insufficient. |
| Researcher Affiliation | Collaboration | Mengjiao Yang (University of California, Berkeley; Google Research, Brain Team) sherryy@berkeley.edu; Dale Schuurmans (University of Alberta; Google Research, Brain Team); Pieter Abbeel (University of California, Berkeley); Ofir Nachum (Google Research, Brain Team) |
| Pseudocode | Yes | Algorithm 1: Inference with Dichotomy of Control |
| Open Source Code | Yes | Code available at https://github.com/google-research/google-research/tree/master/dichotomy_of_control |
| Open Datasets | Yes | We conducted an empirical evaluation to ascertain the effectiveness of DoC. For this evaluation, we considered three settings: (1) a Bernoulli bandit problem with stochastic rewards, based on a canonical worst-case scenario for RCSL (Brandfonbrener et al., 2022); (2) the Frozen Lake domain from Brockman et al. (2016), where the future VAE approach proves ineffective; and (3) a modified set of OpenAI Gym (Brockman et al., 2016) environments into which we introduced environment stochasticity. For the AntMaze task, we use the AntMaze dataset from D4RL (Fu et al., 2020). |
| Dataset Splits | No | The paper refers to an 'offline dataset' and details about data collection but does not specify exact train/validation/test split percentages, sample counts, or refer to predefined public splits for reproducibility. |
| Hardware Specification | Yes | All models are trained on NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper lists hyperparameters and general model components but does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., TensorFlow, PyTorch) or Python libraries. |
| Experiment Setup | Yes | Table 1: Hyperparameters of Decision Transformer, future-conditioned VAE, and Dichotomy of Control. Number of layers: 3; number of attention heads: 1; embedding dimension: 128; latent future dimension: 128; nonlinearity: ReLU; batch size: 64; context length K: 20 (Frozen Lake, HalfCheetah, Hopper, Humanoid, AntMaze) and 5 (Reacher); future length Kf: same as context length K; return-to-go conditioning for DT: 1 (Frozen Lake), 6000 (HalfCheetah), 3600 (Hopper), 5000 (Humanoid), 50 (Reacher), 1 (AntMaze); dropout: 0.1; learning rate: 1e-4; grad norm clip: 0.25; weight decay: 1e-4; learning rate decay: linear warmup for the first 1e5 training steps; β coefficient: 1.0 for DoC, best of {0.1, 1.0, 10} for the VAE. (See the configuration sketch below this table.) |
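
For convenience, the hyperparameters quoted in the Experiment Setup row above can be collected into a single configuration dictionary. This is a minimal sketch only: the key and domain names are illustrative assumptions, not identifiers from the released repository, and the values simply restate Table 1.

```python
# Hypothetical consolidation of the Table 1 hyperparameters into one config dict.
# Key names and domain keys are illustrative; the released code may differ.
DOC_HYPERPARAMS = {
    "num_layers": 3,
    "num_attention_heads": 1,
    "embedding_dim": 128,
    "latent_future_dim": 128,
    "nonlinearity": "relu",
    "batch_size": 64,
    # Context length K: 20 for most domains, 5 for Reacher.
    "context_length": {
        "frozen_lake": 20, "halfcheetah": 20, "hopper": 20,
        "humanoid": 20, "antmaze": 20, "reacher": 5,
    },
    # Future length Kf matches the context length K per domain.
    "future_length": "same_as_context_length",
    # Return-to-go conditioning used by the Decision Transformer baseline.
    "dt_return_to_go": {
        "frozen_lake": 1, "halfcheetah": 6000, "hopper": 3600,
        "humanoid": 5000, "reacher": 50, "antmaze": 1,
    },
    "dropout": 0.1,
    "learning_rate": 1e-4,
    "grad_norm_clip": 0.25,
    "weight_decay": 1e-4,
    "lr_warmup_steps": 100_000,  # linear warmup over the first 1e5 training steps
    "beta": 1.0,                 # DoC setting; the VAE baseline sweeps {0.1, 1.0, 10}
}
```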