Dichotomy of Control: Separating What You Can Control from What You Cannot
Authors: Sherry Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted an empirical evaluation to ascertain the effectiveness of DoC. For this evaluation, we considered three settings: (1) a Bernoulli bandit problem with stochastic rewards, based on a canonical worst-case scenario for RCSL (Brandfonbrener et al., 2022); (2) the Frozen Lake domain from Brockman et al. (2016), where the future VAE approach proves ineffective; and (3) a modified set of OpenAI Gym (Brockman et al., 2016) environments into which we introduced environment stochasticity. In these studies, we found that DoC exhibits a significant advantage over RCSL/DT, and outperforms the future VAE when the analogue of one-step RL is insufficient. |
| Researcher Affiliation | Collaboration | Mengjiao Yang (University of California, Berkeley; Google Research, Brain Team) sherryy@berkeley.edu; Dale Schuurmans (University of Alberta; Google Research, Brain Team); Pieter Abbeel (University of California, Berkeley); Ofir Nachum (Google Research, Brain Team) |
| Pseudocode | Yes | Algorithm 1: Inference with Dichotomy of Control |
| Open Source Code | Yes | Code available at https://github.com/google-research/google-research/tree/master/dichotomy_of_control |
| Open Datasets | Yes | We conducted an empirical evaluation to ascertain the effectiveness of DoC. For this evaluation, we considered three settings: (1) a Bernoulli bandit problem with stochastic rewards, based on a canonical worst-case scenario for RCSL (Brandfonbrener et al., 2022); (2) the Frozen Lake domain from Brockman et al. (2016), where the future VAE approach proves ineffective; and (3) a modified set of OpenAI Gym (Brockman et al., 2016) environments into which we introduced environment stochasticity. For the AntMaze task, we use the AntMaze dataset from D4RL (Fu et al., 2020). |
| Dataset Splits | No | The paper refers to an 'offline dataset' and details about data collection but does not specify exact train/validation/test split percentages, sample counts, or refer to predefined public splits for reproducibility. |
| Hardware Specification | Yes | All models are trained on NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper lists hyperparameters and general model components but does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., TensorFlow, PyTorch) or Python libraries. |
| Experiment Setup | Yes | Table 1: Hyperparameters of Decision Transformer, future-conditioned VAE, and Dichotomy of Control. Number of layers: 3; number of attention heads: 1; embedding dimension: 128; latent future dimension: 128; nonlinearity: ReLU; batch size: 64; context length K: 20 (Frozen Lake, HalfCheetah, Hopper, Humanoid, AntMaze) and 5 (Reacher); future length Kf: same as context length K; return-to-go conditioning for DT: 1 (Frozen Lake), 6000 (HalfCheetah), 3600 (Hopper), 5000 (Humanoid), 50 (Reacher), 1 (AntMaze); dropout: 0.1; learning rate: 1e-4; grad norm clip: 0.25; weight decay: 1e-4; learning rate decay: linear warmup for the first 1e5 training steps; β coefficient: 1.0 for DoC, best of {0.1, 1.0, 10} for the VAE. (See the configuration sketch below this table.) |
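
For convenience, the hyperparameters quoted in the Experiment Setup row above can be collected into a single configuration dictionary. This is a minimal sketch only: the key and domain names are illustrative assumptions, not identifiers from the released repository, and the values simply restate Table 1.

```python
# Hypothetical consolidation of the Table 1 hyperparameters into one config dict.
# Key names and domain keys are illustrative; the released code may differ.
DOC_HYPERPARAMS = {
    "num_layers": 3,
    "num_attention_heads": 1,
    "embedding_dim": 128,
    "latent_future_dim": 128,
    "nonlinearity": "relu",
    "batch_size": 64,
    # Context length K: 20 for most domains, 5 for Reacher.
    "context_length": {
        "frozen_lake": 20, "halfcheetah": 20, "hopper": 20,
        "humanoid": 20, "antmaze": 20, "reacher": 5,
    },
    # Future length Kf matches the context length K per domain.
    "future_length": "same_as_context_length",
    # Return-to-go conditioning used by the Decision Transformer baseline.
    "dt_return_to_go": {
        "frozen_lake": 1, "halfcheetah": 6000, "hopper": 3600,
        "humanoid": 5000, "reacher": 50, "antmaze": 1,
    },
    "dropout": 0.1,
    "learning_rate": 1e-4,
    "grad_norm_clip": 0.25,
    "weight_decay": 1e-4,
    "lr_warmup_steps": 100_000,  # linear warmup over the first 1e5 training steps
    "beta": 1.0,                 # DoC setting; the VAE baseline sweeps {0.1, 1.0, 10}
}
```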