DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck
Authors: Jiameng Fan, Wenchao Li
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, USA. |
| Pseudocode | Yes | Algorithm 1 DRIBO Loss... Algorithm 2 SAC + DRIBO Encoder... Algorithm 3 DRIBO + SAC... Algorithm 4 DRIBO + PPO... Algorithm 5 DRIBO Loss + KL Balancing |
| Open Source Code | Yes | Our code is open-sourced and available at https://github.com/BU-DEPEND-Lab/DRIBO. |
| Open Datasets | Yes | We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos... The backgrounds are replaced with natural videos from the Kinetics dataset (Kay et al., 2017)... For generalization, we present results on Procgen (Cobbe et al., 2020). |
| Dataset Splits | Yes | For the DMC suite, all agents are built on top of SAC. For the Procgen suite, we augment PPO, an RL baseline for Procgen, with DRIBO. Implementation details are in Appendix D... For each game, agents are trained on the first 200 levels, and evaluated w.r.t. their zero-shot performance averaged over unseen levels during testing. Unseen levels typically have different backgrounds or layouts... |
| Hardware Specification | Yes | We used a desktop with a 12-core CPU and a single GTX 1080 Ti GPU for benchmarking. |
| Software Dependencies | Yes | DMC benchmarks are simulated in MuJoCo 2.0. |
| Experiment Setup | Yes | We show other hyperparameters for DMC experiments in Table 2... We show other hyperparameters for Procgen environments in Table 5... The hyperparameter β in the DRIBO loss (Algorithm 1) is slowly increased during training. The β value starts from a small value of 1e-4 and increases to 1e-3 with an exponential scheduler. |
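
The Experiment Setup row describes β growing from 1e-4 to 1e-3 under an exponential scheduler, but the quoted text does not give the exact functional form or the number of steps over which β ramps. The sketch below is one plausible reading, assuming geometric interpolation between the two endpoints; the function name `dribo_beta` and the `total_steps` parameter are hypothetical and are not taken from the paper or its repository.

```python
def dribo_beta(step: int, total_steps: int,
               beta_start: float = 1e-4, beta_end: float = 1e-3) -> float:
    """Geometrically interpolate beta from beta_start to beta_end.

    Assumption: "exponential scheduler" means beta grows by a constant
    multiplicative factor per step. The paper's exact schedule may differ.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so beta stays at beta_end after the ramp.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return beta_start * (beta_end / beta_start) ** progress


if __name__ == "__main__":
    # Print beta at a few points of a hypothetical 100k-step ramp.
    for step in (0, 25_000, 50_000, 75_000, 100_000):
        print(step, f"{dribo_beta(step, 100_000):.2e}")
```

Under this assumption β is multiplied by the same factor at every step, so it is linear in log space: halfway through the ramp it sits at the geometric mean of the two endpoints rather than their arithmetic mean.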