DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

Authors: Jiameng Fan, Wenchao Li

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark.
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, USA.
Pseudocode | Yes | Algorithm 1 DRIBO Loss... Algorithm 2 SAC + DRIBO Encoder... Algorithm 3 DRIBO + SAC... Algorithm 4 DRIBO + PPO... Algorithm 5 DRIBO Loss + KL Balancing
Open Source Code | Yes | Our code is open-sourced and available at https://github.com/BU-DEPEND-Lab/DRIBO.
Open Datasets | Yes | We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos... The backgrounds are replaced with natural videos from the Kinetics dataset (Kay et al., 2017)... For generalization, we present results on Procgen (Cobbe et al., 2020).
Dataset Splits | Yes | For the DMC suite, all agents are built on top of SAC. For the Procgen suite, we augment PPO, an RL baseline for Procgen, with DRIBO. Implementation details are in Appendix D... For each game, agents are trained on the first 200 levels and evaluated w.r.t. their zero-shot performance averaged over unseen levels during testing. Unseen levels typically have different backgrounds or layouts...
Hardware Specification | Yes | We used a desktop with a 12-core CPU and a single GTX 1080 Ti GPU for benchmarking.
Software Dependencies | Yes | DMC benchmarks are simulated in MuJoCo 2.0.
Experiment Setup | Yes | We show other hyperparameters for DMC experiments in Table 2... We show other hyperparameters for Procgen environments in Table 5... The hyperparameter β in the DRIBO loss (Algorithm 1) is slowly increased during training. The β value starts from a small value of 1e-4 and increases to 1e-3 with an exponential scheduler.
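The Experiment Setup row quotes an exponential schedule for the bottleneck coefficient β, growing from 1e-4 to 1e-3 during training. The quoted text does not give the step count or update cadence, so `total_steps` in the sketch below is a hypothetical placeholder, and geometric interpolation is just one natural reading of "exponential scheduler":

```python
def beta_schedule(step: int, total_steps: int,
                  beta_start: float = 1e-4, beta_end: float = 1e-3) -> float:
    """Exponentially interpolate beta from beta_start to beta_end.

    Returns beta_start at step 0 and beta_end at (and beyond) total_steps,
    moving geometrically in between.
    """
    frac = min(step / total_steps, 1.0)
    return beta_start * (beta_end / beta_start) ** frac

# Example with a hypothetical 100k-step budget:
print(beta_schedule(0, 100_000))        # 1e-4
print(beta_schedule(50_000, 100_000))   # ~3.16e-4 (geometric midpoint)
print(beta_schedule(100_000, 100_000))  # 1e-3
```

Ramping β up slowly is a common way to let the encoder learn informative representations before the bottleneck term starts compressing them.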
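The Pseudocode row lists Algorithm 5, "DRIBO Loss + KL Balancing", but the report does not reproduce its body. The sketch below shows the generic KL-balancing trick as popularized by DreamerV2 (Hafner et al., 2021): mixing two stop-gradient variants of the KL between posterior and prior. The Gaussian parameterization and the mixing weight `alpha` are assumptions for illustration, not details taken from the DRIBO paper.

```python
import torch
import torch.distributions as D

def kl_balancing_loss(post_mean, post_std, prior_mean, prior_std,
                      alpha: float = 0.8):
    """KL balancing: weighted mix of two stop-gradient KL variants."""
    # Posterior detached: gradient flows only into the prior, pulling it
    # toward the (fixed) posterior.
    kl_prior = D.kl_divergence(
        D.Normal(post_mean.detach(), post_std.detach()),
        D.Normal(prior_mean, prior_std),
    )
    # Prior detached: gradient flows only into the posterior, regularizing
    # it toward the (fixed) prior.
    kl_post = D.kl_divergence(
        D.Normal(post_mean, post_std),
        D.Normal(prior_mean.detach(), prior_std.detach()),
    )
    return (alpha * kl_prior + (1 - alpha) * kl_post).sum(-1).mean()
```

Setting alpha > 0.5 trains the prior toward the posterior faster than the posterior is regularized, which is the usual motivation for balancing the two KL terms.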