DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

Authors: Jiameng Fan, Wenchao Li

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark.
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, USA.
Pseudocode | Yes | Algorithm 1 DRIBO Loss... Algorithm 2 SAC + DRIBO Encoder... Algorithm 3 DRIBO + SAC... Algorithm 4 DRIBO + PPO... Algorithm 5 DRIBO Loss + KL Balancing
Open Source Code | Yes | Our code is open-sourced and available at https://github.com/BU-DEPEND-Lab/DRIBO.
Open Datasets | Yes | We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos... The backgrounds are replaced with natural videos from the Kinetics dataset (Kay et al., 2017)... For generalization, we present results on Procgen (Cobbe et al., 2020).
Dataset Splits | Yes | For the DMC suite, all agents are built on top of SAC. For the Procgen suite, we augment PPO, an RL baseline for Procgen, with DRIBO. Implementation details are in Appendix D... For each game, agents are trained on the first 200 levels and evaluated w.r.t. their zero-shot performance averaged over unseen levels during testing. Unseen levels typically have different backgrounds or layouts...
Hardware Specification | Yes | We used a desktop with a 12-core CPU and a single GTX 1080 Ti GPU for benchmarking.
Software Dependencies | Yes | DMC benchmarks are simulated in MuJoCo 2.0.
Experiment Setup | Yes | We show other hyperparameters for DMC experiments in Table 2... We show other hyperparameters for Procgen environments in Table 5... The hyperparameter β in the DRIBO loss (Algorithm 1) is slowly increased during training. The β value starts from a small value of 1e-4 and increases to 1e-3 with an exponential scheduler.
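The Experiment Setup row quotes an exponential schedule for the bottleneck coefficient β, growing from 1e-4 to 1e-3 during training. The quoted text does not give the step count or update cadence, so `total_steps` in the sketch below is a hypothetical placeholder, and geometric interpolation is just one natural reading of "exponential scheduler":

```python
def beta_schedule(step: int, total_steps: int,
                  beta_start: float = 1e-4, beta_end: float = 1e-3) -> float:
    """Exponentially interpolate beta from beta_start to beta_end.

    Returns beta_start at step 0 and beta_end at (and beyond) total_steps,
    moving geometrically in between.
    """
    frac = min(step / total_steps, 1.0)
    return beta_start * (beta_end / beta_start) ** frac

# Example with a hypothetical 100k-step budget:
print(beta_schedule(0, 100_000))        # 1e-4
print(beta_schedule(50_000, 100_000))   # ~3.16e-4 (geometric midpoint)
print(beta_schedule(100_000, 100_000))  # 1e-3
```

Ramping β up slowly is a common way to let the encoder learn informative representations before the bottleneck term starts compressing them.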
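The Pseudocode row lists Algorithm 5, "DRIBO Loss + KL Balancing", but the report does not reproduce its body. The sketch below shows the generic KL-balancing trick as popularized by DreamerV2 (Hafner et al., 2021): mixing two stop-gradient variants of the KL between posterior and prior. The Gaussian parameterization and the mixing weight `alpha` are assumptions for illustration, not details taken from the DRIBO paper.

```python
import torch
import torch.distributions as D

def kl_balancing_loss(post_mean, post_std, prior_mean, prior_std,
                      alpha: float = 0.8):
    """KL balancing: weighted mix of two stop-gradient KL variants."""
    # Posterior detached: gradient flows only into the prior, pulling it
    # toward the (fixed) posterior.
    kl_prior = D.kl_divergence(
        D.Normal(post_mean.detach(), post_std.detach()),
        D.Normal(prior_mean, prior_std),
    )
    # Prior detached: gradient flows only into the posterior, regularizing
    # it toward the (fixed) prior.
    kl_post = D.kl_divergence(
        D.Normal(post_mean, post_std),
        D.Normal(prior_mean.detach(), prior_std.detach()),
    )
    return (alpha * kl_prior + (1 - alpha) * kl_post).sum(-1).mean()
```

Setting alpha > 0.5 trains the prior toward the posterior faster than the posterior is regularized, which is the usual motivation for balancing the two KL terms.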